Quick answer: Instrument per-tick timing for every system, reproduce the load locally, and find the system whose cost grows fastest with player count. That system is the bug. Fix it or move it off the per-tick path.

Your server runs at a beautiful 60 Hz with five players. You hit twenty players and the tick rate collapses to 28 Hz. Players start rubber-banding, hits stop registering, and the support inbox fills up. The server is not crashing, the CPU is not pinned, and yet the simulation is falling behind. This is the most common server-side performance bug in multiplayer games, and the root cause is almost always a system whose work scales with the wrong variable.

What “Tick Rate” Actually Means

A game server runs in a loop. Each iteration of the loop is a tick: the server reads incoming inputs, advances the simulation by a fixed amount of game time, sends outgoing snapshots, and sleeps until the next tick is due. A 60 Hz server has a 16.6 ms budget per tick. A 30 Hz server has 33.3 ms.

If the work in a tick takes less than the budget, the server sleeps the rest. If it takes more, the next tick is late, and the server has two choices:

  1. Run multiple ticks back to back to catch up (a “catch-up step”).
  2. Drop ticks and continue at a lower effective rate.

Either response is a bug. Catch-up steps stack work and make the next slowdown worse. Dropped ticks degrade simulation quality, and players see it as lag, missed shots, and rubber-banding.

Step 1: Instrument the Tick Loop

You cannot fix what you cannot measure. Add timing instrumentation that logs:

// Go example: simple tick instrumentation
type TickMetrics struct {
    Start       time.Time
    Input       time.Duration
    Physics     time.Duration
    AI          time.Duration
    Network     time.Duration
    Persistence time.Duration
    Total       time.Duration
}

func (s *Server) RunTick() {
    m := TickMetrics{Start: time.Now()}
    defer func() {
        m.Total = time.Since(m.Start)
        s.metrics.RecordTick(m, len(s.players))
        if m.Total > s.tickBudget {
            slog.Warn("slow tick",
                "total", m.Total, "physics", m.Physics,
                "ai", m.AI, "net", m.Network,
                "players", len(s.players))
        }
    }()
    t := time.Now(); s.processInput();  m.Input = time.Since(t)
    t = time.Now(); s.stepPhysics();    m.Physics = time.Since(t)
    t = time.Now(); s.updateAI();       m.AI = time.Since(t)
    t = time.Now(); s.sendSnapshots();  m.Network = time.Since(t)
    // Time any other per-tick stage (persistence, etc.) the same way.
}

Send the metrics to your monitoring system (Prometheus, Datadog, or even just log lines that you grep through). Plot tick duration over time and overlay player count. The shape of that graph tells you everything.

Step 2: Reproduce the Load Locally

You cannot debug a server tick rate problem from production logs alone. You need to reproduce the load on a development machine where you can attach a profiler.

Build a load test that connects N fake clients to the server and has them perform realistic actions. The fake clients do not need a renderer, just a network transport and a behavior loop that moves around and fires.

// Conceptual fake client. DialServer and RandomInput are stand-ins for
// your transport and input generator.
func FakeClient(ctx context.Context, serverAddr string) {
    conn, err := DialServer(serverAddr)
    if err != nil {
        slog.Error("dial failed", "addr", serverAddr, "err", err)
        return
    }
    defer conn.Close()
    ticker := time.NewTicker(33 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            conn.Send(RandomInput())
        }
    }
}

Spawn 30 of these against your server and watch the tick metrics climb. The bug becomes reproducible on demand.

Step 3: Find the Scaling Bottleneck

The system whose duration grows fastest with player count is the bottleneck. Common patterns:

O(N²) physics or AI. Every entity checks against every other entity. With 10 players and 50 NPCs, that is 60 entities each checking 60 others = 3,600 checks. With 30 players and 50 NPCs, it is 80 × 80 = 6,400. The check count nearly doubled when the player count tripled, and it keeps accelerating: at 100 players the same loop performs 150 × 150 = 22,500 checks.

Fix: spatial partitioning. Use a grid, quadtree, or BVH so each entity only checks neighbors. The cost drops to roughly O(N) for sparse worlds.
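A uniform grid is the simplest of the three. A minimal sketch, where the `Grid` type and the cell size are illustrative:

```go
package main

import (
	"fmt"
	"math"
)

// Point is a 2D entity position.
type Point struct{ X, Y float64 }

// Grid buckets entities by cell so each entity only checks the 3x3
// neighborhood of its cell instead of every other entity in the world.
type Grid struct {
	cell    float64
	buckets map[[2]int][]int // cell coordinate -> entity indices
}

func NewGrid(cell float64, pts []Point) *Grid {
	g := &Grid{cell: cell, buckets: map[[2]int][]int{}}
	for i, p := range pts {
		k := g.key(p)
		g.buckets[k] = append(g.buckets[k], i)
	}
	return g
}

func (g *Grid) key(p Point) [2]int {
	return [2]int{int(math.Floor(p.X / g.cell)), int(math.Floor(p.Y / g.cell))}
}

// Neighbors returns indices of entities in the 3x3 cells around p: the
// only candidates that can be within one cell size of p.
func (g *Grid) Neighbors(p Point) []int {
	k := g.key(p)
	var out []int
	for dx := -1; dx <= 1; dx++ {
		for dy := -1; dy <= 1; dy++ {
			out = append(out, g.buckets[[2]int{k[0] + dx, k[1] + dy}]...)
		}
	}
	return out
}

func main() {
	pts := []Point{{0, 0}, {1, 1}, {50, 50}}
	g := NewGrid(4, pts)
	fmt.Println(g.Neighbors(pts[0])) // → [0 1]: the distant entity is skipped
}
```

Pick a cell size close to your interaction radius; too small and entities span many cells, too large and each bucket degenerates back toward the O(N²) case.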

Network broadcast that scales with player count squared. Each player’s snapshot includes the state of every other player. Construction time grows as N².

Fix: precompute the snapshot once per tick and reuse it for every recipient, applying only per-recipient delta information at send time.
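A sketch of the build-once, send-many pattern. `PlayerState`, the JSON encoding, and the `send` callback are stand-ins for your actual snapshot format and transport:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PlayerState is a minimal stand-in for whatever a snapshot carries.
type PlayerState struct {
	ID int     `json:"id"`
	X  float64 `json:"x"`
	Y  float64 `json:"y"`
}

// buildSnapshot encodes the shared world state once: O(N) work done a
// single time per tick instead of once per recipient.
func buildSnapshot(players []PlayerState) []byte {
	b, _ := json.Marshal(players) // handle the error in real code
	return b
}

// sendAll reuses the shared snapshot for every recipient; only small
// per-recipient data would differ at send time.
func sendAll(snapshot []byte, ids []int, send func(id int, payload []byte)) {
	for _, id := range ids {
		send(id, snapshot)
	}
}

func main() {
	players := []PlayerState{{ID: 1, X: 2, Y: 3}, {ID: 2, X: 4, Y: 5}}
	snap := buildSnapshot(players)
	sendAll(snap, []int{1, 2}, func(id int, p []byte) {
		fmt.Printf("to player %d: %d bytes\n", id, len(p))
	})
}
```

The shape of the fix matters more than the encoding: serialization cost becomes O(N) per tick total, and only the cheap send loop remains O(N) per recipient.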

Persistence on every tick. Saving player state to a database in the per-tick path. Fast at low load, fatal at high load.

Fix: move persistence onto a separate goroutine/thread and write at a lower frequency (e.g. every 10 ticks, or only on state changes).
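One way to sketch this in Go: a buffered channel feeding a single writer goroutine, so database latency never lands on the tick path. `asyncSaver` and `saveFn` are hypothetical names, and `saveFn` stands in for the real database write:

```go
package main

import (
	"fmt"
	"sync"
)

// asyncSaver drains save requests on its own goroutine.
type asyncSaver struct {
	ch chan int // player IDs to persist; buffered so ticks never block
	wg sync.WaitGroup
}

func newAsyncSaver(saveFn func(playerID int)) *asyncSaver {
	s := &asyncSaver{ch: make(chan int, 1024)}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		for id := range s.ch {
			saveFn(id)
		}
	}()
	return s
}

// Request is called from the tick loop: non-blocking, and it drops the
// request when the queue is full rather than stalling the tick.
func (s *asyncSaver) Request(playerID int) bool {
	select {
	case s.ch <- playerID:
		return true
	default:
		return false
	}
}

// Close flushes the queue and stops the writer.
func (s *asyncSaver) Close() {
	close(s.ch)
	s.wg.Wait()
}

func main() {
	saved := 0
	s := newAsyncSaver(func(id int) { saved++ })
	for id := 1; id <= 5; id++ {
		s.Request(id)
	}
	s.Close()
	fmt.Println("saved:", saved)
}
```

The lower frequency lives in the caller: enqueue a player only every 10th tick, or only when their state actually changed.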

Logging in the hot path. A debug log at INFO level inside the per-tick loop is invisible at low load but burns 5 ms per tick at scale.

Fix: drop the log to DEBUG, or sample (log 1 of every 100 ticks).
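Sampling can be as simple as a counter. `sampledLogger` is an illustrative helper, not a standard API:

```go
package main

import "fmt"

// sampledLogger emits one message out of every n calls, turning a
// per-tick log into a cheap counter increment on the other n-1 ticks.
type sampledLogger struct {
	n, count int
}

// Log reports whether this call actually emitted a line.
func (l *sampledLogger) Log(msg string) bool {
	l.count++
	if l.count%l.n != 0 {
		return false
	}
	fmt.Printf("%s (1 of every %d ticks)\n", msg, l.n)
	return true
}

func main() {
	l := &sampledLogger{n: 100}
	emitted := 0
	for i := 0; i < 1000; i++ {
		if l.Log("tick status") {
			emitted++
		}
	}
	fmt.Println("emitted:", emitted) // → emitted: 10
}
```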

Step 4: Reschedule, Don’t Just Optimize

Optimization helps, but rescheduling is more powerful. Most servers do far too much work in the per-tick loop because it is the easiest place to put it.

Move expensive but non-time-critical work onto a separate cadence: persist state every N ticks on a background goroutine, sample diagnostic logs, and run anything that tolerates a little staleness at a fraction of the tick rate.

The per-tick path should only do work that must happen on every tick. Everything else lives somewhere else.
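One sketch of a cadence scheduler: each system declares how often it runs, and the tick loop skips it otherwise. The `system` struct and the specific cadences are illustrative assumptions:

```go
package main

import "fmt"

// system pairs an update function with the cadence (in ticks) it runs
// at. Cadence 1 is the true per-tick path; everything else runs less often.
type system struct {
	name   string
	every  int
	update func()
}

// runTick runs only the systems whose cadence divides the tick number,
// spreading expensive work across ticks instead of stacking it on each one.
func runTick(tick int, systems []system) {
	for _, s := range systems {
		if tick%s.every == 0 {
			s.update()
		}
	}
}

func main() {
	counts := map[string]int{}
	systems := []system{
		{"physics", 1, func() { counts["physics"]++ }},  // must run every tick
		{"ai", 3, func() { counts["ai"]++ }},            // illustrative: every 3rd tick
		{"persist", 10, func() { counts["persist"]++ }}, // every 10th tick
	}
	for tick := 1; tick <= 30; tick++ {
		runTick(tick, systems)
	}
	fmt.Println(counts["physics"], counts["ai"], counts["persist"]) // → 30 10 3
}
```

A refinement worth considering: offset the low-cadence systems (AI on tick%3==1, persistence on tick%10==5) so they do not all land on the same tick.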

Verifying the Fix

Run the load test before and after. Plot tick duration as a function of player count. The fix is working when the slope of that line is shallower — ideally flat. A perfectly scalable server has the same per-tick cost at 5 players and 100 players.

“Server tick rate is the heart rate of your simulation. If it slows down, every other system goes wrong. Measure it, alarm on it, and treat any drop as a P1 incident.”

Related Issues

For server crashes that look like tick rate problems, see how to debug dedicated server crashes. For lag compensation specifically, see how to debug multiplayer lag compensation bugs. For network packet issues, see how to debug network packet loss in online games.

A tick budget alarm should fire when the average tick exceeds 80% of budget, not 100%. The early warning gives you time to fix it before the server actually falls behind.