Why does a lockstep game crash long after the real bug?

Because lockstep exchanges only commands and trusts every client to compute identical state. A single nondeterministic line makes the clients diverge silently, with no authoritative state to correct it. The crash comes turns later as a downstream consequence, like an out-of-range index, so without a checksum you only ever see the symptom, not the divergence.

How does a per-turn checksum help?

Every client hashes its simulation state each turn and compares hashes. The first turn the hashes disagree is the exact turn determinism broke, which is far more useful than the later crash. Capturing the checksum history on a report brackets the bug to one turn and turns a vague late crash into a precise question about that turn.

What causes desyncs in lockstep games?

Determinism violations: floating-point differences across platforms, iteration over unordered containers, uninitialized memory, wall-clock time in the step, or an unseeded random stream. Late-arriving commands applied on the wrong turn can also desync everyone. Attaching platform, seed, and the command queue to reports tells these apart and points you at the rule that broke.

Crash Reporting for Lockstep Multiplayer Games

Quick answer: Lockstep multiplayer runs the same deterministic simulation on every client by exchanging only inputs, so a single nondeterministic line desyncs the clients and the resulting crash can land many turns after the real divergence. The fix is to capture the turn number, the command queue, and a per-turn state checksum, so you can pinpoint the exact turn where clients diverged instead of the much later turn where one of them finally crashed.

Lockstep is how strategy games and many peer-to-peer titles keep hundreds of units in sync without shipping their positions over the wire: every client runs the same deterministic simulation and exchanges only player commands, applied on the same turn after a fixed input delay. It is bandwidth-efficient and elegant, and it is merciless about determinism. One uninitialized value, one iteration over a hash map, one stray call to the system clock, and the clients silently diverge, then crash turns later when the divergence finally produces an impossible state. This post covers how to capture lockstep crashes with the turn, command, and checksum context that reveals where the simulations actually parted ways.

Same inputs, same state, until they are not

Lockstep's contract is simple: if every client starts from the same state and applies the same commands on the same turns, they stay identical forever, so you only need to send commands. The input delay buffer, those few turns between issuing a command and executing it, exists to give the network time to deliver every player's commands before the turn runs. When the contract holds, no positions or health values ever cross the wire, which is why lockstep scales to huge unit counts.

The contract breaks the instant any client computes something differently. Because nothing but commands is exchanged, there is no authoritative state to correct the drift, so the clients simply diverge and keep simulating their now-different worlds. The crash, when it comes, is usually a downstream consequence, an index out of range, a null unit, a negative resource count, produced many turns after the actual determinism violation. You are always debugging the symptom unless you captured the divergence.

Checksums find the turn that diverged

The essential tool for lockstep is a per-turn state checksum that every client computes and compares. Each turn, hash the simulation state and exchange the hashes; the first turn where they disagree is the turn the determinism broke. That single number is worth more than any stack trace, because it tells you exactly where to look, before the corruption spread. Capturing it on a crash, along with the history of recent checksums, brackets the bug to one turn.

Make the checksum meaningful by hashing the state that matters, unit positions, health, resources, the random seed, and including enough granularity to localize. Some teams hash per-subsystem so a divergence points at, say, the physics step versus the economy step. When a crash report carries the last-matching turn and the first-diverging turn, you have converted a vague late crash into a precise question: what did this client do differently on that one turn.

Capture the command queue and turn

Alongside the checksum, attach the turn number at the crash and the command queue around it: which player issued which command on which turn, including the input delay offset. Lockstep is fully replayable from a starting state plus the complete command stream, so this is the raw material for reproducing the desync deterministically. A crash that only appears when two players queue overlapping commands on the same turn is invisible in single-player and obvious once you have the command log.

Commands are tiny, an action type and a few parameters, so capturing a window of them costs almost nothing. Record the local player's perspective and, where you can, the divergence between what this client expected and what it received. Input delay itself is occasionally the culprit: a command that arrives late and gets applied on the wrong turn will desync everyone, and seeing the turn offset in the report distinguishes a timing bug from a logic bug.

Hunt the nondeterminism, not the crash

Almost every lockstep crash is downstream of a determinism leak, so your reports should help you find the leak. Attach the platform and architecture, since floating-point results and integer overflow behavior can differ across them, and note the random seed and how you advance it. A divergence that only appears between two specific platforms is a cross-platform math bug; one that appears on identical hardware points at unordered iteration or uninitialized memory in your own code.

The fix is rarely at the crash site and almost always a discipline applied across the simulation: no wall-clock time, no unordered containers in the step, a single seeded random stream, fixed-point or carefully controlled floats. Each new desync signature is a clue about which rule got broken. Treat the crash as a pointer to a determinism bug rather than a bug in itself, and you stop patching symptoms and start closing the actual leaks.

Setting it up with Bugnet

Bugnet captures lockstep crashes with automatic stack traces and platform context, and you extend it with the fields this genre needs. Add custom fields for the turn number, the recent command queue, the per-turn checksum history, and the last-matching and first-diverging turns from your checksum comparison. Wire the desync detector to file a report the moment checksums disagree, before any crash. Both the silent desync and the late crash it causes then arrive in one dashboard with the divergence point already identified.

Occurrence grouping pays off because lockstep determinism bugs recur under specific command patterns, so identical divergences fold into one issue with a count instead of scattering as unreproducible reports. Filter by platform pairing to isolate cross-platform math bugs, or by diverging turn to find a particular subsystem's leak. Player attributes show which client diverged first, and the captured command stream means many desyncs replay deterministically offline, turning a late crash into a one-turn investigation.

Replay and regression-test the determinism

Because lockstep is fully replayable from a state and a command stream, every captured desync can become a deterministic offline replay and then a regression test. Save the worst recurring divergences as fixtures, a starting state plus a command log that must simulate identically across all your target platforms, and run them in continuous integration. A determinism bug with a replay is no longer scary; it is a failing test you fix once and guard forever.

Encourage the team to treat desync reports as first-class, because in lockstep a silent divergence is strictly worse than a loud crash, it can hand one player a win they did not earn before anything throws. The studios that ship reliable lockstep strategy games are the ones that made every turn's determinism observable and every divergence replayable, so a violation that would have crashed a player ten turns later instead shows up as a precise, repeatable report on turn zero.

In lockstep, the crash is the echo and the desync is the bug. Capture the turn checksums and a divergence ten turns deep becomes a one-turn fix.