Quick answer: Test replays by proving that playback reproduces the original run exactly. For input-based replays, that means hunting every source of nondeterminism: floating-point variance, uninitialized random seeds, frame-rate-dependent logic, and physics that drift. Verify that recording captures everything needed, that playback matches the live run frame for frame, and that version changes do not silently corrupt old replays.

A replay system makes a beautiful promise: that you can watch exactly what happened, again. Keeping that promise is one of the harder problems in game development, because most replays store inputs and re-simulate, and re-simulation only matches the original if your game is perfectly deterministic. The smallest divergence (a float that rounds differently, a random seed that was not captured, a system that depends on frame rate) compounds until the replay visibly drifts away from the original run. QA for replays is largely the discipline of hunting nondeterminism. This post covers testing recording fidelity, determinism, and playback so your replays show what actually occurred.

Decide what a replay must reproduce

Before testing, be clear on your replay model, because it determines what you verify. Input-based replays store the player inputs and a starting state and re-run the simulation, which is compact but demands strict determinism. State-based replays store snapshots of game state over time, which is robust against nondeterminism but produces much larger files and is limited to what the snapshots captured, which can constrain features like changing camera angles. Many games blend the two. Your tests must match the model you actually chose.

For an input-based replay, the test of truth is that re-simulating the recorded inputs from the recorded starting state yields a run identical to the original, ideally checkable by hashing game state at intervals. For a state-based replay, the test is that the stored snapshots are complete and interpolate correctly. Knowing which guarantees your system claims tells you exactly which bugs to hunt, and stops you from testing for fidelity your architecture was never designed to provide.
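For the input-based model, here is a minimal sketch of what a replay file might hold, assuming the simulation state can be expressed as a JSON-serializable dictionary. `InputReplay`, `state_hash`, and the field names are hypothetical; a real engine would more likely hash a compact binary serialization of its simulation state. The key idea is that the live run stores state hashes at intervals, so playback can check itself against them.

```python
import hashlib
import json
from dataclasses import dataclass, field


def state_hash(state: dict) -> str:
    # Canonical serialization (sorted keys, fixed separators) so that equal
    # states always produce the same bytes, and therefore the same hash.
    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


@dataclass
class InputReplay:
    """Hypothetical container for an input-based replay."""
    build_version: str
    rng_seed: int                     # seed for the simulation's only RNG
    initial_state: dict               # starting state the re-simulation begins from
    inputs: dict[int, list[str]] = field(default_factory=dict)  # tick -> inputs applied
    checkpoints: dict[int, str] = field(default_factory=dict)   # tick -> live state hash

    def check(self, tick: int, state: dict) -> bool:
        # True if there is no checkpoint for this tick, or the re-simulated
        # state hashes to the same value the live run recorded.
        expected = self.checkpoints.get(tick)
        return expected is None or expected == state_hash(state)
```

During recording you would store a checkpoint every N ticks; during playback, `check` returning `False` means the re-simulation has already diverged.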

Hunt every source of nondeterminism

If your replays re-simulate, determinism is everything, and the bugs hide in well-known places. Test for uninitialized or uncaptured random seeds, since a single unseeded random call makes the replay diverge. Test floating-point determinism across builds and platforms, because the same physics can produce subtly different results with different compiler flags or hardware, which is fatal for cross-platform replays. Test any logic that depends on frame rate or wall-clock time rather than the simulation step.
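Two of these sources, uncaptured seeds and frame-rate dependence, have well-known structural fixes, sketched below under stated assumptions. The simulation owns a single `random.Random` seeded from a value you would store in the replay, and a fixed-timestep accumulator turns variable frame durations into whole simulation ticks. `Simulation`, `LiveLoop`, and the 10 ms tick are hypothetical choices, and the toy state uses integers precisely so the example sidesteps floating-point concerns, which a real engine must still address separately.

```python
import random

TICK_MS = 10  # hypothetical fixed step (100 Hz); integer ms keeps tick counting exact


class Simulation:
    def __init__(self, seed: int):
        # The only RNG gameplay code may use; its seed is stored in the replay.
        # Module-level calls like random.random() would bypass this and desync.
        self.rng = random.Random(seed)
        self.tick = 0
        self.x = 0

    def step(self, move: int) -> None:
        # Advances exactly one fixed tick; never reads frame time or the wall clock.
        self.x += move + self.rng.randint(-1, 1)
        self.tick += 1


class LiveLoop:
    """Converts variable frame durations into whole fixed ticks (accumulator pattern)."""

    def __init__(self, sim: Simulation):
        self.sim = sim
        self.accumulated_ms = 0

    def on_frame(self, frame_ms: int, move: int) -> None:
        # Slow frames run several ticks, fast frames may run none; the
        # simulation itself only ever sees fixed-size steps.
        self.accumulated_ms += frame_ms
        while self.accumulated_ms >= TICK_MS:
            self.sim.step(move)
            self.accumulated_ms -= TICK_MS
```

A useful check follows directly: drive two simulations with the same seed and inputs through very different frame timings, and confirm they reach the same tick count and the same state.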

The reliable way to find divergence is to record a run, immediately play it back while hashing simulation state each tick, and compare the playback hashes against the live ones. The first tick where they differ tells you when divergence began, and hashing major systems separately (physics, AI, gameplay state) narrows it to the system responsible. Test long runs, not short ones, because tiny divergences compound over time and a replay that matches for ten seconds can be wildly wrong after ten minutes. Make this record-replay-compare loop a standing test, because determinism regresses the instant someone adds an unseeded random or a frame-dependent calculation.
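A minimal sketch of that loop, using a toy step function in place of a real simulation (`step`, `simulate`, and `first_divergent_tick` are hypothetical names). The `extra_random_at` parameter injects a stray RNG call at one tick to mimic the kind of regression the loop is meant to catch; because it shifts the random sequence, the reported tick is the first one at or after the injection where state actually differs.

```python
import hashlib
import random
import struct


def step(state, inp, rng):
    # Toy deterministic step: integer position and velocity plus seeded noise.
    x, v = state
    v += inp + rng.randint(-1, 1)
    return (x + v, v)


def hash_state(state):
    # Fixed-width little-endian encoding, so the hash never depends on platform.
    return hashlib.sha256(struct.pack("<qq", *state)).hexdigest()


def simulate(seed, inputs, extra_random_at=None):
    rng = random.Random(seed)
    state = (0, 0)
    hashes = []
    for tick, inp in enumerate(inputs):
        if tick == extra_random_at:
            rng.random()  # simulated bug: an extra RNG call shifts the sequence
        state = step(state, inp, rng)
        hashes.append(hash_state(state))
    return hashes


def first_divergent_tick(live_hashes, replay_hashes):
    # Returns the first tick whose hashes differ, or None if the runs match.
    for tick, (live, replay) in enumerate(zip(live_hashes, replay_hashes)):
        if live != replay:
            return tick
    if len(live_hashes) != len(replay_hashes):
        return min(len(live_hashes), len(replay_hashes))  # one run ended early
    return None
```

Run with a long input sequence (thousands of ticks rather than dozens), `first_divergent_tick(live, replay)` should return `None` on every build; any other value is a determinism regression with a tick to start from.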

Verify recording fidelity and completeness

A replay can only reproduce what it recorded, so test that recording captures the complete set of inputs and initial state your model needs, with nothing dropped. Test under load: dropped frames, input bursts, and pauses must all be recorded faithfully, because a replay that loses inputs during a laggy moment will desync exactly when the action was most interesting. Confirm timing is captured precisely enough that inputs replay at the right simulation tick.
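One way to make recording robust to bursts and slow frames is to stamp every input with the simulation tick that consumes it, never with wall-clock time. The sketch below assumes inputs can arrive at any moment and are drained once per tick; `InputRecorder` and `InputPlayer` are hypothetical names, and inputs are plain strings for illustration.

```python
from collections import defaultdict


class InputRecorder:
    """Hypothetical recorder: stamps inputs with the simulation tick that consumes them."""

    def __init__(self):
        self.pending = []               # inputs received since the last tick
        self.log = defaultdict(list)    # tick -> inputs, in arrival order

    def on_input(self, event):
        # Called from the input layer at any time, possibly many times per frame.
        self.pending.append(event)

    def consume(self, tick):
        # Called once per simulation tick. Everything pending is applied on this
        # tick and recorded against it, so a burst during a laggy frame is kept.
        events, self.pending = self.pending, []
        if events:
            self.log[tick].extend(events)
        return events


class InputPlayer:
    def __init__(self, log):
        self.log = log

    def consume(self, tick):
        # .get does not insert keys into a defaultdict, so playback leaves the log unchanged.
        return list(self.log.get(tick, []))
```

A fidelity test then compares, tick by tick, the inputs the live simulation applied with what `InputPlayer.consume` returns during playback, including under artificially injected frame stalls.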

Test the boundaries of a recording: the very start, the very end, and any mid-session events like a pause, a menu, or a network hiccup. Confirm the recording handles a session that ends abruptly, like a crash, by leaving a usable partial replay rather than a corrupt file. Test file size and performance too, since recording must not degrade the live game. A replay system that hurts frame rate while recording will be turned off, and one that produces corrupt files when the game crashes fails exactly when a replay would be most useful for debugging.
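A common way to get a usable partial replay after a crash is an append-only file made of self-checking records. In this sketch (the format and names are assumptions, not a standard), each record is a length and a CRC32 followed by its payload, and the reader keeps every intact record and stops at the first truncated or torn one instead of rejecting the whole file.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (little-endian)


def encode_record(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def append_record(f, payload: bytes) -> None:
    # Writes one record to a binary file-like object and flushes it to the OS.
    # Add os.fsync(f.fileno()) if records must survive a power loss, not just a crash.
    f.write(encode_record(payload))
    f.flush()


def read_records(data: bytes) -> list[bytes]:
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # truncated or corrupt tail: keep everything before it
        records.append(payload)
        offset = start + length
    return records
```

The crash test is then mechanical: write a recording, chop the file at arbitrary byte offsets, and assert that the reader always returns a prefix of the original records rather than raising.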

Test playback accuracy and controls

Playback is where players judge the system, so test that it visibly matches the original run, not just that the underlying state hashes agree. Watch a recorded run and its replay side by side and confirm positions, animations, effects, and outcomes align. Then test the playback controls that make replays valuable: pause, rewind, fast-forward, slow motion, and seeking to an arbitrary point. Seeking is the hardest, because it usually requires re-simulating from the nearest earlier snapshot up to the target time.
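Snapshot-based seeking can be sketched as follows, assuming a deterministic `step` function and an input list indexed by tick (both toy stand-ins). `SeekableReplay` stores a snapshot every `snapshot_every` ticks during a single pass, and `state_at` restores the nearest earlier snapshot and re-simulates forward. The design trades memory (more snapshots) against seek latency (fewer ticks to re-simulate).

```python
import bisect


def step(state, inp):
    # Toy deterministic simulation step standing in for the real one.
    x, v = state
    v += inp
    return (x + v, v)


class SeekableReplay:
    def __init__(self, initial_state, inputs, snapshot_every=100):
        self.inputs = inputs
        self.snapshot_ticks = []
        self.snapshots = []
        state = initial_state
        # Snapshot at tick t holds the state after t inputs have been applied.
        for tick in range(len(inputs) + 1):
            if tick % snapshot_every == 0:
                self.snapshot_ticks.append(tick)
                self.snapshots.append(state)
            if tick < len(inputs):
                state = step(state, inputs[tick])

    def state_at(self, target_tick):
        if not 0 <= target_tick <= len(self.inputs):
            raise ValueError("tick outside the recording")
        # Nearest snapshot at or before the target, then re-simulate forward.
        i = bisect.bisect_right(self.snapshot_ticks, target_tick) - 1
        tick, state = self.snapshot_ticks[i], self.snapshots[i]
        while tick < target_tick:
            state = step(state, self.inputs[tick])
            tick += 1
        return state
```

The natural test compares `state_at(t)` against a plain linear playback for many ticks, including ticks exactly on, just before, and just after snapshot boundaries.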

Test free camera and alternate views if you offer them, since these expose state that the original run never rendered and can reveal gaps in what was recorded. Confirm that scrubbing backward and forward lands on consistent state rather than drifting, and that fast-forward does not skip simulation steps in a way that changes the outcome. Playback that looks right at normal speed but desyncs when you scrub is a common and frustrating bug, so exercise the controls hard, not just linear playback.
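The fast-forward pitfall is easy to demonstrate with a toy integrator (all names here are hypothetical). Speeding up by running more fixed-size steps per rendered frame preserves the outcome; speeding up by stretching the step size does not, because semi-implicit Euler integration gives different results for one large step than for several small ones. Integer units keep the numbers exact.

```python
FIXED_DT = 1  # simulation step in ticks; integer units keep the toy example exact


def step(pos, vel, dt):
    vel -= dt        # constant gravity of 1 unit per tick squared
    pos += vel * dt  # semi-implicit Euler: the result depends on dt
    return pos, vel


def fast_forward_correct(pos, vel, frames, speed):
    # Run `speed` fixed steps per rendered frame; same steps as normal playback.
    for _ in range(frames * speed):
        pos, vel = step(pos, vel, FIXED_DT)
    return pos, vel


def fast_forward_wrong(pos, vel, frames, speed):
    # Stretch each step instead: fewer, larger steps change the trajectory.
    for _ in range(frames):
        pos, vel = step(pos, vel, FIXED_DT * speed)
    return pos, vel


def normal_speed(pos, vel, frames):
    return fast_forward_correct(pos, vel, frames, 1)
```

Here one frame at 4x speed done correctly lands at `(-10, -4)`, matching four frames at normal speed, while the stretched version lands at `(-16, -4)`. A playback test can assert exactly that equality at matching ticks for every supported speed.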

Setting it up with Bugnet

Replay desyncs are among the hardest bugs to reproduce, because they depend on the exact run, build, and platform, and a player describing a replay that looks wrong cannot tell you which tick diverged. Bugnet's in-game report button captures game state and platform context automatically, so a desync report arrives with the build, the platform, and the session details. That context is often what reveals that the divergence happens only on a particular hardware floating-point path. Crashes during recording are captured with full stack traces and context, so a corrupt-replay crash is debuggable.

Replay bugs cluster by build version and platform, exactly the dimensions that determinism depends on. Bugnet's occurrence grouping folds duplicate reports into one issue with a count, so a spike in desync reports right after a release immediately flags a determinism regression. Add custom fields for the build version and the replay type, then filter the dashboard to see whether desyncs concentrate in cross-platform playback or in old replays opened on a new build. One dashboard, segmented by version, turns scattered "the replay looks wrong" reports into a precise signal about where determinism broke.

Protect replays across versions

The cruelest replay bug is the one that appears later: an old replay that played perfectly on the version it was recorded on diverges after a patch changes the simulation. Decide your policy (either guarantee that old replays keep working, or explicitly version and gate them) and then test it. If you claim backward compatibility, test playing replays from prior builds after every change to simulation, physics, or content, because any of these can silently break determinism for archived runs.

Tag every replay with the build version that produced it, and test that opening a replay from an incompatible version fails gracefully with a clear message rather than playing a quietly wrong run, which is far more damaging than an honest refusal. Make a cross-version replay suite part of release: keep a library of reference replays and confirm they still reproduce correctly, or are correctly rejected. The studios whose replays players actually trust are the ones who treat determinism as a contract and keep testing that the contract holds across every build.
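A minimal sketch of version gating plus a reference-library check, assuming a hypothetical `SIM_VERSION` integer that is bumped whenever the simulation changes replay results. Each reference entry records either the final state hash it must still reproduce or `None`, meaning it must be refused; `simulate` is whatever function re-runs a replay and returns its final hash. All names here are illustrative.

```python
from dataclasses import dataclass

SIM_VERSION = 7  # hypothetical: bump whenever simulation, physics, or content changes results


class IncompatibleReplay(Exception):
    pass


@dataclass
class ReplayHeader:
    build_version: str
    sim_version: int


def open_replay(header: ReplayHeader, supported=frozenset({SIM_VERSION})):
    # Refuse clearly instead of playing a quietly wrong run.
    if header.sim_version not in supported:
        raise IncompatibleReplay(
            f"This replay was recorded on build {header.build_version} "
            f"(simulation v{header.sim_version}) and cannot be played on this version."
        )
    return header


def check_reference_library(library, simulate):
    """library: (name, header, inputs, expected_hash or None). Returns failure messages."""
    failures = []
    for name, header, inputs, expected_hash in library:
        try:
            open_replay(header)
        except IncompatibleReplay:
            if expected_hash is not None:
                failures.append(f"{name}: rejected but should still play")
            continue
        if expected_hash is None:
            failures.append(f"{name}: played but should have been rejected")
        elif simulate(inputs) != expected_hash:
            failures.append(f"{name}: final state hash changed")
    return failures
```

In a release pipeline, a non-empty failure list would block the build, which is one way to make "correctly reproduced or correctly rejected" an enforced rule rather than a hope.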

A replay is a promise to show what happened. Hunt nondeterminism, hash to find divergence, and treat cross-version determinism as a contract.