Quick answer: Define your rollback trigger threshold before each release (crash rate up X% within Y hours), decide rollback vs hotfix vs acknowledge-and-patch using a decision matrix, keep the previous stable build on a Steam beta branch at all times, and run a post-mortem after every rollback using crash data to calibrate your thresholds for next time.
The worst time to decide whether to roll back a patch is while it’s happening. Discord is erupting, the crash rate is climbing, and the team is arguing about whether the spike is real or just a monitoring artifact. Every minute you spend debating the threshold is a minute more players hit the broken build. The decision to roll back — or not — should be made before the patch ships, written down, and rehearsed. Here’s how to build that process.
Defining Your Rollback Trigger
A rollback trigger is a specific, measurable condition that automatically initiates the rollback evaluation process. Not the rollback itself — a human still makes the final call — but the condition that starts the clock on a decision.
Write your trigger thresholds as part of your release checklist, before the patch goes live:
- Crash rate threshold: crash rate (crashes per session, or crashes per hour) increases by more than X% compared to the pre-patch baseline, measured over the first Y hours post-launch. A common starting point is a 50% increase over the first 2 hours.
- Progression blocker threshold: a specific bug is reported in more than Z% of active sessions. Even a low crash rate justifies rollback consideration if the bug prevents players from progressing past a mandatory area.
- Single-crash-signature threshold: one crash signature appears in more than N% of all sessions. A concentrated crash is more likely to have a single fix than a diffuse increase in crash diversity. (A query for this check follows the crash-rate example below.)
-- Example: query to detect a crash rate spike in Bugnet data
-- Compare the crash rate in the first 2 hours after the patch against the prior 48h baseline
WITH post_patch AS (
    SELECT COUNT(*) AS crashes
    FROM crash_events
    WHERE build_version = '1.4.2'
      AND created_at >= '2026-03-01 10:00:00'
      AND created_at < '2026-03-01 12:00:00'
),
baseline AS (
    -- 48 hours of the previous build, ending when the patch went live
    SELECT COUNT(*) / 48.0 AS crashes_per_hour
    FROM crash_events
    WHERE build_version = '1.4.1'
      AND created_at >= '2026-02-27 10:00:00'
      AND created_at < '2026-03-01 10:00:00'
)
SELECT
    post_patch.crashes / 2.0 AS post_patch_hourly,
    baseline.crashes_per_hour AS baseline_hourly,
    (post_patch.crashes / 2.0) / NULLIF(baseline.crashes_per_hour, 0) AS multiplier
FROM post_patch, baseline;
Set up this query as an alert in your crash reporter. Bugnet’s alert rules let you fire a webhook or email when a crash rate threshold is crossed, which can page your team automatically rather than requiring someone to watch a dashboard.
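The single-crash-signature threshold can be checked the same way. The sketch below reuses the example crash_events table but assumes hypothetical signature and session_id columns, plus a hypothetical sessions table that records every play session; adjust the names to whatever your crash reporter and analytics actually expose.

-- Example: share of all post-patch sessions hitting each crash signature
-- Assumes hypothetical crash_events.signature / crash_events.session_id columns
-- and a hypothetical sessions table with a started_at timestamp
WITH post_patch_sessions AS (
    SELECT COUNT(*) AS total
    FROM sessions
    WHERE started_at >= '2026-03-01 10:00:00'
      AND started_at < '2026-03-01 12:00:00'
)
SELECT
    c.signature,
    COUNT(DISTINCT c.session_id) AS crashing_sessions,
    COUNT(DISTINCT c.session_id) * 100.0 / NULLIF(p.total, 0) AS pct_of_all_sessions
FROM crash_events c
CROSS JOIN post_patch_sessions p
WHERE c.build_version = '1.4.2'
  AND c.created_at >= '2026-03-01 10:00:00'
  AND c.created_at < '2026-03-01 12:00:00'
GROUP BY c.signature, p.total
ORDER BY pct_of_all_sessions DESC
LIMIT 5;

If the top row clears your N% threshold, a targeted hotfix for that one signature is usually viable; a long tail of small signatures is the pattern that pushes you toward rollback.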
The Decision Matrix: Rollback vs Hotfix vs Acknowledge
When a trigger fires, you have three options. Which one is right depends on the nature of the bug, not just the crash rate:
- Rollback: revert the entire build to the previous version. Use when: the crash affects a large percentage of sessions, you cannot identify the cause quickly, and the rollback is technically feasible (see below). Rollback is the nuclear option — it also reverts everything good in the patch.
- Hotfix: ship a targeted fix within hours. Use when: you can identify and fix the specific crash quickly, the crash rate is bad but not catastrophic, and shipping a hotfix is faster than the rollback process. A hotfix requires the whole build and test pipeline to run clean, which takes time.
- Acknowledge and patch: communicate the issue publicly, provide a workaround if possible, and ship a fix in the next scheduled patch. Use when: the crash affects a small percentage of sessions, a workaround exists, and neither rollback nor hotfix is feasible quickly. This option accepts short-term player frustration to avoid the risks of a rushed response.
Make the decision as a team, not solo. One person’s judgment under stress is more error-prone than a quick 3-person call using a pre-agreed framework.
Why Rollbacks Are Rarely the Right Answer on Steam
Steam’s update model creates a timing problem for rollbacks: you can set a previous build live on the default branch again, but Steamworks gives you no way to force an instant revert for players who are already running the broken build. A player who has auto-updated to v1.4.2 stays on v1.4.2 until their Steam client picks up the change (typically at the next game launch or update check), or until they manually switch branches, verify file integrity, or reinstall the game.
What you can do on Steam:
- Set the previous build live on the default branch again. New downloads and reinstalls get the old build immediately, and players who already updated are downgraded the next time their client checks for updates or they relaunch the game.
- Create a stable beta branch pointing to the previous build and tell players to opt in. This requires player action, but it gives affected players a way back that doesn’t depend on waiting for an update check.
- Add a Steam announcement explaining the issue and linking to the branch opt-in instructions.
The gap is players who are mid-session on the broken build: they keep crashing until they exit and their client pulls the reverted build. For them, the fastest relief is a hotfix or a workaround communicated via the game’s main menu or a Steam announcement.
Branch Strategy for Keeping the Previous Build Deployable
The prerequisite for any rollback strategy is having the previous build accessible and deployable at all times. In Steamworks, manage this with a branch per major release:
# Steamworks branch structure (builds uploaded via steamcmd, set live per branch)
default → current live build (v1.4.2)
stable  → previous stable build (v1.4.1)
beta    → upcoming build (v1.5.0-beta)
legacy  → older maintained build for compatibility (v1.3.x)
Before every release, update the stable branch to point to the build that is currently live on default. Then set the new build live on default. If a rollback is needed, swapping default back to the previous build is a single operation on the Steamworks builds page that takes under a minute.
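If you upload builds with steamcmd, the branch a new build is staged on can be set in the app build script. The sketch below uses placeholder app and depot IDs; note that SetLive can only target a beta branch such as beta or stable, and promoting a build to default (or swapping default back during a rollback) is done from the builds page on the Steamworks partner site.

// Sketch: app build script with placeholder IDs, run via
//   steamcmd +login <builder_account> +run_app_build ../scripts/app_build_1.4.2.vdf +quit
"AppBuild"
{
    "AppID"        "1234560"                   // placeholder app ID
    "Desc"         "v1.4.2 release candidate"  // label shown on the Steamworks builds page
    "SetLive"      "beta"                      // stage on a beta branch; "default" cannot be set live from here
    "ContentRoot"  "../content/"
    "BuildOutput"  "../output/"
    "Depots"
    {
        "1234561"  "depot_build_1234561.vdf"   // placeholder depot script listing the files to upload
    }
}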
Keep the depot files for at least the last three major releases in Steamworks. Depot storage is not free, but losing the ability to roll back is more expensive.
Server-Side vs Client-Side Rollback
Online games have two independent rollback surfaces. Understanding which one you’re rolling back determines the speed and scope of the operation:
Client-side rollback reverts the game binary that players download and run. This is the Steam depot scenario described above. It’s slow to propagate (players must update or be pointed to a different branch), can’t reach players who’ve already updated, and must account for save compatibility between the new and old client versions.
Server-side rollback reverts your backend — matchmaking servers, game servers, cloud save APIs, leaderboard services. Because you control the infrastructure, a server-side rollback is immediate and affects all players simultaneously regardless of which client version they’re running. If a patch introduced a server-side regression (a broken API response, a matchmaking algorithm that creates unbalanced lobbies, a leaderboard calculation error), rolling back the server often resolves the player-facing issue while you prepare a proper client patch.
For online games, implement separate deployment pipelines for client and server, with independent rollback procedures for each. A server regression should not require a client rollback, and vice versa.
Communicating a Rollback to Players
Players are generally forgiving of bugs if you communicate honestly and quickly. They are not forgiving of silence. When a rollback is in progress:
“We identified a crash introduced in today’s 1.4.2 update affecting players on [platform/condition]. We’ve reverted the default branch to 1.4.1 while we investigate. If you’re on 1.4.2 and experiencing crashes, [workaround or branch opt-in instructions]. We’ll ship a fix as 1.4.3 once we’ve identified the root cause — target is within 48 hours.”
Post this on Steam Announcements, Discord, and Twitter simultaneously. Don’t wait until the rollback is complete — post as soon as you’ve made the decision. The acknowledgment that you’re aware of the issue reduces the volume of new bug reports and Discord pings, which in turn lets your team focus on fixing rather than responding.
The Post-Rollback Investigation and Post-Mortem
Every rollback generates a post-mortem. The primary goal of the post-mortem is not assigning blame but calibrating your future rollback thresholds using real data from the incident. Pull the crash data from Bugnet for the incident window and document the following (a sketch of the timeline query follows the list):
- What did the crash rate look like at T+30min, T+1h, T+2h after the patch?
- At what point did the on-call developer first notice the spike?
- How long from first notice to rollback decision?
- How long from decision to rollback complete?
- What was the total number of affected sessions?
- Was the rollback threshold correctly calibrated, or did you roll back too early/late?
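A minimal sketch of the timeline pull, again reusing the example crash_events table (session_id is the hypothetical column from earlier, and the time functions are PostgreSQL syntax): bucketing crashes into 30-minute intervals across the incident window gives you the T+30min, T+1h, and T+2h numbers directly.

-- Example: crash volume in 30-minute buckets across the incident window
-- Assumes the example crash_events table; session_id is hypothetical; to_timestamp/extract are PostgreSQL
SELECT
    to_timestamp(floor(extract(epoch FROM created_at) / 1800) * 1800) AS bucket_start,
    COUNT(*) AS crashes,
    COUNT(DISTINCT session_id) AS crashing_sessions
FROM crash_events
WHERE build_version = '1.4.2'
  AND created_at >= '2026-03-01 10:00:00'   -- patch went live
  AND created_at < '2026-03-01 16:00:00'    -- through rollback completion
GROUP BY bucket_start
ORDER BY bucket_start;

Lining these buckets up against the timestamps of the first on-call notice and the rollback decision gives you the detection and decision latencies above.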
Write these numbers into the post-mortem document and into your release checklist for the next patch. If your threshold was set at “50% crash rate increase in 2 hours” but the crash rate had already increased 200% before anyone noticed, your alerting needs work. If you rolled back at 20% increase and the investigation revealed the spike was noise from a different source, your threshold is too sensitive.
The post-mortem is also where you document the fix — specifically, what code change caused the regression and why it wasn’t caught by testing. If you can add a regression test that would have caught the bug, do it before closing the post-mortem. The next patch’s release process should include this test as a requirement.
A rollback is not a failure. Failing to roll back when you should have — that’s the failure. Know your thresholds before you ship.