What is an error budget for game stability?

An error budget is the maximum amount of instability your game is allowed over a rolling time window. If you set a 99.5% crash-free session target over 28 days, your error budget is 0.5% of sessions. When crashes consume that budget, feature development stops and the team focuses entirely on stability.

What crash-free rate should a game aim for?

Most production games target a crash-free session rate between 99.3% and 99.9%. A good starting point is 99.5%, meaning no more than 5 out of every 1,000 sessions end in a crash. Adjust based on platform — mobile games typically need higher rates because app store algorithms penalize instability.

What happens when the error budget is exhausted?

When the error budget hits zero, the team enters a stability sprint. All feature development and non-critical work stops. Engineers focus exclusively on fixing crashes, reducing error rates, and improving monitoring until the crash-free rate recovers above the target threshold.

How to Set Up Error Budgets for Game Stability

Quick answer: Define a crash-free session rate target (e.g., 99.5% over 28 days). Track how many crash sessions you have consumed against that budget daily. When the budget is exhausted, freeze feature work and run a stability sprint until the crash-free rate recovers. Error budgets turn stability from a vague aspiration into a measurable commitment.

Every game studio says stability matters. Few have a system that forces them to act on it. Feature requests always feel urgent. Crash fixes always feel like they can wait until next sprint. The result is a slow accumulation of instability that players notice before the team does. Error budgets, borrowed from Site Reliability Engineering (SRE), the discipline Google popularized for running large services, solve this problem by making stability a finite, measurable resource. When the budget runs out, the conversation changes from “should we fix crashes or ship features?” to “we have no choice but to fix crashes.” That clarity is the entire point.

Defining Your Crash-Free Session Rate

An error budget starts with a target. For games, the most useful metric is crash-free session rate: the percentage of player sessions that complete without a crash. A session begins when the player launches the game and ends when they quit normally. A crash ends the session abnormally. The crash-free rate is the ratio of clean sessions to total sessions.

Most production games target a crash-free rate between 99.3% and 99.9%. A reasonable starting point for an indie studio is 99.5%, which means no more than 5 out of every 1,000 sessions should end in a crash. If you have 10,000 sessions per day, your budget allows 50 crash sessions per day. Over a 28-day window, that is 1,400 crash sessions total.
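To make the arithmetic concrete, here is a minimal Python sketch of the two calculations above. The function names are hypothetical, and it assumes your telemetry can already give you a total session count and a crashed session count for the period.

```python
def crash_free_rate(total_sessions: int, crashed_sessions: int) -> float:
    """Fraction of sessions that ended without a crash."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    if not 0 <= crashed_sessions <= total_sessions:
        raise ValueError("crashed_sessions must be between 0 and total_sessions")
    return (total_sessions - crashed_sessions) / total_sessions


def allowed_crash_sessions(total_sessions: int, target_rate: float) -> int:
    """Largest whole number of crash sessions that still meets the target."""
    # Round before truncating so float noise (0.0050000000000000044)
    # does not shift the result.
    return int(round(total_sessions * (1.0 - target_rate), 6))


daily = allowed_crash_sessions(10_000, 0.995)          # one day of sessions
window = allowed_crash_sessions(10_000 * 28, 0.995)    # the full 28-day window
print(daily, window)
```

With the numbers from this section, the two values should match the 50 per day and 1,400 per window quoted above.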

The target you choose depends on your platform and audience. Mobile games need higher crash-free rates because the App Store and Google Play algorithms penalize apps with high crash rates through reduced visibility. PC games have more tolerance because players expect to troubleshoot hardware and driver issues. Multiplayer games need higher rates than single-player games because a crash in a multiplayer match affects multiple players, not just one.

Calculating and Tracking Burn Rate

Once you have a target, track how fast you are consuming your error budget. The burn rate is the ratio of actual crash sessions to the budget allowance for the same period. A burn rate of 1.0 means you are consuming budget exactly at the sustainable pace, so the budget would run out just as the window ends. A burn rate that stays above 1.0 means you are on track to exhaust the budget before the window ends; at a sustained 2.0, a 28-day budget lasts only 14 days.

# Error budget calculation
var target_crash_free_rate = 0.995   # 99.5%
var window_days = 28
var daily_sessions = 10000

# Total budget for the window
var allowed_crash_rate = 1.0 - target_crash_free_rate  # 0.005
var total_budget = daily_sessions * window_days * allowed_crash_rate
# total_budget = 10000 * 28 * 0.005 = 1400 crash sessions

# Daily budget
var daily_budget = total_budget / window_days  # 50 crash sessions/day

# Burn rate (if today had 75 crashes)
var todays_crashes = 75
var burn_rate = todays_crashes / daily_budget  # 1.5x — burning too fast

Display the remaining budget prominently on your team dashboard. A simple progress bar that goes from green (plenty of budget) to yellow (budget running low) to red (budget exhausted) is more effective than a raw number. The visual creates urgency that a spreadsheet does not.
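The number behind that progress bar could be computed along these lines. This is a sketch, not a definitive implementation: the class name `RollingErrorBudget` is hypothetical, and it assumes a fixed expected daily session count (as in the calculation above) and one crash count recorded per day.

```python
from collections import deque


class RollingErrorBudget:
    """Tracks crash sessions against a fixed budget over a rolling window."""

    def __init__(self, target_crash_free_rate: float,
                 daily_sessions: int, window_days: int = 28):
        allowed_crash_rate = 1.0 - target_crash_free_rate
        self.total_budget = daily_sessions * window_days * allowed_crash_rate
        # deque(maxlen=...) discards the oldest day once the window is full.
        self.daily_crashes = deque(maxlen=window_days)

    def record_day(self, crashes: int) -> None:
        self.daily_crashes.append(crashes)

    def consumed(self) -> int:
        """Crash sessions counted inside the current window."""
        return sum(self.daily_crashes)

    def remaining_fraction(self) -> float:
        """Share of the budget still unspent, clamped to the range 0.0 to 1.0."""
        if self.total_budget <= 0:
            return 0.0
        return max(0.0, 1.0 - self.consumed() / self.total_budget)


budget = RollingErrorBudget(0.995, daily_sessions=10_000)
for _ in range(14):
    budget.record_day(50)
print(f"{budget.remaining_fraction():.0%} of the budget remains")
```

Because the window rolls, a bad day stops counting against you once it is more than 28 days old, which is why the deque drops old entries instead of accumulating forever.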

Set alerts at two thresholds: when the burn rate exceeds 1.5x for more than 24 hours (warning — investigate), and when remaining budget drops below 25% of the window allowance (critical — consider pausing risky deployments). These alerts give the team time to react before the budget is fully consumed.
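A hedged sketch of those two alert rules follows. It assumes you sample the burn rate once per hour and already know the remaining budget as a fraction; the function name and the hourly sampling interval are illustrative choices, not something a particular monitoring tool prescribes.

```python
def evaluate_alerts(hourly_burn_rates, remaining_fraction,
                    burn_threshold=1.5, sustained_hours=24,
                    critical_remaining=0.25):
    """Return the alert levels that currently apply, in order of severity.

    hourly_burn_rates: burn-rate samples, oldest first, one per hour.
    remaining_fraction: unspent share of the window budget (0.0 to 1.0).
    """
    alerts = []

    # Warning: every one of the most recent `sustained_hours` samples
    # is above the burn threshold.
    recent = hourly_burn_rates[-sustained_hours:]
    if len(recent) == sustained_hours and all(r > burn_threshold for r in recent):
        alerts.append("warning")

    # Critical: less than a quarter of the window budget is left.
    if remaining_fraction < critical_remaining:
        alerts.append("critical")

    return alerts


print(evaluate_alerts([1.6] * 24, remaining_fraction=0.20))
```

Requiring the whole 24-hour run to exceed the threshold keeps a single noisy hour from paging anyone.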

What Happens When the Budget Is Exhausted

This is where error budgets earn their value. When the budget hits zero, the team enters a stability sprint. All feature development stops. All non-critical work is paused. Every engineer works exclusively on reducing the crash rate until the crash-free session rate recovers above the target.

The stability sprint is not a punishment. It is a pre-agreed policy that the team establishes when they adopt the error budget system. Document the policy before the first budget is set. Get buy-in from the product owner, the lead engineer, and anyone who controls the development roadmap. When the budget exhausts, there is no negotiation about whether to pause features — the policy activates automatically.

During a stability sprint, the team focuses on three activities: fixing the top crash-causing bugs by frequency, adding defensive code around crash-prone systems, and improving crash reporting so that future crashes are easier to diagnose. The sprint ends when the crash-free rate has been above the target for at least 48 consecutive hours and the team has confidence that the improvement is sustainable, not just a temporary fluctuation.
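The measurable half of that exit rule can be checked automatically. The sketch below assumes hourly `(sessions, crashes)` samples and treats hours with no sessions as neutral; both are assumptions for illustration, and the judgment about whether the improvement is sustainable still belongs to the team.

```python
def sprint_can_end(hourly_samples, target_rate=0.995, required_hours=48):
    """True if every one of the last `required_hours` hourly samples had a
    crash-free rate strictly above the target.

    hourly_samples: list of (sessions, crashes) tuples, oldest first.
    """
    recent = hourly_samples[-required_hours:]
    if len(recent) < required_hours:
        return False  # not enough history yet
    for sessions, crashes in recent:
        if sessions == 0:
            continue  # assumption: an empty hour neither helps nor hurts
        if (sessions - crashes) / sessions <= target_rate:
            return False
    return True


history = [(1_000, 4)] * 48  # 99.6% crash-free in each of the last 48 hours
print(sprint_can_end(history))
```

If your hourly session counts are small, hourly rates will be noisy, and a coarser bucket (for example, six hours) may be a better fit for the same check.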

Connecting Error Budgets to Release Decisions

Error budgets also inform release decisions. If the budget is healthy, the team has earned the right to ship risky features — new systems, experimental mechanics, or major refactors. If the budget is low, only low-risk changes should be deployed. If the budget is exhausted, nothing ships except crash fixes.

This creates a natural feedback loop. A team that ships stable code leaves most of its budget unspent, which allows them to take bigger risks in future sprints. A team that ships unstable code burns through the budget, which forces them to slow down and focus on quality. Over time, the incentive structure pushes the team toward a sustainable pace of feature development that does not sacrifice stability.

// Release gating based on error budget health
{
  "budget_remaining_percent": 62,
  "burn_rate_24h": 0.8,
  "release_gate": "green",
  "allowed_risk_level": "high",
  "policy": "All release types permitted"
}

{
  "budget_remaining_percent": 18,
  "burn_rate_24h": 1.4,
  "release_gate": "yellow",
  "allowed_risk_level": "low",
  "policy": "Bug fixes and minor changes only"
}

{
  "budget_remaining_percent": 0,
  "burn_rate_24h": 2.1,
  "release_gate": "red",
  "allowed_risk_level": "none",
  "policy": "Stability sprint — crash fixes only"
}
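The three states above could be derived from the remaining budget with a small function like this one. The 25% boundary between green and yellow is an assumption borrowed from the critical alert threshold; the examples only establish that 62% is green, 18% is yellow, and 0% is red, so pick the cutoff that suits your team.

```python
def release_gate(budget_remaining_percent: float,
                 yellow_below_percent: float = 25.0) -> dict:
    """Map remaining error budget to a release gate and allowed risk level."""
    if budget_remaining_percent <= 0:
        gate, risk = "red", "none"       # budget exhausted: crash fixes only
    elif budget_remaining_percent < yellow_below_percent:
        gate, risk = "yellow", "low"     # bug fixes and minor changes only
    else:
        gate, risk = "green", "high"     # all release types permitted
    return {
        "budget_remaining_percent": budget_remaining_percent,
        "release_gate": gate,
        "allowed_risk_level": risk,
    }


for percent in (62, 18, 0):
    print(release_gate(percent))
```

Calling this from your deployment pipeline, rather than checking the dashboard by hand, is one way to make the policy apply without a debate each time.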

Running a Stability Sprint Effectively

A stability sprint without structure devolves into whack-a-mole. Prioritize ruthlessly. Sort all open crash reports by frequency — how many sessions are affected — not by severity or individual player complaints. A crash that affects 200 sessions per day is more important than a crash that one player reported as “critical” but only occurs under obscure conditions.

Assign engineers to the top three crash signatures. Fixing the most frequent crash first provides the largest immediate improvement to the crash-free rate. After the top crashes are fixed, re-sort and tackle the next tier. This greedy approach maximizes budget recovery per engineering hour.
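As a rough illustration of that ranking, the sketch below counts distinct affected sessions per crash signature and returns the top three. The `(signature, session_id)` input shape and the signature names are hypothetical; adapt them to whatever your crash reporter exports.

```python
from collections import defaultdict


def top_crash_signatures(crash_reports, limit=3):
    """Rank crash signatures by the number of distinct sessions affected.

    crash_reports: iterable of (signature, session_id) pairs.
    Returns a list of (signature, affected_sessions), most frequent first.
    """
    sessions_by_signature = defaultdict(set)
    for signature, session_id in crash_reports:
        # A set ignores repeat reports from the same session.
        sessions_by_signature[signature].add(session_id)

    ranked = sorted(
        sessions_by_signature.items(),
        key=lambda item: (-len(item[1]), item[0]),  # ties broken by name
    )
    return [(signature, len(sessions)) for signature, sessions in ranked[:limit]]


reports = [
    ("save_corrupt", "s1"), ("save_corrupt", "s2"), ("save_corrupt", "s3"),
    ("null_ref_inventory", "s4"), ("null_ref_inventory", "s4"),
    ("shader_compile", "s5"),
]
for signature, affected in top_crash_signatures(reports):
    print(signature, affected)
```

Counting sessions rather than raw reports matters because the error budget is measured in sessions: a crash loop in one session should not outrank a crash that hits many players once.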

Conduct a brief retrospective at the end of every stability sprint. Ask two questions: why did the budget exhaust, and what process change would have prevented it? Common answers include insufficient automated testing, deploying too many changes in a single patch, and ignoring early warning signs from crash telemetry. Convert the answers into action items and track them. If the same root cause triggers two stability sprints, the process is broken and needs a deeper fix.

“We adopted error budgets after our third consecutive patch shipped with a new crash. The first time the budget exhausted, it felt painful to stop feature work. But two days into the stability sprint, we fixed three crashes that had been in the backlog for months. Our crash-free rate went from 98.9% to 99.7%. Now the team defends the error budget like it is their own money, because they remember what the sprint felt like.”

Related Issues

For guidance on tracking the crash data that feeds into error budget calculations, see automated crash reporting for indie games. To learn about the metrics that complement error budgets, read bug reporting metrics every game studio should track. For best practices on patch rollback when a release burns through the budget, check out best practices for game patch rollback.

Pick a crash-free rate target this week, even if it is a guess. Measure your current rate for seven days. The gap between your target and your reality will tell you exactly how much stability work your game needs.