Quick answer: A crash-free session rate above 99.5% is considered good for a shipped game. Above 99% is acceptable during early access. Below 99% indicates a serious stability problem that will generate negative reviews. Mobile games should target 99.8% or higher due to stricter platform expectations.

Your game is live and crash reports are flowing in. Your dashboard shows total crash count, crash reports per day, unique crash types, affected platforms, and a dozen other numbers. Most of them are noise. A few are the only numbers that matter. The difference between a team that ships stable games and a team that chases its tail on crashes is knowing which metrics drive decisions and which are just data.

First, the Only Metric That Matters: Crash-Free Session Rate

Crash-free session rate is the percentage of game sessions that complete without a crash. It is the single most important stability metric for a shipped game. Everything else is supplementary.

How to calculate it: Take the number of sessions that did not crash, divide by the total number of sessions, multiply by 100. A session starts when the game launches and ends when the player quits normally or the game crashes.

What is good? For a shipped game on PC or console, target 99.5% or higher. During early access, 99% is acceptable. Below 99% means more than 1 in 100 play sessions crash, which generates negative reviews and refund requests. Mobile games should target 99.8% because mobile platforms have stricter stability expectations and app store ratings punish crashes more aggressively.

Why not total crash count? Total crash count is misleading. If your game goes on sale and player count doubles, total crashes double even if your crash rate is unchanged. Total count goes up when things are going well (more players) and down when things are going badly (players leaving). Crash-free rate normalizes for player count and tells you the actual stability of the game.

// Pseudocode for calculating crash-free session rate
func CrashFreeRate(totalSessions int, crashedSessions int) float64 {
    if totalSessions == 0 {
        return 100.0
    }
    healthySessions := totalSessions - crashedSessions
    return (float64(healthySessions) / float64(totalSessions)) * 100.0
}

// Alert if rate drops below threshold
func CheckStability(rate float64, threshold float64) {
    if rate < threshold {
        SendAlert(fmt.Sprintf(
            "Crash-free rate dropped to %.2f%% (threshold: %.2f%%)",
            rate, threshold))
    }
}

Calculate this metric per day and per version. A daily view shows you trends. A per-version view shows you regressions. Both are essential.
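Both views are the same calculation over different buckets. Here is a sketch in Go, in the style of the other snippets in this article; the Session struct and its field names are assumptions, not any particular analytics schema:

```go
package main

import "fmt"

// Session is a hypothetical session record; field names are assumptions.
type Session struct {
	Day     string // e.g. "2024-06-01"
	Version string // e.g. "1.2.3"
	Crashed bool
}

// RateByKey groups sessions by an arbitrary key (day, version, ...)
// and returns the crash-free rate for each bucket.
func RateByKey(sessions []Session, key func(Session) string) map[string]float64 {
	total := map[string]int{}
	crashed := map[string]int{}
	for _, s := range sessions {
		k := key(s)
		total[k]++
		if s.Crashed {
			crashed[k]++
		}
	}
	rates := map[string]float64{}
	for k, n := range total {
		rates[k] = float64(n-crashed[k]) / float64(n) * 100.0
	}
	return rates
}

func main() {
	sessions := []Session{
		{Day: "2024-06-01", Version: "1.0", Crashed: false},
		{Day: "2024-06-01", Version: "1.0", Crashed: true},
		{Day: "2024-06-02", Version: "1.1", Crashed: false},
	}
	// Same data, two views: per version and per day.
	fmt.Println(RateByKey(sessions, func(s Session) string { return s.Version }))
	fmt.Println(RateByKey(sessions, func(s Session) string { return s.Day }))
}
```

Passing the bucket key as a function keeps one code path for both the trend view and the regression view.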

Crash Clustering by Stack Trace

Raw crash reports are useless in volume. If you have 500 crash reports, you might have 500 instances of the same bug or 500 unique bugs. Without grouping, you cannot tell. Crash clustering takes individual reports and groups them by their stack trace signature to identify unique crash issues.

Stack trace signature is typically the top 3-5 frames of the call stack, normalized to remove memory addresses, thread IDs, and other variable data. Two crashes with the same top frames are almost always the same bug, even if they came from different players on different hardware.
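A minimal normalization pass might look like the sketch below. The exact frame format varies by engine and symbolication pipeline, so the regular expression here is an assumption about what variable data appears in your traces:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hexAddr matches raw memory addresses; adjust for your trace format.
var hexAddr = regexp.MustCompile(`0x[0-9a-fA-F]+`)

// Signature builds a grouping key from the top n frames of a call
// stack, with addresses normalized so identical bugs produce
// identical keys across players and hardware.
func Signature(frames []string, n int) string {
	if n > len(frames) {
		n = len(frames)
	}
	top := make([]string, 0, n)
	for _, frame := range frames[:n] {
		top = append(top, hexAddr.ReplaceAllString(frame, "0xADDR"))
	}
	return strings.Join(top, " | ")
}

func main() {
	stack := []string{
		"Renderer::DrawMesh+0x7f3a2b10",
		"Scene::Render+0x4c19",
		"GameLoop::Tick+0x88",
		"main+0x12",
	}
	fmt.Println(Signature(stack, 3))
}
```

Real pipelines usually also strip thread IDs, inlined-frame markers, and OS library frames below your own code; the address rewrite above is the minimum that makes grouping work at all.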

For each crash group, track:

Occurrence count. How many times has this crash been reported? This tells you how common it is.

Affected users. How many unique players have hit this crash? One player crashing 50 times is different from 50 players crashing once each. The latter is more urgent because it means the crash is broadly reproducible.

First seen / last seen. When did this crash first appear? If it appeared in the latest version, it is a regression. If it has been around for months, it is a longstanding issue that may be hard to fix but is not getting worse.

Version distribution. Which game versions produce this crash? If it only appears in the latest version, it is a regression. If it appears across all versions, it is a systemic issue.

Focus on the top 5 crash groups by affected user count. Fixing the top 5 crashes typically eliminates 60-80% of total crash volume. The long tail of rare crashes is important but should not consume your time while the top crashes remain unfixed.
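One way to represent these per-group fields and pull the top 5 is sketched below; the CrashGroup struct and its field names are illustrative, not a particular tool's schema:

```go
package main

import (
	"fmt"
	"sort"
)

// CrashGroup mirrors the per-group fields described above.
type CrashGroup struct {
	Signature     string
	Occurrences   int
	AffectedUsers int
	FirstSeen     string         // ISO date of first report
	LastSeen      string         // ISO date of latest report
	Versions      map[string]int // version -> occurrence count
}

// TopByAffectedUsers returns the n groups hitting the most unique
// players, the ordering this article recommends prioritizing by.
func TopByAffectedUsers(groups []CrashGroup, n int) []CrashGroup {
	sorted := append([]CrashGroup(nil), groups...) // copy before sorting
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].AffectedUsers > sorted[j].AffectedUsers
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	groups := []CrashGroup{
		{Signature: "A", Occurrences: 50, AffectedUsers: 1}, // one player, 50 crashes
		{Signature: "B", Occurrences: 12, AffectedUsers: 12},
		{Signature: "C", Occurrences: 8, AffectedUsers: 6},
	}
	for _, g := range TopByAffectedUsers(groups, 2) {
		fmt.Println(g.Signature, g.AffectedUsers)
	}
}
```

Note that group A, the noisiest by raw occurrence count, does not make the top 2: sorting by affected users is exactly what keeps one unlucky player's machine from dominating your priorities.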

Mean Time to Resolution (MTTR)

MTTR measures how long it takes from when a crash is first reported to when a fix ships to players. This is a process metric, not a stability metric. It tells you how efficient your team is at responding to crashes.

How to calculate: For each crash group that has been resolved, measure the time between the first report timestamp and the timestamp of the release that contains the fix. Average across all resolved crash groups.

What is good? For critical crashes (top 3 by affected users), target MTTR under 72 hours. For the remaining top 10, target under 1 week. For lower-frequency crashes, MTTR of 2-4 weeks is acceptable. These numbers assume a team that ships regular patches.

MTTR is most useful as a trend. If your MTTR is increasing over time, your team is getting slower at responding to crashes. This usually means the crashes are getting harder (deeper, systemic issues) or the team is overloaded with other work. Either way, it is a warning sign.

Track MTTR separately for regressions versus longstanding crashes. Regressions should have a shorter MTTR because they are caused by recent changes that your team is familiar with. A regression with a high MTTR suggests poor testing or poor deployment practices.

Version-Over-Version Regression Detection

The most actionable crash analytics insight is regression detection: did the latest version make stability worse? This should be automated. Every time you ship a new version, your analytics should compare its crash-free rate to the previous version and alert you if it dropped.

Comparison window. Compare the first 48 hours of the new version to the same time window of the previous version. This normalizes for player behavior patterns (weekday vs weekend, launch spike vs steady state).

Regression threshold. A drop of 0.5 percentage points or more is a regression worth investigating. A drop from 99.5% to 99.0% means your crash rate doubled. That sounds small in percentage terms but it means twice as many sessions are crashing.

New crash signatures. Any crash signature that appears in the top 10 for the new version but was not present in the previous version's top 50 is a regression candidate. It may be a new bug introduced by the update, or it may be a rare existing bug that became more common due to changes in code paths.

// Regression detection logic
func DetectRegression(prevRate float64, currRate float64, threshold float64) bool {
    drop := prevRate - currRate
    if drop > threshold {
        SendAlert(fmt.Sprintf(
            "Regression detected: crash-free rate dropped by %.2f points (%.2f%% -> %.2f%%)",
            drop, prevRate, currRate))
        return true
    }
    return false
}

// Check for new crash signatures in the latest version
func FindNewCrashSignatures(prevTop []string, currTop []string) []string {
    prevSet := make(map[string]bool, len(prevTop))
    for _, sig := range prevTop {
        prevSet[sig] = true
    }
    var newSigs []string
    for _, sig := range currTop {
        if !prevSet[sig] {
            newSigs = append(newSigs, sig)
        }
    }
    return newSigs
}

Automate this comparison so it runs every time a new version accumulates enough sessions for a statistically meaningful comparison (typically 1000+ sessions). Do not wait for manual review — by the time someone checks, the damage is done.

Alert Thresholds and Noise Reduction

The biggest risk with crash analytics is alert fatigue. If every minor fluctuation triggers a notification, your team will start ignoring alerts. Set thresholds that distinguish signal from noise.

Immediate alert (P0): Crash-free rate drops below 95%. This is a severe stability crisis — 1 in 20 sessions is crashing. Drop everything and investigate.

Urgent alert (P1): Crash-free rate drops below 99% or drops by more than 1 percentage point from the previous day. This needs attention within hours but is not an all-hands emergency.

Warning (P2): A new crash signature enters the top 5 or an existing top-5 crash's frequency increases by more than 50% day-over-day. This needs investigation within 24 hours.

Informational: Daily summary of crash-free rate, top crashes, and any version comparison results. Delivered once per day as a digest, not as real-time alerts.

Do not alert on total crash count, individual crash reports, or crashes from very old game versions. These generate noise without actionable signal. Focus alerts on rate changes, regressions, and top crash movements.
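The rate-based tiers above can be encoded directly. This sketch covers the P0 and P1 thresholds; the P2 tier is driven by crash-signature movement rather than the rate, so it is omitted here:

```go
package main

import "fmt"

// Priority encodes the rate-based alert tiers described above.
type Priority int

const (
	Info Priority = iota // daily digest only
	P1                   // attention within hours
	P0                   // drop everything
)

// ClassifyRateAlert applies the thresholds from this section: below
// 95% is P0; below 99%, or a day-over-day drop of more than one
// percentage point, is P1; everything else goes in the daily digest.
func ClassifyRateAlert(rate, prevDayRate float64) Priority {
	switch {
	case rate < 95.0:
		return P0
	case rate < 99.0 || prevDayRate-rate > 1.0:
		return P1
	default:
		return Info
	}
}

func main() {
	fmt.Println(ClassifyRateAlert(94.2, 99.0)) // P0: stability crisis
	fmt.Println(ClassifyRateAlert(98.7, 99.4)) // P1: below 99%
	fmt.Println(ClassifyRateAlert(99.6, 99.7)) // Info: normal fluctuation
}
```

Putting the thresholds in one function means every alert path (dashboard, Discord, pager) agrees on what counts as a P0, which is half the battle against alert fatigue.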

Metrics That Are Noise

Not every number in your crash dashboard is worth tracking. These metrics sound useful but typically do not drive decisions:

Total crash count. Grows with player count. Not actionable without normalization.

Unique crash types. Every game has hundreds of unique crash signatures. The absolute number is meaningless. What matters is the top 5-10 and how they change over time.

Crash rate by platform. Unless you are actively choosing which platforms to support, knowing that Linux has a higher crash rate than Windows does not change your priorities. The top crashes on each platform are what you fix.

Average crash count per user. A few power users will crash repeatedly on the same bug, skewing this metric. Affected user count per crash group is more useful.

Related Issues

For collecting the device info that makes crash grouping effective, see collecting player device info. To get crash alerts delivered to your team's Discord, check Discord webhooks for bug notifications. For prioritizing which crashes to fix during early access, read prioritizing bugs during early access.

Crash-free session rate is the only metric that goes in your executive summary. Everything else is operational detail.