Quick answer: Establish baseline metrics (average frame time, 95th-percentile frame time, peak memory, load times), record them per build in CI, and set threshold alerts (e.g. average frame time increases by more than 5ms). Use git bisect to find the offending commit. Don’t rely on player-reported “lag”—by the time players are complaining, the regression has already shipped.
A performance regression is a special kind of bug. It doesn’t crash, it doesn’t throw an error, and it often doesn’t show up in playtesting because developers run on faster hardware than their players. It just quietly makes the game worse with each update until a player notices their 60fps game now runs at 45fps and leaves a review saying it used to run fine. The only reliable way to catch performance regressions is to measure proactively, per build, before you ship.
Establishing Baseline Metrics
Before you can detect a regression, you need a baseline to regress from. The metrics worth tracking are:
- Average frame time (ms): More useful than FPS because it scales linearly. Going from 16ms to 17ms (a 1ms regression) is the same absolute cost regardless of whether your target is 60fps or 120fps.
- 95th-percentile frame time: Average frame time hides hitches. If 95% of your frames are 16ms but 5% are 80ms, players will notice the stuttering even though the average looks fine. P95 frame time catches these spikes.
- Peak memory usage (MB): Recorded at a fixed point in a reproducible benchmark scene—typically the most memory-intensive level in your game.
- Scene load time (ms): Time from load request to first frame of gameplay. This degrades silently as you add assets.
- Time-to-first-frame (TTFF): Time from game launch to the main menu being interactive. Critical for player retention on first launch.
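The gap between the two frame-time metrics is easy to see in a few lines. A quick sketch in Python (with made-up frame samples; `frame_stats` is an illustrative helper, not part of any engine API) using the nearest-rank percentile:

```python
def frame_stats(frame_times_ms):
    """Return (average, 95th percentile) of a list of frame times in ms."""
    ordered = sorted(frame_times_ms)
    avg = sum(ordered) / len(ordered)
    # Nearest-rank p95: the frame time that 95% of frames stay under.
    p95 = ordered[min(int(len(ordered) * 0.95), len(ordered) - 1)]
    return avg, p95

# 95 smooth frames at 16 ms plus 5 hitches at 80 ms:
samples = [16.0] * 95 + [80.0] * 5
avg, p95 = frame_stats(samples)
print(f"avg={avg:.1f}ms p95={p95:.1f}ms")  # avg=19.2ms p95=80.0ms
```

The average barely moves, but the p95 lands squarely on the 80ms hitches—which is exactly what a player feels as stutter.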
The key constraint is reproducibility. Your benchmark must run the same scenario every time to produce comparable numbers. The standard approach is an automated benchmark scene—a scripted fly-through of a representative area of your game—that runs headlessly and records metrics to a file.
Writing an Automated Benchmark Scene in Unity
Unity’s Performance Testing package (com.unity.test-framework.performance) integrates with the test runner to record metrics and compare against baselines.
using System.Collections;
using NUnit.Framework;
using Unity.PerformanceTesting;
using UnityEngine.Profiling;
using UnityEngine.SceneManagement;
using UnityEngine.TestTools;

public class FrameTimeBenchmarks
{
    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_FrameTime()
    {
        // Load the benchmark scene and wait one frame for it to settle
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null;

        // Discard 60 warmup frames, then record 300 frames of frame time data
        yield return Measure.Frames()
            .WarmupCount(60)
            .MeasurementCount(300)
            .SampleGroup(new SampleGroup("FrameTime", SampleUnit.Millisecond))
            .Run();
    }

    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_MemoryPeak()
    {
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null;

        long peakMB = Profiler.GetTotalAllocatedMemoryLong() / (1024L * 1024L);
        Measure.Custom(new SampleGroup("PeakMemory", SampleUnit.Megabyte), peakMB);
        yield return null;
    }
}
Run these tests from the Unity command line with -runTests -batchmode -testPlatform StandaloneWindows64 in CI to get headless measurements. The Performance Testing package writes results to a JSON file you can store as a CI artifact.
Benchmarking in Godot
Godot doesn’t have an equivalent of Unity’s Performance Testing package, but you can build a lightweight benchmark runner in GDScript:
# benchmark_runner.gd - attach to an autoload node
extends Node

const WARMUP_FRAMES := 60
const MEASURE_FRAMES := 300
const OUTPUT_PATH := "user://benchmark_results.json"

var frame_times: Array[float] = []
var frame_count := 0
var is_benchmarking := false

func _ready() -> void:
	if OS.get_cmdline_user_args().has("--benchmark"):
		start_benchmark()

func start_benchmark() -> void:
	get_tree().change_scene_to_file("res://levels/benchmark_level.tscn")
	# change_scene_to_file is deferred; wait a frame so the new scene is active
	await get_tree().process_frame
	is_benchmarking = true
	frame_count = 0

func _process(delta: float) -> void:
	if not is_benchmarking:
		return
	frame_count += 1
	if frame_count <= WARMUP_FRAMES:
		return # Discard warmup frames
	frame_times.append(delta * 1000.0) # Convert to ms
	if frame_times.size() >= MEASURE_FRAMES:
		save_results()
		get_tree().quit()

func save_results() -> void:
	frame_times.sort()
	var avg: float = frame_times.reduce(func(a, b): return a + b) / frame_times.size()
	var p95: float = frame_times[int(frame_times.size() * 0.95)]
	var results := {
		"build": ProjectSettings.get_setting("application/config/version"),
		"avg_frame_ms": avg,
		"p95_frame_ms": p95,
		"peak_memory_mb": OS.get_static_memory_peak_usage() / (1024.0 * 1024.0)
	}
	var file := FileAccess.open(OUTPUT_PATH, FileAccess.WRITE)
	file.store_string(JSON.stringify(results, "\t"))
	file.close()
Run this headlessly in CI with godot --headless --path . -- --benchmark (custom arguments after the -- separator are what OS.get_cmdline_user_args() returns) and capture the output JSON as a build artifact.
Recording Results Per Build and Detecting Regressions
Collecting benchmarks is only useful if you compare them over time. A simple shell script in your CI pipeline can do this comparison:
#!/bin/bash
# compare_benchmarks.sh
# Compares current build results against the stored baseline.
CURRENT="benchmark_results.json"
BASELINE="benchmark_baseline.json"
AVG_THRESHOLD_MS=5 # Alert if average frame time increases more than 5ms
MEM_THRESHOLD_MB=50 # Alert if peak memory increases more than 50MB
current_avg=$(jq '.avg_frame_ms' "$CURRENT")
baseline_avg=$(jq '.avg_frame_ms' "$BASELINE")
delta_avg=$(echo "$current_avg - $baseline_avg" | bc)
current_mem=$(jq '.peak_memory_mb' "$CURRENT")
baseline_mem=$(jq '.peak_memory_mb' "$BASELINE")
delta_mem=$(echo "$current_mem - $baseline_mem" | bc)
echo "Frame time: baseline=${baseline_avg}ms current=${current_avg}ms delta=${delta_avg}ms"
echo "Peak memory: baseline=${baseline_mem}MB current=${current_mem}MB delta=${delta_mem}MB"
REGRESSION=0
if (( $(echo "$delta_avg > $AVG_THRESHOLD_MS" | bc -l) )); then
    echo "REGRESSION: Average frame time increased by ${delta_avg}ms (threshold: ${AVG_THRESHOLD_MS}ms)"
    REGRESSION=1
fi
if (( $(echo "$delta_mem > $MEM_THRESHOLD_MB" | bc -l) )); then
    echo "REGRESSION: Peak memory increased by ${delta_mem}MB (threshold: ${MEM_THRESHOLD_MB}MB)"
    REGRESSION=1
fi
exit $REGRESSION
A non-zero exit code fails the CI build, preventing a performance regression from silently shipping. The example thresholds (5ms frame time, 50MB memory) are starting points; the right values depend on how much your benchmark varies between runs of the same build, so tighten or loosen them until they catch meaningful regressions without flagging normal variance.
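One way to calibrate the frame-time threshold is to benchmark the same unchanged build several times and derive the threshold from the observed run-to-run noise. A sketch in Python (`suggest_threshold` is a hypothetical helper; the three-sigma rule is one reasonable choice, not the only one):

```python
import statistics

def suggest_threshold(repeated_avgs_ms, sigmas=3.0):
    """Suggest a regression threshold from repeated runs of one unchanged build.

    A delta larger than `sigmas` standard deviations of normal run-to-run
    noise is unlikely to be variance, so it is worth flagging.
    """
    noise = statistics.stdev(repeated_avgs_ms)
    return sigmas * noise

# Five runs of the same build on the same CI machine:
runs = [16.2, 16.5, 16.1, 16.4, 16.3]
print(f"alert on deltas above {suggest_threshold(runs):.2f} ms")
```

If the suggested threshold comes out larger than the regressions you care about, the benchmark itself is too noisy—pin the CI machine, fix the random seed, or lengthen the measurement window before tightening the gate.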
GPU vs. CPU Performance Regressions
Not all performance regressions look the same. Distinguishing GPU from CPU regressions is essential because the fix is different.
- CPU regression: Frame time increases on the main thread. Look at script execution time, physics, AI, pathfinding, garbage collection. Common causes: a new Update() loop added to many objects, an O(n²) operation that was harmless at small n but now runs on large collections, excessive allocations causing GC pressure.
- GPU regression: GPU frame time increases while CPU time stays flat. Look at draw call count, triangle count, shader complexity, render target resolution, post-processing effects. Common causes: a new particle system with high emitter count, a new post-processing pass, an unoptimized shader on a frequently rendered mesh, shadows enabled on a light that covers the entire scene.
Your profiler (Unity Profiler or Godot’s performance monitors) will show CPU and GPU timelines separately. If the CPU timeline is flat but the GPU timeline grows, the regression is in rendering. If both grow, the CPU change is driving more draw calls or more GPU work.
Using git bisect to Find the Offending Commit
Once you know a regression exists between two builds, git bisect is the fastest way to find the exact commit that introduced it. This works even if there are hundreds of commits between the known-good and known-bad builds.
# Start bisect, marking current HEAD as bad
git bisect start
git bisect bad
# Mark the last known-good build commit as good
git bisect good v1.3.1
# Git checks out the midpoint commit.
# Build and run your benchmark, then:
git bisect good # If performance is acceptable
git bisect bad # If performance is regressed
# Repeat 6-10 times until git says:
# "abc1234 is the first bad commit"
# When done:
git bisect reset
If your benchmark is fast enough to run in a CI container (under 5 minutes), you can automate the bisect entirely with git bisect run ./run_benchmark.sh, where the script exits 0 for good performance and 1 for a regression. Git bisect will run the script at each midpoint commit and complete the search without manual intervention.
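The decision logic of such a bisect script can be sketched in Python (file name, baseline number, and JSON field are illustrative; the JSON layout matches the Godot benchmark output above):

```python
"""check_perf.py - decide whether the checked-out commit regressed performance.

Intended to back a `git bisect run` wrapper: the wrapper runs the headless
benchmark first, then exits with this script's verdict.
"""
import json

BASELINE_AVG_MS = 16.7   # known-good average frame time (illustrative)
THRESHOLD_MS = 5.0       # same tolerance the CI gate uses

def is_regressed(current_avg_ms, baseline_avg_ms=BASELINE_AVG_MS,
                 threshold_ms=THRESHOLD_MS):
    """True if this build is slower than baseline by more than the threshold."""
    return (current_avg_ms - baseline_avg_ms) > threshold_ms

def verdict(results_path="benchmark_results.json"):
    """Return the exit code git bisect expects: 0 = good, 1 = bad."""
    with open(results_path) as f:
        current = json.load(f)["avg_frame_ms"]
    return 1 if is_regressed(current) else 0
```

In the wrapper, run the benchmark and then call sys.exit(verdict()); git bisect run treats exit 0 as good and exit 1 as bad, so the search proceeds with no manual input.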
Player-Reported Lag vs. Proactive Measurement
Player-reported “lag” is a lagging indicator in every sense of the word. By the time a player reports it, the regression has already shipped, has already affected every player on that version, and may have already generated refund requests or negative reviews. “It used to run fine” is the most common and least actionable bug report an indie developer receives.
Proactive measurement inverts this. You catch the regression in CI before it ships. You know exactly which build introduced it and approximately which commit caused it. You fix it before any player sees it.
For teams without a dedicated QA function—which is most indie studios—automated benchmarks in CI are the closest thing to a performance tester you can have. The setup investment is a few hours. The payoff is catching performance regressions before your players do, every build, indefinitely.
Bugnet’s game health dashboard complements this by showing player-reported performance data (frame time reported by the SDK running in players’ sessions) alongside your crash trends. When a CI benchmark flags a regression, you can correlate it with actual player hardware data to understand which devices will be most affected before you ship.
Start with just one metric: average frame time on your most performance-intensive scene. Record it per build, compare to the previous build. You don’t need a perfect system on day one—you just need a number to compare.

“The moment you make ‘did this build make performance worse?’ an automated question with a yes/no answer in CI, performance regressions stop being post-launch fire drills and start being routine pre-ship fixes.”