Quick answer: Establish baseline metrics (average frame time, 95th-percentile frame time, peak memory, load times), record them per build in CI, and set threshold alerts (e.g. average frame time increases by more than 5ms). Use git bisect to find the offending commit. Don’t rely on player-reported “lag”—by the time players are complaining, the regression has already shipped.

A performance regression is a special kind of bug. It doesn’t crash, it doesn’t throw an error, and it often doesn’t show up in playtesting because developers run on faster hardware than their players. It just quietly makes the game worse with each update until a player notices their 60fps game now runs at 45fps and leaves a review saying it used to run fine. The only reliable way to catch performance regressions is to measure proactively, per build, before you ship.

Establishing Baseline Metrics

Before you can detect a regression, you need a baseline to regress from. The metrics worth tracking are:

- Average frame time (ms): the headline number for overall performance
- 95th-percentile frame time (ms): catches stutter and hitches that the average hides
- Peak memory usage (MB): regressions here crash low-memory devices before they slow anything down
- Load times (s): measured per scene or level, since asset changes often regress these first

The key constraint is reproducibility. Your benchmark must run the same scenario every time to produce comparable numbers. The standard approach is an automated benchmark scene—a scripted fly-through of a representative area of your game—that runs headlessly and records metrics to a file.

Writing an Automated Benchmark Scene in Unity

Unity’s Performance Testing package (com.unity.test-framework.performance) integrates with the test runner to record metrics and compare against baselines.

using NUnit.Framework;
using Unity.PerformanceTesting;
using UnityEngine.Profiling;
using UnityEngine.SceneManagement;
using UnityEngine.TestTools;
using System.Collections;

public class FrameTimeBenchmarks
{
    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_FrameTime()
    {
        // Load the benchmark scene
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null; // Wait one frame for the scene to settle

        // Discard 60 warmup frames, then record 300 frames of frame time data
        yield return Measure.Frames()
            .SampleGroup(new SampleGroup("FrameTime", SampleUnit.Millisecond))
            .WarmupCount(60)
            .MeasurementCount(300)
            .Run();
    }

    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_MemoryPeak()
    {
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null;

        // Total memory Unity has allocated after the scene load, in MB
        long peakMB = Profiler.GetTotalAllocatedMemoryLong() / (1024L * 1024L);
        Measure.Custom(new SampleGroup("PeakMemory", SampleUnit.Megabyte), peakMB);

        yield return null;
    }
}

Run these tests from CI with Unity’s test runner arguments: -runTests -batchmode -testPlatform StandaloneWindows64. The Performance Testing package writes results to a JSON file you can store as a CI artifact.
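
A typical invocation looks like the sketch below. The editor path, project path, and log path are placeholders for your own pipeline, and -perfTestResults (the flag the Performance Testing package documents for choosing the JSON output path) is worth verifying against your package version:

# Hypothetical CI step -- adjust the editor path and license handling
# for your pipeline.
"$UNITY_EDITOR_PATH" \
  -batchmode \
  -runTests \
  -projectPath . \
  -testPlatform StandaloneWindows64 \
  -perfTestResults "$PWD/benchmark_results.json" \
  -logFile unity_benchmark.log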

Benchmarking in Godot

Godot doesn’t have an equivalent of Unity’s Performance Testing package, but you can build a lightweight benchmark runner in GDScript:

# benchmark_runner.gd - register as an autoload (singleton) in Project Settings
extends Node

const WARMUP_FRAMES := 60
const MEASURE_FRAMES := 300
const OUTPUT_PATH := "user://benchmark_results.json"

var frame_times: Array[float] = []
var frame_count := 0
var is_benchmarking := false

func _ready() -> void:
    # Godot 4 only exposes custom args placed after the "--" separator
    if OS.get_cmdline_user_args().has("--benchmark"):
        start_benchmark()

func start_benchmark() -> void:
    get_tree().change_scene_to_file("res://levels/benchmark_level.tscn")
    # change_scene_to_file() is deferred; wait a frame for the new scene to load
    await get_tree().process_frame
    is_benchmarking = true
    frame_count = 0

func _process(delta: float) -> void:
    if not is_benchmarking:
        return
    frame_count += 1
    if frame_count <= WARMUP_FRAMES:
        return  # Discard warmup frames
    frame_times.append(delta * 1000.0)  # Convert to ms
    if frame_times.size() >= MEASURE_FRAMES:
        save_results()
        get_tree().quit()

func save_results() -> void:
    frame_times.sort()
    var avg := frame_times.reduce(func(a, b): return a + b) / frame_times.size()
    var p95 := frame_times[int(frame_times.size() * 0.95)]
    var results := {
        "build": ProjectSettings.get_setting("application/config/version"),
        "avg_frame_ms": avg,
        "p95_frame_ms": p95,
        "peak_memory_mb": OS.get_static_memory_peak_usage() / (1024.0 * 1024.0)
    }
    var file := FileAccess.open(OUTPUT_PATH, FileAccess.WRITE)
    file.store_string(JSON.stringify(results, "\t"))

Run this headlessly in CI with godot --headless --path /path/to/project -- --benchmark (Godot 4 passes custom arguments through only after the -- separator) and capture the output JSON as a build artifact. Note that user:// resolves to Godot’s per-project user data directory, so the CI step needs to collect benchmark_results.json from there.

Recording Results Per Build and Detecting Regressions

Collecting benchmarks is only useful if you compare them over time. A simple shell script in your CI pipeline can do this comparison:

#!/bin/bash
# compare_benchmarks.sh
# Compares current build results against the stored baseline.

CURRENT="benchmark_results.json"
BASELINE="benchmark_baseline.json"
AVG_THRESHOLD_MS=5   # Alert if average frame time increases more than 5ms
MEM_THRESHOLD_MB=50  # Alert if peak memory increases more than 50MB

current_avg=$(jq '.avg_frame_ms' "$CURRENT")
baseline_avg=$(jq '.avg_frame_ms' "$BASELINE")
delta_avg=$(echo "$current_avg - $baseline_avg" | bc)

current_mem=$(jq '.peak_memory_mb' "$CURRENT")
baseline_mem=$(jq '.peak_memory_mb' "$BASELINE")
delta_mem=$(echo "$current_mem - $baseline_mem" | bc)

echo "Frame time: baseline=${baseline_avg}ms current=${current_avg}ms delta=${delta_avg}ms"
echo "Peak memory: baseline=${baseline_mem}MB current=${current_mem}MB delta=${delta_mem}MB"

REGRESSION=0
if (( $(echo "$delta_avg > $AVG_THRESHOLD_MS" | bc -l) )); then
    echo "REGRESSION: Average frame time increased by ${delta_avg}ms (threshold: ${AVG_THRESHOLD_MS}ms)"
    REGRESSION=1
fi
if (( $(echo "$delta_mem > $MEM_THRESHOLD_MB" | bc -l) )); then
    echo "REGRESSION: Peak memory increased by ${delta_mem}MB (threshold: ${MEM_THRESHOLD_MB}MB)"
    REGRESSION=1
fi

exit $REGRESSION

A non-zero exit code fails the CI build, preventing a performance regression from silently shipping. Treat the thresholds (5ms frame time, 50MB memory) as starting points rather than calibrated constants: run the benchmark several times against the same build, measure the run-to-run variance, and set each threshold comfortably above that noise floor so it catches meaningful regressions without flagging normal variance.
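
One detail the script leaves open is where benchmark_baseline.json comes from. A common pattern, sketched below on the assumption that your CI can persist files between runs ($CI_BRANCH is a stand-in for your provider’s branch variable), is to promote the current results to the baseline whenever a build on the main branch passes:

# Hypothetical baseline-promotion step, run on main only after
# compare_benchmarks.sh exits 0. How you persist the file (commit it,
# push it to an artifact store) depends on your CI provider.
if [ "$CI_BRANCH" = "main" ]; then
    cp benchmark_results.json benchmark_baseline.json
fi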

GPU vs. CPU Performance Regressions

Not all performance regressions look the same. Distinguishing GPU from CPU regressions is essential because the fix is different.

Your profiler (Unity Profiler or Godot’s performance monitors) will show CPU and GPU timelines separately. If the CPU timeline grows but the GPU stays flat, the regression is in game code: scripts, physics, allocations. If the CPU timeline is flat but the GPU timeline grows, the regression is in rendering: shaders, overdraw, fill rate. If both grow, a CPU-side change is likely generating more GPU work, such as extra draw calls.

Using git bisect to Find the Offending Commit

Once you know a regression exists between two builds, git bisect is the fastest way to find the exact commit that introduced it. This works even if there are hundreds of commits between the known-good and known-bad builds.

# Start bisect, marking current HEAD as bad
git bisect start
git bisect bad

# Mark the last known-good build commit as good
git bisect good v1.3.1

# Git checks out the midpoint commit.
# Build and run your benchmark, then:
git bisect good   # If performance is acceptable
git bisect bad    # If performance is regressed

# Repeat 6-10 times until git says:
# "abc1234 is the first bad commit"

# When done:
git bisect reset

If your benchmark is fast enough to run in a CI container (under 5 minutes), you can automate the bisect entirely with git bisect run ./run_benchmark.sh, where the script exits 0 for good performance and 1 for a regression. Git bisect will run the script at each midpoint commit and complete the search without manual intervention.
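
A driver for that might look like the sketch below; build.sh and the jq comparison are stand-ins for your real build step and metrics, and the exit codes follow git bisect run’s convention (0 = good, 1 = bad, 125 = skip a commit that can’t be tested):

#!/bin/bash
# run_benchmark.sh - hypothetical driver for git bisect run
set -u

# Skip (exit 125) commits that fail to build or fail to run the benchmark
./build.sh || exit 125
godot --headless --path . -- --benchmark || exit 125
# If your benchmark writes to user://, copy the JSON into the repo dir first

current=$(jq '.avg_frame_ms' benchmark_results.json)
baseline=$(jq '.avg_frame_ms' benchmark_baseline.json)

# Bad (exit 1) if average frame time grew more than 5ms over the baseline
if (( $(echo "$current - $baseline > 5" | bc -l) )); then
    exit 1
fi
exit 0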

Player-Reported Lag vs. Proactive Measurement

Player-reported “lag” is a lagging indicator in every sense of the word. By the time a player reports it, the regression has already shipped, has already affected every player on that version, and may have already generated refund requests or negative reviews. “It used to run fine” is the most common and least actionable bug report an indie developer receives.

Proactive measurement inverts this. You catch the regression in CI before it ships. You know exactly which build introduced it and approximately which commit caused it. You fix it before any player sees it.

For teams without a dedicated QA function—which is most indie studios—automated benchmarks in CI are the closest thing to a performance tester you can have. The setup investment is a few hours. The payoff is catching performance regressions before your players do, every build, indefinitely.

Bugnet’s game health dashboard complements this by showing player-reported performance data (frame time reported by the SDK running in players’ sessions) alongside your crash trends. When a CI benchmark flags a regression, you can correlate it with actual player hardware data to understand which devices will be most affected before you ship.

“The moment you make ‘did this build make performance worse?’ an automated question with a yes/no answer in CI, performance regressions stop being post-launch fire drills and start being routine pre-ship fixes.”

Start with just one metric: average frame time on your most performance-intensive scene. Record it per build, compare to the previous build. You don’t need a perfect system on day one—you just need a number to compare.