Quick answer: Establish baseline metrics (average frame time, 95th-percentile frame time, peak memory, load times), record them per build in CI, and set threshold alerts (e.g. average frame time increases by more than 5ms). Use git bisect to find the offending commit. Don’t rely on player-reported “lag”—by the time players are complaining, the regression has already shipped.
A performance regression is a special kind of bug. It doesn’t crash, it doesn’t throw an error, and it often doesn’t show up in playtesting because developers run on faster hardware than their players. It just quietly makes the game worse with each update until a player notices their 60fps game now runs at 45fps and leaves a review saying it used to run fine. The only reliable way to catch performance regressions is to measure proactively, per build, before you ship.
Establishing Baseline Metrics
Before you can detect a regression, you need a baseline to regress from. The metrics worth tracking are:
- Average frame time (ms): More useful than FPS because it scales linearly. Going from 16ms to 17ms (a 1ms regression) is the same absolute cost regardless of whether your target is 60fps or 120fps.
- 95th-percentile frame time: Average frame time hides hitches. If 95% of your frames are 16ms but 5% are 80ms, players will notice the stuttering even though the average looks fine. P95 frame time catches these spikes.
- Peak memory usage (MB): Recorded at a fixed point in a reproducible benchmark scene—typically the most memory-intensive level in your game.
- Scene load time (ms): Time from load request to first frame of gameplay. This degrades silently as you add assets.
- Time-to-first-frame (TTFF): Time from game launch to the main menu being interactive. Critical for player retention on first launch.
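The gap between the two frame-time metrics is easy to see in a few lines. A quick sketch in Python (with made-up frame samples; `frame_stats` is an illustrative helper, not part of any engine API) using the nearest-rank percentile:

```python
def frame_stats(frame_times_ms):
    """Return (average, 95th percentile) of a list of frame times in ms."""
    ordered = sorted(frame_times_ms)
    avg = sum(ordered) / len(ordered)
    # Nearest-rank p95: the frame time that 95% of frames stay under.
    p95 = ordered[min(int(len(ordered) * 0.95), len(ordered) - 1)]
    return avg, p95

# 95 smooth frames at 16 ms plus 5 hitches at 80 ms:
samples = [16.0] * 95 + [80.0] * 5
avg, p95 = frame_stats(samples)
print(f"avg={avg:.1f}ms p95={p95:.1f}ms")  # avg=19.2ms p95=80.0ms
```

The average barely moves, but the p95 lands squarely on the 80ms hitches—which is exactly what a player feels as stutter.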
The key constraint is reproducibility. Your benchmark must run the same scenario every time to produce comparable numbers. The standard approach is an automated benchmark scene—a scripted fly-through of a representative area of your game—that runs headlessly and records metrics to a file.
Writing an Automated Benchmark Scene in Unity
Unity’s Performance Testing package (com.unity.test-framework.performance) integrates with the test runner to record metrics and compare against baselines.
using System.Collections;
using NUnit.Framework;
using Unity.PerformanceTesting;
using UnityEngine.Profiling;
using UnityEngine.SceneManagement;
using UnityEngine.TestTools;

public class FrameTimeBenchmarks
{
    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_FrameTime()
    {
        // Load the benchmark scene and wait one frame for it to settle
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null;

        // Discard 60 warmup frames, then record 300 frames of frame time data
        yield return Measure.Frames()
            .WarmupCount(60)
            .MeasurementCount(300)
            .SampleGroup(new SampleGroup("FrameTime", SampleUnit.Millisecond))
            .Run();
    }

    [UnityTest, Performance]
    public IEnumerator BenchmarkMainLevel_MemoryPeak()
    {
        yield return SceneManager.LoadSceneAsync("BenchmarkLevel_Main");
        yield return null;

        long peakMB = Profiler.GetTotalAllocatedMemoryLong() / (1024L * 1024L);
        Measure.Custom(new SampleGroup("PeakMemory", SampleUnit.Megabyte), peakMB);
        yield return null;
    }
}
Run these tests from the Unity command line with -runTests -batchmode -testPlatform StandaloneWindows64 in CI to get headless measurements. The Performance Testing package writes results to a JSON file you can store as a CI artifact.
Benchmarking in Godot
Godot doesn’t have an equivalent of Unity’s Performance Testing package, but you can build a lightweight benchmark runner in GDScript:
# benchmark_runner.gd - attach to an autoload node
extends Node

const WARMUP_FRAMES := 60
const MEASURE_FRAMES := 300
const OUTPUT_PATH := "user://benchmark_results.json"

var frame_times: Array[float] = []
var frame_count := 0
var is_benchmarking := false

func _ready() -> void:
	if OS.get_cmdline_user_args().has("--benchmark"):
		start_benchmark()

func start_benchmark() -> void:
	get_tree().change_scene_to_file("res://levels/benchmark_level.tscn")
	# change_scene_to_file is deferred; wait a frame so the new scene is active
	await get_tree().process_frame
	is_benchmarking = true
	frame_count = 0

func _process(delta: float) -> void:
	if not is_benchmarking:
		return
	frame_count += 1
	if frame_count <= WARMUP_FRAMES:
		return # Discard warmup frames
	frame_times.append(delta * 1000.0) # Convert to ms
	if frame_times.size() >= MEASURE_FRAMES:
		save_results()
		get_tree().quit()

func save_results() -> void:
	frame_times.sort()
	var avg: float = frame_times.reduce(func(a, b): return a + b) / frame_times.size()
	var p95: float = frame_times[int(frame_times.size() * 0.95)]
	var results := {
		"build": ProjectSettings.get_setting("application/config/version"),
		"avg_frame_ms": avg,
		"p95_frame_ms": p95,
		"peak_memory_mb": OS.get_static_memory_peak_usage() / (1024.0 * 1024.0)
	}
	var file := FileAccess.open(OUTPUT_PATH, FileAccess.WRITE)
	file.store_string(JSON.stringify(results, "\t"))
	file.close()
Run this headlessly in CI with godot --headless --path . -- --benchmark (custom arguments after the -- separator are what OS.get_cmdline_user_args() returns) and capture the output JSON as a build artifact.
Recording Results Per Build and Detecting Regressions
Collecting benchmarks is only useful if you compare them over time. A simple shell script in your CI pipeline can do this comparison:
#!/bin/bash
# compare_benchmarks.sh
# Compares current build results against the stored baseline.
CURRENT="benchmark_results.json"
BASELINE="benchmark_baseline.json"
AVG_THRESHOLD_MS=5 # Alert if average frame time increases more than 5ms
MEM_THRESHOLD_MB=50 # Alert if peak memory increases more than 50MB
current_avg=$(jq '.avg_frame_ms' "$CURRENT")
baseline_avg=$(jq '.avg_frame_ms' "$BASELINE")
delta_avg=$(echo "$current_avg - $baseline_avg" | bc)
current_mem=$(jq '.peak_memory_mb' "$CURRENT")
baseline_mem=$(jq '.peak_memory_mb' "$BASELINE")
delta_mem=$(echo "$current_mem - $baseline_mem" | bc)
echo "Frame time: baseline=${baseline_avg}ms current=${current_avg}ms delta=${delta_avg}ms"
echo "Peak memory: baseline=${baseline_mem}MB current=${current_mem}MB delta=${delta_mem}MB"
REGRESSION=0
if (( $(echo "$delta_avg > $AVG_THRESHOLD_MS" | bc -l) )); then
    echo "REGRESSION: Average frame time increased by ${delta_avg}ms (threshold: ${AVG_THRESHOLD_MS}ms)"
    REGRESSION=1
fi
if (( $(echo "$delta_mem > $MEM_THRESHOLD_MB" | bc -l) )); then
    echo "REGRESSION: Peak memory increased by ${delta_mem}MB (threshold: ${MEM_THRESHOLD_MB}MB)"
    REGRESSION=1
fi
exit $REGRESSION
A non-zero exit code fails the CI build, preventing a performance regression from silently shipping. The example thresholds (5ms frame time, 50MB memory) are starting points; the right values depend on how much your benchmark varies between runs of the same build, so tighten or loosen them until they catch meaningful regressions without flagging normal variance.
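One way to calibrate the frame-time threshold is to benchmark the same unchanged build several times and derive the threshold from the observed run-to-run noise. A sketch in Python (`suggest_threshold` is a hypothetical helper; the three-sigma rule is one reasonable choice, not the only one):

```python
import statistics

def suggest_threshold(repeated_avgs_ms, sigmas=3.0):
    """Suggest a regression threshold from repeated runs of one unchanged build.

    A delta larger than `sigmas` standard deviations of normal run-to-run
    noise is unlikely to be variance, so it is worth flagging.
    """
    noise = statistics.stdev(repeated_avgs_ms)
    return sigmas * noise

# Five runs of the same build on the same CI machine:
runs = [16.2, 16.5, 16.1, 16.4, 16.3]
print(f"alert on deltas above {suggest_threshold(runs):.2f} ms")
```

If the suggested threshold comes out larger than the regressions you care about, the benchmark itself is too noisy—pin the CI machine, fix the random seed, or lengthen the measurement window before tightening the gate.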
GPU vs. CPU Performance Regressions
Not all performance regressions look the same. Distinguishing GPU from CPU regressions is essential because the fix is different.
- CPU regression: Frame time increases on the main thread. Look at script execution time, physics, AI, pathfinding, garbage collection. Common causes: a new Update() loop added to many objects, an O(n²) operation that was harmless at small n but now runs on large collections, excessive allocations causing GC pressure.
- GPU regression: GPU frame time increases while CPU time stays flat. Look at draw call count, triangle count, shader complexity, render target resolution, post-processing effects. Common causes: a new particle system with high emitter count, a new post-processing pass, an unoptimized shader on a frequently rendered mesh, shadows enabled on a light that covers the entire scene.
Your profiler (Unity Profiler or Godot’s performance monitors) will show CPU and GPU timelines separately. If the CPU timeline is flat but the GPU timeline grows, the regression is in rendering. If both grow, the CPU change is driving more draw calls or more GPU work.
Using git bisect to Find the Offending Commit
Once you know a regression exists between two builds, git bisect is the fastest way to find the exact commit that introduced it. This works even if there are hundreds of commits between the known-good and known-bad builds.
# Start bisect, marking current HEAD as bad
git bisect start
git bisect bad
# Mark the last known-good build commit as good
git bisect good v1.3.1
# Git checks out the midpoint commit.
# Build and run your benchmark, then:
git bisect good # If performance is acceptable
git bisect bad # If performance is regressed
# Repeat 6-10 times until git says:
# "abc1234 is the first bad commit"
# When done:
git bisect reset
If your benchmark is fast enough to run in a CI container (under 5 minutes), you can automate the bisect entirely with git bisect run ./run_benchmark.sh, where the script exits 0 for good performance and 1 for a regression. Git bisect will run the script at each midpoint commit and complete the search without manual intervention.
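The decision logic of such a bisect script can be sketched in Python (file name, baseline number, and JSON field are illustrative; the JSON layout matches the Godot benchmark output above):

```python
"""check_perf.py - decide whether the checked-out commit regressed performance.

Intended to back a `git bisect run` wrapper: the wrapper runs the headless
benchmark first, then exits with this script's verdict.
"""
import json

BASELINE_AVG_MS = 16.7   # known-good average frame time (illustrative)
THRESHOLD_MS = 5.0       # same tolerance the CI gate uses

def is_regressed(current_avg_ms, baseline_avg_ms=BASELINE_AVG_MS,
                 threshold_ms=THRESHOLD_MS):
    """True if this build is slower than baseline by more than the threshold."""
    return (current_avg_ms - baseline_avg_ms) > threshold_ms

def verdict(results_path="benchmark_results.json"):
    """Return the exit code git bisect expects: 0 = good, 1 = bad."""
    with open(results_path) as f:
        current = json.load(f)["avg_frame_ms"]
    return 1 if is_regressed(current) else 0
```

In the wrapper, run the benchmark and then call sys.exit(verdict()); git bisect run treats exit 0 as good and exit 1 as bad, so the search proceeds with no manual input.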
Player-Reported Lag vs. Proactive Measurement
Player-reported “lag” is a lagging indicator in every sense of the word. By the time a player reports it, the regression has already shipped, has already affected every player on that version, and may have already generated refund requests or negative reviews. “It used to run fine” is the most common and least actionable bug report an indie developer receives.
Proactive measurement inverts this. You catch the regression in CI before it ships. You know exactly which build introduced it and approximately which commit caused it. You fix it before any player sees it.
For teams without a dedicated QA function—which is most indie studios—automated benchmarks in CI are the closest thing to a performance tester you can have. The setup investment is a few hours. The payoff is catching performance regressions before your players do, every build, indefinitely.
Bugnet’s game health dashboard complements this by showing player-reported performance data (frame time reported by the SDK running in players’ sessions) alongside your crash trends. When a CI benchmark flags a regression, you can correlate it with actual player hardware data to understand which devices will be most affected before you ship.
Start with just one metric: average frame time on your most performance-intensive scene. Record it per build, compare to the previous build. You don’t need a perfect system on day one—you just need a number to compare.

“The moment you make ‘did this build make performance worse?’ an automated question with a yes/no answer in CI, performance regressions stop being post-launch fire drills and start being routine pre-ship fixes.”