Quick answer: Take the top 3–5 stack frames, normalize them (strip template parameters, addresses, and inline markers), hash the result, and use that hash as the issue ID. Store the original reports as "occurrences" linked to the issue. A report with a matching hash becomes a new occurrence on an existing issue instead of a new bug.
Ship a game with crash reporting enabled and within 24 hours you will have thousands of reports. On a bad launch day, tens of thousands. Most of them are the same five or six bugs hit by hundreds or thousands of players. If you tried to read every report individually, you would never fix anything. Deduplication is what turns a wall of noise into a prioritized list of actual bugs. Here is how to build a deduplication pipeline.
The Core Concept: Fingerprinting
A crash fingerprint is a deterministic hash computed from a crash report's identifying features. Two reports with the same fingerprint are considered the same bug; two reports with different fingerprints are different bugs. The art of deduplication is picking features that produce the same fingerprint for the same underlying cause, and different fingerprints for different causes.
The most common fingerprint is the top of the stack trace. If two players hit NullReferenceException in PlayerController.Update() -> ApplyMovement() -> GetInput(), that is the same bug regardless of what they were doing before. If a third player hits NullReferenceException in EnemySpawner.Spawn() -> GetSpawnPoint() -> GetRandomWaypoint(), that is a different bug, and the fingerprint should reflect that.
Step 1: Define Your Stack Frame Schema
Before you can hash frames, you need a consistent representation. Define a struct (or dict, or class) that captures just the parts of a frame you care about:
type StackFrame struct {
    Function string // "PlayerController.Update"
    File     string // "PlayerController.cs"
    Line     int    // 142 (optional for fingerprint)
    Module   string // "Assembly-CSharp.dll"
}

type CrashReport struct {
    ExceptionType string
    Message       string
    Frames        []StackFrame
    Platform      string
    BuildVersion  string
}
Note that Line is optional. Including line numbers in the fingerprint makes it stable within a single build version, but breaks it across builds — a one-line change above a function invalidates every fingerprint that referenced lines below it. This is the central tradeoff.
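To make the tradeoff concrete, here is a minimal sketch. The `frameKey` helper and its `includeLine` flag are hypothetical names introduced for illustration only; they show how the string a frame contributes to the hash changes once line numbers are included.

```go
package main

import "strconv"

// StackFrame mirrors the schema above, trimmed to the fields used here.
type StackFrame struct {
	Function string
	File     string
	Line     int
}

// frameKey (hypothetical helper) builds the string one frame contributes
// to the fingerprint. With includeLine set, any edit that shifts the line
// number yields a different key and therefore a different fingerprint.
func frameKey(f StackFrame, includeLine bool) string {
	key := f.Function + "@" + f.File
	if includeLine {
		key += ":" + strconv.Itoa(f.Line)
	}
	return key
}
```

With `includeLine` off, the same frame at line 142 in one build and line 143 in the next produces one key. With it on, the two builds produce two keys and the bug splits into two issues.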
Step 2: Normalize Function Names
Raw function names from a stack trace contain noise: template parameters, lambda IDs, compiler-mangled suffixes, inline marker lines. Two runs of the same code can produce slightly different raw function names. Strip that noise before hashing:
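// Requires imports "regexp" and "strings".
// The patterns are illustrative; tune them to your toolchain's output.
var (
    tmplRe   = regexp.MustCompile(`<[^<>]*>`)         // innermost <...> group
    lambdaRe = regexp.MustCompile(`Lambda\$\d+`)      // Lambda$1234
    addrRe   = regexp.MustCompile(`\+0x[0-9a-fA-F]+`) // +0x4a
)

func NormalizeFunction(f string) string {
    // Remove template parameters, innermost first, so nested ones
    // collapse too: Map<string, Array<int>> -> Map
    for tmplRe.MatchString(f) {
        f = tmplRe.ReplaceAllString(f, "")
    }
    // Remove lambda IDs: Lambda$1234 -> Lambda
    f = lambdaRe.ReplaceAllString(f, "Lambda")
    // Remove memory addresses: Foo+0x4a -> Foo
    f = addrRe.ReplaceAllString(f, "")
    // Remove inline marker: Foo [inlined] -> Foo
    f = strings.TrimSuffix(strings.TrimSpace(f), " [inlined]")
    return strings.TrimSpace(f)
}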
Whatever normalization you pick, document it and never change it silently. Every time you tweak the normalization rules, you invalidate old fingerprints and every bug in your tracker gets "reopened" as a new issue. Version your normalization function and treat changes as migrations.
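One way to version the rules, sketched under assumptions: keep every rule set that was ever live in a table keyed by version number. The names `NormalizerVersion`, `normalizers`, and `NormalizeWithVersion` are hypothetical, and the two rule sets are deliberately tiny stand-ins for real ones.

```go
package main

import "strings"

// NormalizerVersion is an assumed convention: bump it whenever a rule changes.
const NormalizerVersion = 2

// normalizers keeps every rule set that was ever live, keyed by version,
// so stored reports can be re-normalized under old or new rules.
var normalizers = map[int]func(string) string{
	// v1: whitespace trimming only.
	1: func(f string) string { return strings.TrimSpace(f) },
	// v2: additionally strips the " [inlined]" marker.
	2: func(f string) string {
		return strings.TrimSuffix(strings.TrimSpace(f), " [inlined]")
	},
}

// NormalizeWithVersion applies the rules of one specific version.
// The second return value is false for an unknown version.
func NormalizeWithVersion(version int, f string) (string, bool) {
	fn, ok := normalizers[version]
	if !ok {
		return "", false
	}
	return fn(f), true
}
```

During a migration you can run stored reports through both the old and the new version and record which old fingerprints map to which new ones, instead of letting every issue reappear as new.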
Step 3: Skip Engine Noise Frames
The top of most stack traces is not the bug — it is the crash handler itself, allocator functions, or engine internals that are common to every crash. Skip them until you hit a "meaningful" frame:
var ignoreFrames = map[string]bool{
    "abort":                          true,
    "raise":                          true,
    "__cxa_throw":                    true,
    "UnityEngine.Debug.LogException": true,
    "UE::Assert::VerifyFailed":       true,
    "FDebug::EnsureFailed":           true,
    "malloc":                         true,
    "free":                           true,
    "operator new":                   true,
}

func SignificantFrames(frames []StackFrame, count int) []StackFrame {
    result := make([]StackFrame, 0, count)
    for _, f := range frames {
        // Compare the normalized name so "abort+0x1c" still matches "abort".
        if ignoreFrames[NormalizeFunction(f.Function)] {
            continue
        }
        result = append(result, f)
        if len(result) == count {
            break
        }
    }
    return result
}
Maintain this ignore list as a config file, not baked into code. You will be adding to it constantly as new engine versions introduce new noise frames.
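A minimal sketch of loading the list from a file, assuming a JSON layout of the form {"ignore_frames": ["abort", "raise"]}. The layout, the `ignoreConfig` type, and both function names are assumptions made for illustration, not a prescribed format.

```go
package main

import (
	"encoding/json"
	"os"
)

// ignoreConfig describes the assumed file layout:
// {"ignore_frames": ["abort", "raise"]}
type ignoreConfig struct {
	IgnoreFrames []string `json:"ignore_frames"`
}

// ParseIgnoreFrames turns the raw config bytes into a lookup set.
func ParseIgnoreFrames(data []byte) (map[string]bool, error) {
	var cfg ignoreConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	set := make(map[string]bool, len(cfg.IgnoreFrames))
	for _, name := range cfg.IgnoreFrames {
		set[name] = true
	}
	return set, nil
}

// LoadIgnoreFrames reads the config file at path and parses it.
func LoadIgnoreFrames(path string) (map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return ParseIgnoreFrames(data)
}
```

Because a change to the ignore list changes which frames get hashed, it is worth treating edits to this file with the same care as changes to the normalization rules.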
Step 4: Compute the Fingerprint
With normalized, filtered frames, the fingerprint is a hash:
// Requires imports "crypto/sha256" and "encoding/hex".
func Fingerprint(report CrashReport) string {
    frames := SignificantFrames(report.Frames, 4)
    h := sha256.New()
    h.Write([]byte(report.ExceptionType))
    h.Write([]byte{'|'})
    for _, f := range frames {
        h.Write([]byte(NormalizeFunction(f.Function)))
        h.Write([]byte{'|'})
        h.Write([]byte(f.Module))
        h.Write([]byte{'|'})
    }
    // Keep the first 16 hex characters (64 bits) of the SHA-256 digest.
    return hex.EncodeToString(h.Sum(nil))[:16]
}
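16 hex characters (64 bits) is more than enough to avoid collisions for any realistic number of bugs: by the birthday bound, even a million distinct issues would collide with a probability of roughly 3 in 100 million. Display the full 16-character fingerprint in the admin UI. The first 8 characters work as a short ID in user-facing URLs, but at 32 bits they are only a convenience: with 10,000 issues there is about a 1% chance that two share a prefix, so resolve short IDs with a prefix lookup and handle the ambiguous case rather than treating them as unique keys.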
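A quick way to sanity-check a fingerprint length is the standard birthday approximation. The sketch below is a hypothetical helper, not part of the pipeline, and it assumes the truncated hash behaves like a uniformly random value.

```go
package main

import "math"

// CollisionProbability estimates the chance that at least two of n distinct
// issues share a fingerprint of the given bit width, using the birthday
// approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
func CollisionProbability(n float64, bits int) float64 {
	space := math.Exp2(float64(bits))
	// Expm1 keeps precision when the probability is very small.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}
```

Calling it with your expected number of distinct issues and a candidate bit width (4 bits per hex character) shows how much headroom a given truncation leaves.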
Step 5: Store Issues and Occurrences Separately
Two tables: one for issues (unique bugs) and one for occurrences (individual crash reports). Every incoming report becomes an occurrence linked to an issue by fingerprint.
CREATE TABLE crash_issues (
    fingerprint      CHAR(16) PRIMARY KEY,
    first_seen       TIMESTAMP NOT NULL,
    last_seen        TIMESTAMP NOT NULL,
    occurrence_count BIGINT NOT NULL DEFAULT 0,
    unique_users     BIGINT NOT NULL DEFAULT 0,
    representative   JSON, -- one example report
    status           ENUM('new', 'investigating', 'fixed') DEFAULT 'new',
    assignee         VARCHAR(255),
    notes            TEXT
);

CREATE TABLE crash_occurrences (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    fingerprint   CHAR(16) NOT NULL,
    build_version VARCHAR(64),
    platform      VARCHAR(32),
    user_id_hash  CHAR(16),
    received_at   TIMESTAMP NOT NULL,
    full_payload  JSON,
    INDEX(fingerprint, received_at),
    FOREIGN KEY(fingerprint) REFERENCES crash_issues(fingerprint)
);
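When a new report arrives: compute the fingerprint, upsert the issue with INSERT ... ON DUPLICATE KEY UPDATE to bump occurrence_count and last_seen, then insert the occurrence. The issue row has to be written first so the foreign key is satisfied. That is two queries per report, both single-row operations on indexed keys. The one counter a blind upsert cannot maintain is unique_users: to bump it correctly you need to know whether this user_id_hash has already been seen for this fingerprint, so either check that on ingest or recompute it periodically with COUNT(DISTINCT user_id_hash).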
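The ingest logic can be sketched without a database. The `Store` below is an in-memory stand-in for the two tables, written only to show the bookkeeping; all names are hypothetical, timestamps are plain Unix seconds, and a real implementation would issue the SQL statements instead.

```go
package main

// Issue holds the aggregate counters kept per fingerprint.
type Issue struct {
	FirstSeen       int64 // Unix seconds
	LastSeen        int64 // Unix seconds
	OccurrenceCount int64
	UniqueUsers     int64
}

// Store is an in-memory stand-in for the issues and occurrences tables.
type Store struct {
	issues map[string]*Issue
	seen   map[string]map[string]bool // fingerprint -> user hashes already counted
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{
		issues: map[string]*Issue{},
		seen:   map[string]map[string]bool{},
	}
}

// Ingest records one occurrence and reports whether it opened a new issue.
func (s *Store) Ingest(fingerprint, userHash string, at int64) bool {
	issue, exists := s.issues[fingerprint]
	if !exists {
		issue = &Issue{FirstSeen: at, LastSeen: at}
		s.issues[fingerprint] = issue
		s.seen[fingerprint] = map[string]bool{}
	}
	issue.OccurrenceCount++
	// Reports can arrive out of order, so widen the window in both directions.
	if at > issue.LastSeen {
		issue.LastSeen = at
	}
	if at < issue.FirstSeen {
		issue.FirstSeen = at
	}
	// Count a user only the first time they hit this particular issue.
	if !s.seen[fingerprint][userHash] {
		s.seen[fingerprint][userHash] = true
		issue.UniqueUsers++
	}
	return !exists
}
```

The `seen` map plays the role of a per-issue distinct-user check; it is the part of the bookkeeping that a counter bump alone cannot provide.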
Step 6: Handle Fingerprint Drift
Over time, fingerprints drift. A recompiled binary can change which functions are inlined, and so which frames appear in the trace. A refactored function has a new name. A new normalization rule splits one fingerprint into two. Your system needs to handle this gracefully.
The standard technique is issue linking. When you detect that an "old" fingerprint and a "new" fingerprint are the same bug (by manual inspection or by heuristics), add an aliases column to the issues table listing equivalent fingerprints:
ALTER TABLE crash_issues ADD COLUMN aliases JSON;
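Queries against the issue then check both the primary fingerprint and the alias list. This lets you keep a single bug in the tracker while tolerating fingerprint changes. A JSON column is fine at small scale; if alias lookups end up on the ingest path, a separate table keyed by the alias fingerprint is easier to index.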
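The lookup side of issue linking can be sketched as a small in-memory index. `AliasIndex`, `Link`, and `Resolve` are hypothetical names, and the map stands in for whatever storage holds the aliases.

```go
package main

// AliasIndex maps a drifted fingerprint to the fingerprint it was linked to.
type AliasIndex struct {
	target map[string]string
}

// NewAliasIndex returns an empty index.
func NewAliasIndex() *AliasIndex {
	return &AliasIndex{target: map[string]string{}}
}

// Resolve follows alias links until it reaches a fingerprint that is not
// itself an alias, and returns that primary fingerprint.
func (a *AliasIndex) Resolve(fp string) string {
	for {
		next, ok := a.target[fp]
		if !ok {
			return fp
		}
		fp = next
	}
}

// Link records that alias is the same bug as primary. It points the alias
// at the resolved primary, and ignores links that would create a cycle.
func (a *AliasIndex) Link(alias, primary string) {
	root := a.Resolve(primary)
	if root == alias {
		return // alias is already the primary of this group
	}
	a.target[alias] = root
}
```

On ingest, calling `Resolve` on the computed fingerprint before the upsert keeps new occurrences attached to the existing issue even after its fingerprint has drifted.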
Step 7: Surface Useful Metadata
Deduplication is the start, not the end. Once you have issues, you need to surface the features that make triage fast:
- Occurrence count: how many times has this crashed?
- Unique users: how many distinct players? (A million reports from one player is very different from ten reports from a million players.)
- First and last seen: regressed recently or ancient?
- Affected builds: which versions are crashing? Fixed in the latest?
- Affected platforms: Windows only, or all platforms?
- Representative report: one full payload you can inspect for logs, screenshots, and device info
Sort your issues view by (unique_users * recency), where recency is a weight that shrinks as the time since last_seen grows, and you have a priority queue for the engineer who will fix them.
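One possible reading of that score, as a sketch: treat recency as an exponential decay on the hours since the issue was last seen. The 72-hour half-life, the `IssueSummary` type, and the function names are assumptions chosen for illustration; pick whatever decay suits your release cadence.

```go
package main

import (
	"math"
	"sort"
)

// IssueSummary carries just the fields the score needs.
type IssueSummary struct {
	Fingerprint        string
	UniqueUsers        int64
	HoursSinceLastSeen float64
}

// halfLifeHours is an assumed tuning value: an issue's weight halves
// for every 72 hours it goes unseen.
const halfLifeHours = 72.0

// Priority returns unique users multiplied by a recency weight in (0, 1].
func Priority(i IssueSummary) float64 {
	recency := math.Exp2(-i.HoursSinceLastSeen / halfLifeHours)
	return float64(i.UniqueUsers) * recency
}

// SortByPriority orders issues from highest to lowest priority, in place.
func SortByPriority(issues []IssueSummary) {
	sort.SliceStable(issues, func(a, b int) bool {
		return Priority(issues[a]) > Priority(issues[b])
	})
}
```

With this weighting, an issue that hit many players a month ago sinks below a smaller one that is still firing today, which matches the intent of sorting by users and recency together.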
"Before deduplication, crash reports felt like trying to drink from a firehose. After deduplication, they felt like a to-do list. It is the single highest-leverage piece of infrastructure a live game can have."
Related Issues
For the stack trace normalization techniques in more detail, see stack trace grouping and crash deduplication. For how to symbolicate native stack traces before fingerprinting, see capture and symbolicate crash dumps.
Fingerprint the top frames, not the whole stack. Keep your ignore list in config. Version everything.