How do crash reporters deduplicate thousands of reports?

They compute a fingerprint from the top 3-5 stack frames, normalize function names to remove template parameters and addresses, and hash the result. Reports with matching fingerprints are grouped as 'occurrences' under a single issue. The hard part is making fingerprints stable across build versions so a recompile doesn't invalidate your entire crash history.

What stack frames should I use for the fingerprint?

Use the top 3-5 frames after skipping engine noise like malloc, new, Array::Add, and the crash handler itself. Normalize function names to strip template arguments, lambda IDs, inline markers, and memory addresses. The goal: the same bug produces the same fingerprint regardless of when it was captured or which player hit it.

Should I group Unity crashes by exception type or stack trace?

By stack trace, with the exception type as a secondary key. NullReferenceException in PlayerController.Update() is a different bug than NullReferenceException in EnemySpawner.Spawn(), even though the exception class is identical. Grouping by exception type alone creates one giant 'NullReference' bucket that is useless for triage.

How to Build a Crash Report Deduplication System for Your Game

Quick answer: Take the top 3–5 stack frames, normalize them (strip templates, addresses, inlined line numbers), hash the result, and use that hash as the issue ID. Store the original reports as "occurrences" linked to the issue. A report with a matching hash becomes a new occurrence on an existing issue instead of a new bug.

Ship a game with crash reporting enabled and within 24 hours you will have thousands of reports. On a bad launch day, tens of thousands. Most of them are the same five or six bugs hit by hundreds or thousands of players. If you tried to read every report individually, you would never fix anything. Deduplication is what turns a wall of noise into a prioritized list of actual bugs. Here is how to build one.

The Core Concept: Fingerprinting

A crash fingerprint is a deterministic hash computed from a crash report's identifying features. Two reports with the same fingerprint are considered the same bug; two reports with different fingerprints are different bugs. The art of deduplication is picking features that produce the same fingerprint for the same underlying cause, and different fingerprints for different causes.

The most common fingerprint is the top of the stack trace. If two players hit NullReferenceException in PlayerController.Update() -> ApplyMovement() -> GetInput(), that is the same bug regardless of what they were doing before. If a third player hits NullReferenceException in EnemySpawner.Spawn() -> GetSpawnPoint() -> GetRandomWaypoint(), that is a different bug, and the fingerprint should reflect that.

Step 1: Define Your Stack Frame Schema

Before you can hash frames, you need a consistent representation. Define a struct (or dict, or class) that captures just the parts of a frame you care about:

type StackFrame struct {
    Function string  // "PlayerController.Update"
    File     string  // "PlayerController.cs"
    Line     int     // 142 (optional for fingerprint)
    Module   string  // "Assembly-CSharp.dll"
}

type CrashReport struct {
    ExceptionType string
    Message       string
    Frames        []StackFrame
    Platform      string
    BuildVersion  string
}

Note that Line is optional. Including line numbers makes the fingerprint precise within a single build but fragile across builds: a one-line change above a function shifts every line below it and invalidates every fingerprint that referenced those lines. This is the central tradeoff between precision and stability.
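One way to keep both options open is to make the line number an explicit switch when building the per-frame key. The sketch below assumes a hypothetical FrameKey helper and an includeLine flag, neither of which is part of the schema above. It only illustrates why a shifted line number changes the key while the function-level key stays put.

```go
package main

import "strconv"

// StackFrame mirrors the struct defined in Step 1.
type StackFrame struct {
    Function string
    File     string
    Line     int
    Module   string
}

// FrameKey returns the string that would be fed to the hash for one frame.
// includeLine is a hypothetical switch: true gives per-build precision,
// false gives stability across builds.
func FrameKey(f StackFrame, includeLine bool) string {
    key := f.Module + "!" + f.Function
    if includeLine && f.Line > 0 {
        key += ":" + strconv.Itoa(f.Line)
    }
    return key
}
```

With includeLine set to false, the same frame captured at line 142 in one build and line 143 in the next yields an identical key. With it set to true, the two keys differ.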

Step 2: Normalize Function Names

Raw function names from a stack trace contain noise: template parameters, lambda IDs, compiler-mangled suffixes, inline marker lines. Two runs of the same code can produce slightly different raw function names. Strip that noise before hashing:

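import (
    "regexp"
    "strings"
)

// Example patterns; tune them to your symbolizer's output.
var tmplRe = regexp.MustCompile(`<[^<>]*>`)          // innermost <...> group
var lambdaRe = regexp.MustCompile(`Lambda\$\d+`)     // Lambda$1234
var addrRe = regexp.MustCompile(`\+0x[0-9a-fA-F]+`)  // +0x4a

func NormalizeFunction(f string) string {
    // Remove template parameters, innermost first: Map<K, Array<int>> -> Map
    for tmplRe.MatchString(f) {
        f = tmplRe.ReplaceAllString(f, "")
    }
    // Remove lambda IDs: Lambda$1234 -> Lambda
    f = lambdaRe.ReplaceAllString(f, "Lambda")
    // Remove memory addresses: Foo+0x4a -> Foo
    f = addrRe.ReplaceAllString(f, "")
    // Remove inline marker: Foo [inlined] -> Foo
    return strings.TrimSpace(strings.TrimSuffix(strings.TrimSpace(f), " [inlined]"))
}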

Whatever normalization you pick, document it and never change it silently. Every time you tweak the normalization rules, you invalidate old fingerprints and every bug in your tracker gets "reopened" as a new issue. Version your normalization function and treat changes as migrations.

Step 3: Skip Engine Noise Frames

The top of most stack traces is not the bug — it is the crash handler itself, allocator functions, or engine internals that are common to every crash. Skip them until you hit a "meaningful" frame:

var ignoreFrames = map[string]bool{
    "abort":                          true,
    "raise":                          true,
    "__cxa_throw":                    true,
    "UnityEngine.Debug.LogException": true,
    "UE::Assert::VerifyFailed":       true,
    "FDebug::EnsureFailed":           true,
    "malloc":                         true,
    "free":                           true,
    "new":                            true,
}

func SignificantFrames(frames []StackFrame, count int) []StackFrame {
    result := make([]StackFrame, 0, count)
    for _, f := range frames {
        // Compare the normalized name so "malloc+0x1a" still matches "malloc".
        if ignoreFrames[NormalizeFunction(f.Function)] {
            continue
        }
        result = append(result, f)
        if len(result) == count {
            break
        }
    }
    return result
}

Maintain this ignore list as a config file, not baked into code. You will be adding to it constantly as new engine versions introduce new noise frames.
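As a rough illustration, the list could live in a small JSON file that is parsed at startup. The file layout (an "ignore_frames" array) and the ParseIgnoreFrames helper are assumptions made for this sketch, not an established format.

```go
package main

import "encoding/json"

// ignoreConfig is a hypothetical file layout:
//   {"ignore_frames": ["abort", "raise", "malloc"]}
type ignoreConfig struct {
    IgnoreFrames []string `json:"ignore_frames"`
}

// ParseIgnoreFrames turns the raw config bytes into a lookup set with the
// same shape as the ignoreFrames map used by SignificantFrames.
func ParseIgnoreFrames(data []byte) (map[string]bool, error) {
    var cfg ignoreConfig
    if err := json.Unmarshal(data, &cfg); err != nil {
        return nil, err
    }
    set := make(map[string]bool, len(cfg.IgnoreFrames))
    for _, name := range cfg.IgnoreFrames {
        set[name] = true
    }
    return set, nil
}
```

Read the file with os.ReadFile and pass the bytes in. Because a changed ignore list changes fingerprints just like a changed normalization rule, it makes sense to version the file alongside the normalizer.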

Step 4: Compute the Fingerprint

With normalized, filtered frames, the fingerprint is a hash:

import (
    "crypto/sha256"
    "encoding/hex"
)

func Fingerprint(report CrashReport) string {
    // Hash the exception type plus the top four significant frames.
    frames := SignificantFrames(report.Frames, 4)

    h := sha256.New()
    h.Write([]byte(report.ExceptionType))
    h.Write([]byte{'|'})

    for _, f := range frames {
        h.Write([]byte(NormalizeFunction(f.Function)))
        h.Write([]byte{'|'})
        h.Write([]byte(f.Module))
        h.Write([]byte{'|'})
    }

    // Keep the first 16 hex characters (64 bits) as the issue ID.
    return hex.EncodeToString(h.Sum(nil))[:16]
}

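16 hex characters is 64 bits. By the birthday bound, the chance of any collision among one million distinct issues is roughly 3 in 100 million, which is more than enough for any realistic number of bugs. Display all 16 characters in the admin UI. If you use the first 8 characters (32 bits) as a short ID in user-facing URLs, check it for uniqueness before handing it out: at that width the chance of a collision is already about 1% at 10,000 issues.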
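To check whether a given hash width is enough for your issue count, the standard birthday approximation gives a quick estimate. CollisionProbability is a hypothetical helper name; the formula is 1 - exp(-n(n-1) / 2^(bits+1)).

```go
package main

import "math"

// CollisionProbability approximates the chance that at least two of n
// distinct issues share a fingerprint of the given bit width, using the
// birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)).
func CollisionProbability(n float64, bits int) float64 {
    exponent := -n * (n - 1) / math.Pow(2, float64(bits+1))
    // -Expm1(x) equals 1 - exp(x) and stays accurate for tiny probabilities.
    return -math.Expm1(exponent)
}
```

CollisionProbability(1e6, 64) comes out near 2.7e-8, while CollisionProbability(1e4, 32) is a little over 0.011, which is why a 32-bit short ID needs a uniqueness check and the 64-bit fingerprint does not.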

Step 5: Store Issues and Occurrences Separately

Two tables: one for issues (unique bugs) and one for occurrences (individual crash reports). Every incoming report becomes an occurrence linked to an issue by fingerprint.

CREATE TABLE crash_issues (
    fingerprint     CHAR(16) PRIMARY KEY,
    first_seen      TIMESTAMP NOT NULL,
    last_seen       TIMESTAMP NOT NULL,
    occurrence_count BIGINT NOT NULL DEFAULT 0,
    unique_users    BIGINT NOT NULL DEFAULT 0,
    representative  JSON,  -- one example report
    status          ENUM('new', 'investigating', 'fixed') DEFAULT 'new',
    assignee        VARCHAR(255),
    notes           TEXT
);

CREATE TABLE crash_occurrences (
    id              BIGINT PRIMARY KEY AUTO_INCREMENT,
    fingerprint     CHAR(16) NOT NULL,
    build_version   VARCHAR(64),
    platform        VARCHAR(32),
    user_id_hash    CHAR(16),
    received_at     TIMESTAMP NOT NULL,
    full_payload    JSON,
    INDEX(fingerprint, received_at),
    FOREIGN KEY(fingerprint) REFERENCES crash_issues(fingerprint)
);

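When a new report arrives: compute the fingerprint, upsert the issue with INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) to bump occurrence_count and last_seen, then insert the occurrence. The issue must be written first because the occurrence row has a foreign key on it. That is two indexed single-row writes per report, so the cost per report stays close to constant as the tables grow. Note that unique_users cannot be maintained with a blind increment; derive it from COUNT(DISTINCT user_id_hash) over the occurrences, for example in a periodic job.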

Step 6: Handle Fingerprint Drift

Over time, fingerprints drift. A recompiled binary changes addresses. A refactored function has a new name. A new normalization rule splits one fingerprint into two. Your system needs to handle this gracefully.

The standard technique is issue linking. When you detect that an "old" fingerprint and a "new" fingerprint are the same bug (by manual inspection or by heuristics), add an aliases column to the issues table listing equivalent fingerprints:

ALTER TABLE crash_issues ADD COLUMN aliases JSON;

Queries against the issue then check both the primary fingerprint and the alias list. This lets you keep a single bug in the tracker while tolerating fingerprint changes.
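On the ingest side, alias resolution can happen before the database is touched. The sketch below assumes the aliases column has been loaded into memory as a map from primary fingerprint to its alias list; BuildAliasLookup and ResolveFingerprint are hypothetical helper names.

```go
package main

// BuildAliasLookup inverts a primary -> aliases mapping (as read from the
// aliases column) into alias -> primary so each lookup is a single map access.
func BuildAliasLookup(aliases map[string][]string) map[string]string {
    lookup := make(map[string]string)
    for primary, olds := range aliases {
        for _, old := range olds {
            lookup[old] = primary
        }
    }
    return lookup
}

// ResolveFingerprint returns the primary fingerprint for fp, or fp itself
// when no alias is registered for it.
func ResolveFingerprint(lookup map[string]string, fp string) string {
    if primary, ok := lookup[fp]; ok {
        return primary
    }
    return fp
}
```

Resolving the fingerprint before the upsert means a report carrying an old fingerprint lands on the existing issue instead of creating a new one.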

Step 7: Surface Useful Metadata

Deduplication is the start, not the end. Once you have issues, you need to surface the features that make triage fast:

Occurrence count: how many times has this crashed?
Unique users: how many distinct players? (A million reports from one player is very different from ten reports from a million players.)
First and last seen: regressed recently or ancient?
Affected builds: which versions are crashing? Fixed in the latest?
Affected platforms: Windows only, or all platforms?
Representative report: one full payload you can inspect for logs, screenshots, and device info

Sort your issues view by (unique_users * recency), where recency is a weight that shrinks as last_seen gets older, and you have a priority queue for the engineer who will fix them.

"Before deduplication, crash reports felt like trying to drink from a firehose. After deduplication, they felt like a to-do list. It is the single highest-leverage piece of infrastructure a live game can have."

Related Issues

For more detail on the stack trace normalization techniques, see stack trace grouping and crash deduplication. For how to symbolicate native stack traces before fingerprinting, see capture and symbolicate crash dumps.

Fingerprint the top frames, not the whole stack. Keep your ignore list in config. Version everything.