What is a triage decision log?

A brief written record attached to every contentious triage decision: what was decided (priority, assignment, close reason), who decided, why, and when. Routine calls (obvious P1 crashes) don't need entries. Judgment calls (edge cases, closes without fix, deprioritizations) do.

Who writes the triage log?

Whoever makes the final call. The log is a single comment on the bug, not a separate document. It lives with the bug forever in the tracker. New triagers read past entries when they're unsure how to handle a similar case.

When should I NOT log a triage decision?

Routine triage (clear crash is P1, minor typo is P3) doesn't need justification. Log only non-obvious decisions: deprioritizing something that looked critical, closing without fix, reopening an old bug, changing assignment from the obvious owner. The threshold is: would a new triager be confused about why we did this?

How to Write a Bug Triage Decision Log

Quick answer: Log only the non-obvious triage decisions, directly on the bug: decision, decider, date, reasoning. Routine calls don’t need justification. The log answers “why did we deprioritize this?” three months later and trains new triagers through real examples.

Three months ago your team closed a bug as “won’t fix.” Today a player reports the same issue and your team has to decide whether the old decision still stands. Nobody remembers why it was made. Without a decision log, every contentious triage call gets re-litigated indefinitely.

What to Log

Log only decisions that required judgment. Skip the obvious ones.

Log:

Deprioritizing a bug that looked critical at first read.
Closing without fix (won’t fix, works as intended, duplicate).
Reopening an old bug (what’s different now).
Reassigning away from the obvious owner.
Overriding the priority rubric.

Don’t log: routine priority assignments that match the rubric. A new P1 crash is obviously P1; the rubric justifies it.
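As a rough illustration, the list above can be read as a simple predicate: if any one of the judgment-call conditions holds, write an entry. The sketch below is a minimal version of that check. The `TriageCall` class and its field names are hypothetical and do not correspond to any real tracker's API.

```python
from dataclasses import dataclass


@dataclass
class TriageCall:
    """Hypothetical flags a triager sets for one decision (names are illustrative)."""
    deprioritized_from_critical: bool = False
    closed_without_fix: bool = False
    reopened: bool = False
    reassigned_from_obvious_owner: bool = False
    overrides_rubric: bool = False


def needs_log_entry(call: TriageCall) -> bool:
    """Return True when the call required judgment and so deserves a log entry."""
    return any((
        call.deprioritized_from_critical,
        call.closed_without_fix,
        call.reopened,
        call.reassigned_from_obvious_owner,
        call.overrides_rubric,
    ))


routine_p1_crash = TriageCall()                      # matches the rubric: no entry
wont_fix_close = TriageCall(closed_without_fix=True)  # judgment call: log it
```

A routine call leaves every flag at its default and returns False; a single raised flag is enough to require an entry.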

The Format

One comment on the bug, not a separate document. Four fields:

Decision: Deprioritize from P1 to P3
Decider: Alex (team lead)
Date: 2026-04-10
Rationale: Only reproduces with a specific NVIDIA driver version
that ships tomorrow. Affected player count < 50 based on telemetry.
Not blocking launch; revisit if affected count grows.

Rationale in a few sentences is enough. The goal is that a teammate reading the bug in 6 months understands the decision without having to dig through Slack.
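If your team wants every entry to look the same, a small helper can render the comment text before it is pasted into the tracker. This is only a sketch under assumptions: `DecisionLogEntry` and `to_comment` are hypothetical names, and posting the comment to your tracker is left out because that depends on the tool you use.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionLogEntry:
    """One triage decision, mirroring the four fields of the format above."""
    decision: str
    decider: str
    decided_on: date
    rationale: str

    def to_comment(self) -> str:
        """Render the entry as plain text, one field per line."""
        if not self.rationale.strip():
            # An entry without a rationale defeats the purpose of the log.
            raise ValueError("a decision log entry needs a rationale")
        return "\n".join([
            f"Decision: {self.decision}",
            f"Decider: {self.decider}",
            f"Date: {self.decided_on.isoformat()}",
            f"Rationale: {self.rationale.strip()}",
        ])


entry = DecisionLogEntry(
    decision="Deprioritize from P1 to P3",
    decider="Alex (team lead)",
    decided_on=date(2026, 4, 10),
    rationale="Affected player count < 50 based on telemetry. "
              "Not blocking launch; revisit if affected count grows.",
)
comment = entry.to_comment()
```

Rejecting an empty rationale is a design choice in this sketch, not a rule from the format itself; it simply keeps the one field that matters most from being skipped.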

Why It Matters

It prevents re-litigation. A future triager sees the log and either accepts the past decision or has a specific reason to revisit.

It trains new triagers. New team members read a month of logs to absorb the team’s implicit judgment.

It creates accountability. Decisions with rationale can be reviewed for quality; decisions without are lost to time.

It surfaces rubric gaps. If the same rationale appears repeatedly, the rubric should be updated to formalize it.

Quarterly Review

Every quarter, read the last quarter’s decision logs. Look for:

Inconsistencies: same class of bug, different decisions. The rubric needs clarification.
Patterns: the same rationale repeated. Promote it into the rubric.
Mistakes: bugs deprioritized that later caused problems. Learn from them.
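The first two checks lend themselves to a quick script if the quarter's entries can be exported. The sketch below assumes a hypothetical export in which a reviewer has hand-labeled each entry with a `bug_class` and a `rationale_tag`; neither field is part of the log format above, and the sample data is invented for illustration. Spotting mistakes still requires reading the entries against what happened afterwards.

```python
from collections import Counter, defaultdict

# Hypothetical export: one dict per logged decision, labeled by the reviewer.
entries = [
    {"bug_class": "driver-specific crash", "decision": "P3", "rationale_tag": "low affected count"},
    {"bug_class": "driver-specific crash", "decision": "P1", "rationale_tag": "blocks launch"},
    {"bug_class": "save corruption", "decision": "P1", "rationale_tag": "data loss"},
    {"bug_class": "ui typo", "decision": "wont-fix", "rationale_tag": "low affected count"},
    {"bug_class": "audio glitch", "decision": "P3", "rationale_tag": "low affected count"},
]


def find_inconsistencies(entries):
    """Return bug classes that received more than one distinct decision."""
    decisions_by_class = defaultdict(set)
    for e in entries:
        decisions_by_class[e["bug_class"]].add(e["decision"])
    return {c: sorted(d) for c, d in decisions_by_class.items() if len(d) > 1}


def find_repeated_rationales(entries, min_count=3):
    """Return rationale tags used at least min_count times (rubric candidates)."""
    counts = Counter(e["rationale_tag"] for e in entries)
    return [tag for tag, n in counts.items() if n >= min_count]


inconsistent = find_inconsistencies(entries)
repeated = find_repeated_rationales(entries)
```

The `min_count` threshold of 3 is arbitrary; pick whatever repetition count your team considers worth a rubric change.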

Not a Postmortem

Decision logs are written in the moment, not after the fact. A postmortem analyzes an outcome; a decision log records a choice. Both have value; don’t conflate them.

Understanding the issue

A decision log is an operational detail whose effect on team output is out of proportion to its complexity. It's small enough that it's easy to skip and large enough that skipping it accumulates real cost. The teams that implement it well aren't doing anything sophisticated; they're doing the basic thing consistently.

Operational practices like this one tend to be most valuable when adopted before they're obviously needed. Studios that wait until a crisis to implement quality controls find themselves implementing under pressure, with less time to design well and more pressure to ship features. The practice ends up shaped by the crisis rather than by what would have worked best.

Why this matters

Operational quality is invisible until it isn't. Studios that don't track metrics such as time-to-fix and rework rate don't know what they're missing. The cost shows up as longer time-to-fix, higher rework rate, and engineers leaving because the work feels Sisyphean.

The practice described here has both an obvious benefit (the one in the title) and several non-obvious ones. Teams that adopt it usually notice the obvious benefit first; the non-obvious benefits surface over time as the practice composes with other team habits. This is part of why adoption is hard: the upfront benefit isn't always commensurate with the upfront cost, but the long-term return is.

Putting it into practice

Measuring whether this practice is working requires honest data, not aspirational metrics. Pick a number that actually moves when the practice is followed (cycle time, fix rate, error count) and not one that moves with general activity (total commits, total bugs filed). The first kind tells you the practice is working; the second kind just tells you the team is busy.

Adopting a practice without measurement is faith-based engineering. Measurement makes it data-driven. The first metric you pick will be wrong; that's fine. Use it for a quarter, see what it actually tells you, refine. The third or fourth iteration of the metric is when it starts to be useful.

Adapting to your context

Specific platforms and genres (mobile, console, VR, multiplayer) have their own variations on this practice. The core idea is portable; the implementation depends on the platform's constraints. Borrow from teams in your space.

Tailor this practice to your context rather than copying verbatim from another team's implementation. What's appropriate for a multiplayer-focused studio differs from what's appropriate for a narrative-focused one. The principles transfer; the specifics don't.

Long-term maintenance

When this kind of process is missing from a studio, the gap is usually invisible until someone points it out. The team that didn't realize their cycle time was 14 days finds out when they hire from a studio where it was 3. Benchmarks matter - keep some external reference for your own quality bars.

The hardest part of operational changes isn't the change - it's the ongoing maintenance. Build the maintenance into existing rhythms: a quarterly retrospective, a monthly review, a weekly check. The cadence matters because human attention drifts; structure replaces willpower with habit.

Throughput considerations

Process improvements have throughput costs too. A practice that requires every PR to be reviewed by three engineers is correct in theory and slow in practice. Pick implementations that are both correct and fast enough for your team's velocity.

How to start

Before changing how your team works, gather baseline data on the current state. Without baselines, you can't tell whether your change made things better, worse, or simply different. Even rough measurements - 'we close about 20 bugs per week, sev-1 takes about 3 days' - are valuable as starting points for comparison.
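A baseline like the one quoted above can come from a few lines of script over exported bug records. The sketch below is a minimal example under assumptions: the `(severity, opened, closed)` tuples are invented sample data, and the function names are hypothetical rather than part of any tracker's API.

```python
from datetime import date
from statistics import median

# Hypothetical closed-bug records: (severity, opened, closed).
closed_bugs = [
    ("sev-1", date(2026, 3, 2), date(2026, 3, 5)),
    ("sev-1", date(2026, 3, 9), date(2026, 3, 11)),
    ("sev-2", date(2026, 3, 3), date(2026, 3, 12)),
    ("sev-1", date(2026, 3, 10), date(2026, 3, 14)),
]


def median_days_to_close(bugs, severity):
    """Median open-to-close time in days for one severity, or None if no data."""
    durations = [(closed - opened).days
                 for sev, opened, closed in bugs if sev == severity]
    return median(durations) if durations else None


def closes_per_week(bugs, weeks):
    """Average number of bugs closed per week over the sampled period."""
    return len(bugs) / weeks


sev1_baseline = median_days_to_close(closed_bugs, "sev-1")
weekly_baseline = closes_per_week(closed_bugs, weeks=2)
```

The median is used here instead of the mean so that one long-running bug does not distort a small sample; either works as a rough starting point.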

Pilot the change with a single team or a single feature before rolling it out broadly. The pilot teaches you what implementation details actually matter; the broad rollout applies what you learned. Skipping the pilot means you discover the gotchas during the rollout, which is too late to redesign the practice.

Supporting tooling

The tooling that supports this practice has a multiplicative effect. A team with a custom dashboard for the relevant metrics moves faster than a team that calculates them by hand each time. The cost of building the dashboard is paid back in months; the value is the persistent visibility it provides.

When evaluating tools to support this practice, prefer ones that integrate with what your team already uses. A purpose-built tool may have better features, but adoption depends on the team using it consistently. The integrated tool that's used 95% of the time usually beats the best-in-class tool that's used 60% of the time.

Adoption pitfalls

Adoption pitfalls vary by team. Small teams struggle with overhead; large teams struggle with consistency; distributed teams struggle with communication. Anticipate the pitfall most likely to affect your team and design around it from the start.

Watch for the pattern where the practice 'almost' works - everyone says they're following it, but the metrics don't move. This is the most common failure mode: surface compliance without underlying behavior change. The fix isn't more documentation; it's making the practice's effect visible through tooling or rituals.

Communicating the change

Onboarding new engineers to this practice takes deliberate time. Documentation is a starting point; pairing on a representative example is what makes it concrete. Budget time for the second step; without it, new engineers approximate the practice instead of doing it.

Communicating the practice externally - to candidates, to other studios, to the broader industry - reinforces it internally. Teams that talk publicly about how they work tend to do that work better. The act of explaining clarifies the practice for the team, and the external audience holds the team accountable to the public version.

“Decisions without rationale are opinions. Decisions with rationale are accountable. The log is two minutes of writing that saves hours of future debate.”

Related Issues

For the broader triage process, see the bug triage meeting guide for game teams. For the priority rubric that should back up every decision, see how to build a bug priority rubric.

Read the decision log on any bug older than a month before you change its state. Past context usually changes what the current call should be.