Why is an unfair-match complaint not enough to debug skill-based matchmaking?

Because the complaint is about the outcome, while the bug, if any, is in the decision the matchmaker made from numbers the player never sees. A stomp could be a correct decision on a bad rating, a bad decision on good ratings, or simple variance. You can only tell which by capturing the rating inputs and the match-quality decision and checking whether the decision was justified.

What rating inputs should I capture for an SBMM bug?

Each participant's rating and uncertainty, the reporting player's recent rating history, and any modifiers that adjusted the inputs, such as decay, placement boosts, soft resets, or party-average adjustments. Uncertainty and modifiers are common bug sources, so capturing the raw rating plus everything that touched it lets you verify the number was correct before you question what the matchmaker did with it.

How do I tell a smurf from a real matchmaking bug?

Capture the inputs. A lopsided match where one player has a low rating, high uncertainty, and a sharply rising rating history is a smurf whose rating the system is still catching up to, and that is fixable only in the rating model. A lopsided match where all ratings are well established and the match-quality score still came out high is a genuine decision bug. The numbers draw the line you cannot draw from the outcome alone.

Bug Reporting for Games With Skill-Based Matchmaking

Quick answer: Skill-based matchmaking turns rating inputs into a match decision, so its bugs are about the inputs and the decision, not the queue mechanics. Capture each player's rating, uncertainty, recent rating changes, and any modifiers, plus the match the matchmaker produced and the quality score it judged acceptable. Unfair-match complaints are only debuggable when you can see the numbers the matchmaker reasoned over and the decision it reached.

Skill-based matchmaking promises fair games by rating every player and assembling matches that should be close, and when it fails, players feel cheated in a way that ordinary bugs never produce. They get stomped by an obviously stronger opponent, or carry a team that should never have been on their side, and they blame the system, often loudly. The trouble is that fairness is a judgment your matchmaker made from numbers the player never sees: their rating, the opponents' ratings, and the uncertainty around each. Debugging an unfair match means reconstructing that judgment. This post is about capturing the rating inputs and the matchmaking decision so an unfair-match complaint becomes a number you can check rather than a feeling you cannot.

Fairness is a decision made from numbers

Skill-based matchmaking is fundamentally a decision system. It takes each player's rating, usually some form of MMR with an uncertainty estimate, and it searches for a set of players whose combined ratings produce a match it predicts will be close, scored by some match-quality function. When a player complains the match was unfair, they are disputing that prediction, but they have no access to the inputs. From their seat, a stomp looks like a broken system, when it might have been a correct decision on bad inputs, a bad decision on good inputs, or genuinely just variance.
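To make "a match it predicts will be close" concrete, here is a minimal sketch of one possible match-quality function. It assumes an Elo-style logistic curve over team average ratings, which may not be what your matchmaker uses; the function names and the 400-point scale are illustrative assumptions, not a description of any particular system.

```python
def win_probability(team_a_ratings, team_b_ratings, scale=400.0):
    """Elo-style probability that team A beats team B, using team averages.

    Assumption: averaging ratings and using a 400-point logistic scale.
    A real matchmaker may weight players or use uncertainty as well.
    """
    avg_a = sum(team_a_ratings) / len(team_a_ratings)
    avg_b = sum(team_b_ratings) / len(team_b_ratings)
    return 1.0 / (1.0 + 10 ** ((avg_b - avg_a) / scale))


def match_quality(team_a_ratings, team_b_ratings):
    """Quality in [0, 1]: 1.0 for a predicted coin flip, near 0.0 for a foregone result."""
    p = win_probability(team_a_ratings, team_b_ratings)
    return 1.0 - abs(p - 0.5) * 2.0


if __name__ == "__main__":
    # Evenly rated teams are a predicted coin flip.
    print(match_quality([1500, 1500], [1500, 1500]))
    # A 400-point gap gives the stronger team roughly a 91% predicted win chance.
    print(round(win_probability([1900], [1500]), 3))
```

Under this sketch, the player's complaint amounts to disputing the value `win_probability` returned, which is why the ratings that went into it have to be on the report.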

You cannot tell which of those it was from the complaint alone, because the complaint is about the outcome and the bug, if there is one, is in the decision. The only way to debug skill-based matchmaking is to capture the numbers the matchmaker reasoned over and the decision it produced, then evaluate whether the decision was justified by the inputs. That shifts the conversation from a player's subjective sense of unfairness to an auditable question: given these ratings, was this match a reasonable thing for the system to create? Capturing the decision is the whole game here.

Capture the rating inputs

The core inputs are each participant's rating and the uncertainty or confidence the system holds in it. Capture the reporting player's rating, their uncertainty, their recent rating history, and, where you can, the ratings of the other participants or at least the team averages and the spread. Uncertainty matters enormously, because a new account or one returning from a long break has a wide confidence band, and the matchmaker may correctly place it in a wide skill range that looks unfair but reflects genuine uncertainty about that player's true level.

Also capture any modifiers that adjusted the inputs: rating decay from inactivity, placement-match boosts, soft resets after a season, or party-average adjustments when players queue as a group. These modifiers are a rich source of bugs. A decay that overshoots, a placement system that ranks too aggressively, or a party adjustment that miscomputes the effective rating will each produce matches that feel wrong. Capturing the raw rating, the uncertainty, and every modifier that touched it lets you see whether the number the matchmaker used was even correct before you ever question what it did with that number.
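One way to structure those inputs is a per-player snapshot that keeps the raw rating, the modifiers, and the number the matchmaker actually used, so the report itself can be checked for consistency. The sketch below is hypothetical: the class and field names are invented for illustration, and it assumes modifiers are simple additive deltas, which your rating model may not do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modifier:
    kind: str     # e.g. "decay", "placement_boost", "soft_reset", "party_adjust"
    delta: float  # signed change applied to the rating (assumed additive)


@dataclass
class RatingSnapshot:
    player_id: str
    raw_rating: float
    uncertainty: float
    effective_rating: float  # the number the matchmaker actually used
    modifiers: List[Modifier] = field(default_factory=list)
    recent_history: List[float] = field(default_factory=list)  # oldest first

    def expected_effective(self) -> float:
        """Recompute the effective rating from the raw rating and modifiers."""
        return self.raw_rating + sum(m.delta for m in self.modifiers)

    def is_consistent(self, tolerance: float = 0.5) -> bool:
        """True if the rating the matchmaker used matches the recomputation."""
        return abs(self.expected_effective() - self.effective_rating) <= tolerance


if __name__ == "__main__":
    snap = RatingSnapshot(
        player_id="p1",
        raw_rating=1600.0,
        uncertainty=80.0,
        effective_rating=1500.0,  # matchmaker used 1500
        modifiers=[Modifier("decay", -40.0), Modifier("placement_boost", 15.0)],
        recent_history=[1590.0, 1600.0],
    )
    # Recomputed value is 1575, so a 75-point discrepancy points at a modifier bug.
    print(snap.expected_effective(), snap.is_consistent())
```

A snapshot that fails `is_consistent` tells you the input was wrong before the matchmaker ever saw it, which is a different investigation from a bad decision on good inputs.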

Capture the matchmaking decision and quality

Inputs are only half the story; you also need the decision. Capture the match the matchmaker actually formed, the team compositions by rating, and the match-quality score or predicted win probability the system computed and deemed acceptable. The quality score is the single most revealing field, because it tells you what the matchmaker thought of the match it made. If it scored a wildly lopsided match as high quality, the bug is in your quality function or your inputs, and the score makes that immediately obvious.

Also capture the threshold the matchmaker was using, meaning the minimum quality it would accept, and whether that threshold had been relaxed because of a long wait. Many unfair matches are the result of the system widening its acceptable range to avoid leaving someone in queue, a deliberate tradeoff that is correct in design but can produce individual matches that feel unjust. Capturing both the score and the threshold lets you tell a genuine quality-function bug from an intentional wait-versus-fairness tradeoff that simply landed badly for this player. The decision and the standard it was held to are what make the complaint auditable.
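As a rough illustration of auditing the score against the standard it was held to, the sketch below compares a captured quality score with both the threshold in effect and the unrelaxed base threshold. The record shape, field names, and result labels are assumptions made up for this example, and it assumes a higher score means a closer predicted match.

```python
from dataclasses import dataclass


@dataclass
class MatchDecision:
    quality_score: float        # what the matchmaker computed (higher = closer match)
    threshold_in_effect: float  # minimum quality accepted when the match formed
    base_threshold: float       # minimum quality before any wait-based relaxation
    wait_seconds: float         # how long the reporting player had been queued


def audit_threshold(decision: MatchDecision) -> str:
    """Say which standard the accepted match actually met."""
    if decision.quality_score < decision.threshold_in_effect:
        # The matchmaker accepted a match its own rule should have rejected.
        return "accepted_below_threshold"
    if decision.quality_score < decision.base_threshold:
        # Only acceptable because the threshold was relaxed during the wait.
        return "accepted_via_relaxation"
    return "accepted_at_base_threshold"


if __name__ == "__main__":
    d = MatchDecision(quality_score=0.55, threshold_in_effect=0.50,
                      base_threshold=0.70, wait_seconds=240.0)
    print(audit_threshold(d))  # accepted_via_relaxation
```

The middle case is the wait-versus-fairness tradeoff landing badly for one player; the first case is a defect in the acceptance logic itself.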

Smurfs, decay, and the limits of fairness

Some unfair matches are not bugs in the matchmaker at all but artifacts of the rating model meeting reality. A smurf, a strong player on a fresh low-rated account, will stomp lobbies until their rating catches up, and the matchmaker is behaving correctly given a rating that is simply wrong about that player. Rating decay and long absences create the same gap. These are real fairness failures from the player's perspective, but the fix is in the rating model and detection systems, not the matching logic, and confusing the two wastes effort.

Capturing the inputs lets you classify these correctly. A lopsided match where one participant has a low rating, high uncertainty, and a sharply rising rating history is almost certainly a smurf catching up, not a matchmaking bug. One where every rating is well established and the quality score still came out high is a real defect in the decision. Without the inputs you cannot tell these apart, and you will either chase phantom matchmaker bugs or dismiss real ones as smurfs. The numbers are what let you draw the line between a fairness limit you cannot fix in matching and a decision bug you can.
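The classification described above could be sketched as a first-pass triage function. Everything here is an assumption for illustration: the dictionary keys, the result labels, and especially the cutoff values, which you would need to tune against your own rating scale.

```python
from statistics import mean


def classify_unfair_match(participants, quality_score, team_gap,
                          high_uncertainty=200.0, sharp_rise=150.0,
                          lopsided_gap=300.0, high_quality=0.8):
    """First-pass triage of an unfair-match report.

    participants: list of dicts with "rating", "uncertainty", and
                  "history" (recent ratings, oldest first).
    team_gap:     absolute difference between the team average ratings.
    All cutoffs are illustrative defaults, not recommended values.
    """
    lobby_mean = mean(p["rating"] for p in participants)
    for p in participants:
        history = p["history"]
        rise = history[-1] - history[0] if len(history) >= 2 else 0.0
        if (p["rating"] < lobby_mean
                and p["uncertainty"] >= high_uncertainty
                and rise >= sharp_rise):
            return "rating_model: likely smurf catching up"

    all_established = all(p["uncertainty"] < high_uncertainty for p in participants)
    if all_established and team_gap >= lopsided_gap and quality_score >= high_quality:
        return "decision_bug: lopsided match scored as high quality"

    return "unclassified: check threshold relaxation or variance"


if __name__ == "__main__":
    lobby = [
        {"rating": 1500.0, "uncertainty": 60.0, "history": [1495.0, 1500.0]},
        {"rating": 1500.0, "uncertainty": 60.0, "history": [1505.0, 1500.0]},
        {"rating": 1500.0, "uncertainty": 60.0, "history": [1500.0, 1500.0]},
        {"rating": 1100.0, "uncertainty": 350.0, "history": [900.0, 1100.0]},
    ]
    print(classify_unfair_match(lobby, quality_score=0.75, team_gap=200.0))
```

A triage label like this is only a starting point for a human reviewer, but it keeps smurf-shaped reports from being filed against the matching logic.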

Setting it up with Bugnet

With Bugnet, the in-game report button captures your custom fields automatically, so an unfair-match report arrives carrying the player's rating and uncertainty, their recent rating history, any active modifiers, the team compositions by rating, the match-quality score, and the threshold in effect, all without the player describing a thing they could not see anyway. They just say the match was unfair, and the report already holds the matchmaker's reasoning. If a match handoff triggers a crash, Bugnet captures the stack trace alongside that same decision context.

Because Bugnet folds duplicate reports into one grouped issue with an occurrence count, a systemic rating bug, say decay overshooting after a season reset, surfaces as a spike in unfair-match reports on one issue rather than as scattered salt that is easy to dismiss. Filter the dashboard by your rating and modifier custom fields to isolate whether complaints cluster around fresh accounts, post-reset players, or party queues, and sort by occurrence to prioritize. Player attributes let you correlate unfair-match reports with rating bands, turning a flood of subjective fairness complaints into a clear signal about where your decision system or rating model is actually wrong.

Building trust in the rating system

Players forgive a lot, but they do not forgive a competitive system they believe is rigged against them, so perceived fairness is a retention issue as much as a technical one. Make the rating inputs and the matchmaking decision a standard part of every unfair-match report, so that when a player complains, you can answer with the actual decision rather than a defensive shrug. Even when the decision was correct and the loss was variance, being able to see that is what lets you respond with confidence instead of doubt.

Watch occurrence counts on unfair-match issues alongside your rating-distribution metrics, because a rising count concentrated in one rating band or after one model change is a strong signal that the decision system or the rating math drifted. With the inputs and the decision captured on every report, you can separate variance from smurfs from genuine matchmaker bugs, fix the real defects, and tune the wait-versus-fairness tradeoff with evidence. A skill-based system players trust is one whose decisions you can reconstruct and defend, and that begins with capturing the numbers behind every match it makes.
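To check whether complaints concentrate in one rating band, a small offline tally is often enough. This sketch assumes you already have the reports as a list of dictionaries with a "rating" key, however you obtain them; the function name and the 500-point band width are illustrative choices.

```python
from collections import Counter


def reports_by_band(reports, band_width=500):
    """Count unfair-match reports per rating band.

    Returns a Counter keyed by (band_low, band_high) tuples, where each
    band covers ratings from band_low up to but not including band_high.
    """
    counts = Counter()
    for report in reports:
        low = int(report["rating"] // band_width) * band_width
        counts[(low, low + band_width)] += 1
    return counts


if __name__ == "__main__":
    sample = [{"rating": 1120.0}, {"rating": 1480.0}, {"rating": 1010.0},
              {"rating": 2230.0}]
    for band, count in reports_by_band(sample).most_common():
        print(band, count)
```

Comparing these counts against how many players sit in each band matters, since a crowded band will naturally produce more reports than a sparse one.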

An unfair match is a decision made from numbers the player never saw. Capture the rating inputs and the quality score, and fairness becomes auditable.