Quick answer: Halt the rollout immediately so it doesn't reach more players, use the per-version data from the rollout group to diagnose the issue, roll back the rolled-out portion if needed, fix the issue, and resume once verified.
When a staged rollout goes wrong, with the new build showing problems on the small rollout group, the system is actually working: it caught the issue before full release. Halting and fixing is the right response. Here is what to do next.
Halt the Rollout Immediately
The moment the rollout group shows problems, halt the rollout so it stops expanding to more players. This is the whole point of staging: you've caught the issue on a small group, so stopping there contains it and spares the rest of your players.
Bugnet's per-version monitoring with alerts catches the rollout group's problems fast, so you know to halt quickly. Seeing the new build's crash rate spike on the rollout group within minutes tells you to stop the rollout before it expands. That fast detection is what lets staging contain the issue to the small group as intended.
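The halt decision can be sketched in a few lines of Python. This is an illustration only, not Bugnet's API: the VersionStats type, the should_halt function, and every threshold value are hypothetical, and it assumes you can export session and crash counts per version from your monitoring.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    """Per-version counts (hypothetical shape, not a Bugnet type)."""
    version: str
    sessions: int
    crashes: int

    @property
    def crash_rate(self) -> float:
        # Guard against division by zero for a build with no sessions yet.
        return self.crashes / self.sessions if self.sessions else 0.0


def should_halt(new: VersionStats, baseline: VersionStats,
                max_ratio: float = 2.0, min_rate: float = 0.005,
                min_sessions: int = 200) -> bool:
    """Return True when the new build's crash rate warrants halting.

    All thresholds are assumed values; tune them for your own game.
    """
    # Too little data from the rollout group to judge either way.
    if new.sessions < min_sessions:
        return False
    # Halt when the new build crashes notably more often than the
    # previous one; min_rate keeps a near-zero baseline from making
    # the check trigger on a single crash.
    limit = max(baseline.crash_rate * max_ratio, min_rate)
    return new.crash_rate > limit


baseline = VersionStats("1.4.0", sessions=10_000, crashes=50)
candidate = VersionStats("1.5.0", sessions=500, crashes=15)
halt = should_halt(candidate, baseline)
```

The min_sessions guard is a design choice: it avoids halting on noise from the first handful of sessions, at the cost of a slightly later alert.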
Diagnose Using the Rollout Group's Data
Use the rollout group's data to diagnose: compare the new build against the previous one in the per-version data to see what the rollout introduced, namely the new crashes and their context. The rollout group gave you real-world data on the problem at small scale, which is exactly what you need to find and fix it.
Bugnet captures the crashes and per-version comparison from the rollout group, so you can diagnose what the new build broke: the new crashes, their stack traces, and the conditions under which they occur. The rollout group's captured data is the evidence you need to find the cause. Staging didn't just catch the issue; it gave you the real-world diagnostic data to fix it.
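The per-version comparison described above comes down to a set difference: which crash signatures appear in the new build but not in the previous one. Here is a minimal sketch, assuming you have exported one signature string per crash for each version. The function name and the sample signatures are made up for illustration.

```python
from collections import Counter


def new_crash_signatures(previous_crashes, new_crashes):
    """Return (signature, count) pairs seen only in the new build.

    Results are ordered most frequent first, so the likeliest
    regression is at the top of the list.
    """
    known = set(previous_crashes)
    counts = Counter(sig for sig in new_crashes if sig not in known)
    return counts.most_common()


# Hypothetical exported crash signatures, one entry per crash.
previous = ["NullRef in Inventory.Load", "OOM in TextureCache"]
current = [
    "NullRef in SaveSystem.Write",
    "OOM in TextureCache",
    "NullRef in SaveSystem.Write",
    "IndexError in Shop.Open",
]
introduced = new_crash_signatures(previous, current)
```

Crashes that already existed in the previous build are filtered out, which keeps attention on what the rollout itself introduced.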
Fix, Verify, and Resume the Rollout
Roll back the rolled-out portion if the issue is serious, fix the cause, and verify the fixed build. Then resume the staged rollout from a small percentage, monitoring per version to confirm it's now stable before expanding. Expand only once you've confirmed the fix.
Bugnet tracks per version, so when you resume the rollout with the fixed build you can confirm it's stable on the rollout group before expanding. This verifies the fix worked at small scale (the new crashes gone on the rollout group) before you roll out wider, so you expand a confirmed-good build rather than risking the problem again.
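Resuming from a small percentage and expanding only on confirmed stability can be expressed as a simple stage ladder. The sketch below is an assumption-laden illustration: the stage percentages and the next_rollout_percentage function are hypothetical, and the stable flag stands in for whatever per-version check you use.

```python
# Assumed rollout stages in percent; pick values that suit your audience size.
STAGES = (1, 5, 20, 50, 100)


def next_rollout_percentage(current: int, stable: bool,
                            stages: tuple = STAGES) -> int:
    """Return the percentage the rollout should move to next.

    If the build is not confirmed stable, hold at the current
    percentage (a halt) instead of expanding.
    """
    if not stable:
        return current
    for stage in stages:
        if stage > current:
            return stage
    # Already at or past the final stage: stay fully rolled out.
    return stages[-1]


# A fixed build restarts from 0% and climbs one stage per stable check.
first_step = next_rollout_percentage(0, stable=True)
held = next_rollout_percentage(5, stable=False)
```

Holding at the current percentage mirrors the halt step: the rollout stops expanding, and rolling back remains a separate decision.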
When your staged rollout goes wrong, halt it immediately to contain the issue to the small group, diagnose using the rollout group's data, roll back if needed, fix the issue, and resume from a small percentage once verified. A staged rollout catching a problem is the system working as intended.