Quick answer: Detect fast with monitoring, acknowledge to players quickly, mitigate to restore service before perfecting a fix, communicate on a regular cadence, and confirm resolution. Players forgive outages handled well.
An outage breaks the game for every affected player at once, so how you handle it matters as much as the outage itself. These are the practices that matter most.
Detect Fast and Acknowledge to Players Quickly
The faster you detect and acknowledge an outage, the less anger builds. Use monitoring to catch it quickly, then tell players you know about it, even before you have answers. A fast "we're aware and investigating" beats a perfect explanation an hour later, because silence breeds anger.
Bugnet alerts on crash and error spikes, so you detect an outage fast enough to acknowledge it quickly. Fast detection and quick acknowledgment defuse much of the anger an outage causes, because they tell players they're not being ignored.
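As a rough illustration of what spike alerting involves, here is a minimal sketch that compares each minute's error count against a rolling baseline. It is not Bugnet's implementation or API; the class name `SpikeDetector`, the window size, the multiplier, and the minimum error count are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean


class SpikeDetector:
    """Hypothetical helper: flags an error-count spike against a rolling baseline."""

    def __init__(self, window=12, factor=3.0, min_errors=20):
        # Assumed tuning values; real thresholds depend on your traffic.
        self.history = deque(maxlen=window)  # recent per-minute error counts
        self.factor = factor                 # how far above baseline counts as a spike
        self.min_errors = min_errors         # ignore tiny absolute counts

    def observe(self, errors_this_minute):
        """Return True if this minute looks like a spike."""
        baseline = mean(self.history) if self.history else 0.0
        is_spike = (
            len(self.history) == self.history.maxlen   # need a full baseline first
            and errors_this_minute >= self.min_errors
            and errors_this_minute > self.factor * baseline
        )
        # Keep spike minutes out of the baseline so an ongoing outage
        # does not start to look normal.
        if not is_spike:
            self.history.append(errors_this_minute)
        return is_spike


detector = SpikeDetector(window=3, factor=3.0, min_errors=10)
for count in (5, 4, 6):
    detector.observe(count)        # builds the baseline, no alert
alert = detector.observe(50)       # well above 3x the baseline of 5
```

When `alert` is true, that is the moment to post the first acknowledgment, before the investigation has produced any answers.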
Mitigate to Restore Service Before Perfecting a Fix
The priority during an outage is restoring service, so mitigate first: roll back a bad deploy, restart, fail over, or do whatever else restores the game before you perfect the root-cause fix. Restoring service stops the player impact, after which you can fix the underlying problem properly, without pressure.
Bugnet's per-version and error tracking helps you find what to mitigate, such as a bad deploy. Mitigating to restore service before perfecting a fix is the core outage discipline, since stopping the player impact matters more than the elegant solution.
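To show how per-version data can point at a rollback candidate, the sketch below compares each version's error rate with the average of the others. The function name `suspect_version`, the input shape, and the thresholds are hypothetical; they do not describe Bugnet's data model.

```python
def suspect_version(stats, min_sessions=100, ratio=3.0):
    """Return the version whose error rate stands out, or None.

    stats maps version -> (error_count, session_count). This shape is an
    assumption for illustration only.
    """
    # Skip versions with too few sessions to give a meaningful rate.
    rates = {
        version: errors / sessions
        for version, (errors, sessions) in stats.items()
        if sessions >= min_sessions
    }
    if len(rates) < 2:
        return None  # nothing to compare against

    worst = max(rates, key=rates.get)
    others = [rate for version, rate in rates.items() if version != worst]
    baseline = sum(others) / len(others)

    # Flag the worst version only if it is clearly above the rest.
    if rates[worst] > ratio * max(baseline, 1e-9):
        return worst
    return None


stats = {
    "1.4.0": (12, 4000),
    "1.4.1": (15, 5000),
    "1.5.0": (900, 3000),   # the newest deploy, erroring far more often
}
candidate = suspect_version(stats)   # "1.5.0"
```

A result like this is a hint about what to roll back, not a diagnosis; the root-cause work can wait until the game is playable again.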
Communicate on a Cadence and Confirm Resolution
Communicate on a regular cadence during the outage, even if the update is only "no change yet, still working", so players aren't left in the dark, and confirm resolution clearly once it's fixed. Steady communication keeps players patient, and a clear all-clear ends the incident on a note of competence.
Bugnet's monitoring confirms that the outage is genuinely resolved before you announce the all-clear. In short, handle an outage by detecting fast and acknowledging quickly, mitigating to restore service, communicating on a cadence, and confirming resolution. Players forgive outages that are handled quickly and honestly.
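The cadence and the all-clear check can both be reduced to small rules. The sketch below is one possible version: the 30-minute cadence, the 15-minute healthy window, the error threshold, and both function names are assumptions, not values from Bugnet or from any standard.

```python
from datetime import datetime, timedelta


def update_due(last_update, now, cadence=timedelta(minutes=30)):
    """True when it is time to post another status update, even with no news."""
    return now - last_update >= cadence


def safe_to_announce_all_clear(recent_counts, healthy_max=10, required=15):
    """True only after `required` consecutive minutes at or below `healthy_max` errors.

    recent_counts is a list of per-minute error counts, oldest first.
    """
    if len(recent_counts) < required:
        return False
    return all(count <= healthy_max for count in recent_counts[-required:])


last_post = datetime(2024, 1, 1, 12, 0)
due = update_due(last_post, datetime(2024, 1, 1, 12, 35))   # True: 35 minutes have passed

still_flapping = [3, 4, 120, 2] + [3] * 12    # a spike inside the last 15 minutes
recovered = [120, 90] + [3] * 15              # 15 quiet minutes in a row
```

Waiting for a sustained healthy window before announcing avoids the worst outcome of the communication phase: declaring the incident over and then having it return minutes later.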