Quick answer: Confirm the outage's scope from monitoring and the errors players hit, restore service by addressing the cause, and communicate with players. Fast detection, restoration, and communication limit the damage.
An outage locks players out, and how you respond determines how much damage it does. Here are the best ways to respond to an outage.
Confirm the Outage and Its Scope
Respond to an outage by first confirming it and its scope: how many players are affected, what is broken, and when it started. Draw on both your monitoring and the errors players are hitting. The size of the error spike and its scope tell you the severity.
Bugnet captures the errors players hit during an outage, so you can see the spike and its scope from the player side and confirm both the outage and its severity.
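As a rough illustration of turning raw error events into a scope summary, here is a minimal Python sketch. The `ErrorEvent` record, the `summarize_scope` function, and the `baseline_per_minute` threshold are all hypothetical names chosen for this example, not part of Bugnet or any monitoring tool; adapt the fields to whatever your own error feed provides.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical record shape for illustration; not a real Bugnet schema.
    player_id: str
    error: str
    timestamp: datetime


def summarize_scope(events, baseline_per_minute):
    """Summarize who is affected, what is broken, and when the spike began."""
    # Count errors per minute so a spike stands out against the normal rate.
    per_minute = Counter(
        e.timestamp.replace(second=0, microsecond=0) for e in events
    )
    # Assumption: the outage starts at the first minute above the baseline.
    spike_minutes = sorted(
        minute for minute, count in per_minute.items()
        if count > baseline_per_minute
    )
    started_at = spike_minutes[0] if spike_minutes else None
    in_spike = [
        e for e in events
        if started_at is not None and e.timestamp >= started_at
    ]
    top = Counter(e.error for e in in_spike).most_common(1)
    return {
        "started_at": started_at,
        "affected_players": len({e.player_id for e in in_spike}),
        "top_error": top[0][0] if top else None,
        "peak_per_minute": max(per_minute.values(), default=0),
    }
```

The summary answers the three scope questions directly: `affected_players` is how many, `top_error` is what is broken, and `started_at` is when it began. A fixed per-minute baseline is the simplest possible spike test; a real setup would likely compare against a rolling average instead.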
Restore Service by Addressing the Cause
Respond to an outage by restoring service: identify the cause (a bad deploy, overload, or a dependency failure) and reverse it (roll back, scale up, or recover the dependency). Getting players back online is the priority. Restore first, fix thoroughly after.
Bugnet tracks errors per version, so if the outage followed a deploy you can see it in the timing. That points at the bad deploy to roll back, which is the fastest restoration path.
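The timing check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the deploy history is a plain list of `(version, deployed_at)` tuples, and `suspect_deploy`, `rollback_target`, and the 30-minute `window` are hypothetical names and values, not a Bugnet API.

```python
from datetime import datetime, timedelta


def suspect_deploy(deploys, outage_start, window=timedelta(minutes=30)):
    """Return the latest (version, deployed_at) shipped within `window`
    before the outage started, or None if no deploy is that close."""
    candidates = [
        (version, at) for version, at in deploys
        if timedelta(0) <= outage_start - at <= window
    ]
    return max(candidates, key=lambda d: d[1], default=None)


def rollback_target(deploys, suspect):
    """Return the version that was live just before the suspect deploy,
    or None if the suspect is the earliest deploy on record."""
    earlier = [d for d in deploys if d[1] < suspect[1]]
    return max(earlier, key=lambda d: d[1], default=(None, None))[0]
```

For example, with deploys of version 1.4.0 at 09:00 and 1.4.1 at 11:50 and an outage starting at 12:05, `suspect_deploy` returns the 1.4.1 deploy and `rollback_target` returns `"1.4.0"`. Timing alone is only a hint: if no deploy falls inside the window, look at overload or a dependency failure instead.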
Communicate With Players
Respond to an outage by communicating with players: acknowledge the outage, say that you are working on it, and post an update when it is resolved. Players forgive a communicated outage far more readily than a silent one. Communication preserves goodwill.
Bugnet's public tracker lets you show players that you know about the outage and are working on it, then mark it resolved. That visible responsiveness preserves goodwill through an outage.
Respond to an outage by confirming its scope from monitoring and the errors players hit, restoring service by addressing the cause, and communicating with players. Fast detection, restoration, and communication limit the damage.