Quick answer: Confirm the outage and its scope from monitoring and client-side errors, restore service by addressing the cause (overload, bad deploy, dependency failure), then review the captured errors and per-version data to prevent a recurrence.
When your game server goes down, players can't connect or play, and every minute costs you. Fast detection, fast restoration, and learning from the impact are what limit the damage. The steps below cover each in turn.
Confirm the Outage and Its Scope
First, confirm it's really an outage and understand its scope. Are all players affected or only some? Is the outage total (server unreachable) or partial (errors, timeouts)? Check your monitoring alongside the client-side errors players are hitting: the connection failures and server errors captured from the field show the outage's scope and symptoms.
Bugnet captures client-side errors from the field, so when your server goes down you can see the connection failures and server errors players are hitting, which confirms the outage and its scope. That player-side view complements server monitoring: you see the outage as players experience it (the errors, the scope), which helps you gauge impact and urgency.
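To make the scope questions concrete, here is a minimal Python sketch that summarizes a batch of client-side errors. It is not Bugnet's API: the `ClientError` record, the `kind` labels, and the `summarize_scope` function are hypothetical names chosen for illustration, and the rule that "only connection failures" means a total outage is an assumption you should adapt to your own error data.

```python
from dataclasses import dataclass


@dataclass
class ClientError:
    """One error reported by a game client (hypothetical shape)."""
    player_id: str
    kind: str  # assumed labels: "connection_failure", "timeout", "server_error"


def summarize_scope(errors, active_players):
    """Estimate how widely players are affected and whether the outage looks total or partial."""
    if active_players <= 0:
        raise ValueError("active_players must be positive")

    # Count each player once, however many errors they reported.
    affected = {e.player_id for e in errors}
    affected_share = len(affected) / active_players

    if not affected:
        scope = "none"
    elif len(affected) >= active_players:
        scope = "all"
    else:
        scope = "some"

    if not errors:
        mode = "none"
    elif all(e.kind == "connection_failure" for e in errors):
        # Assumption: nothing but connection failures suggests the server is unreachable.
        mode = "total"
    else:
        # Timeouts or server errors mean the server is answering, but badly.
        mode = "partial"

    return {"affected_share": affected_share, "scope": scope, "mode": mode}
```

Feeding this the errors from the last few minutes, together with a count of recently active players, gives a rough answer to "all or some" and "total or partial" that you can compare against server monitoring.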
Restore Service by Addressing the Cause
Get players back online by identifying why the server went down and addressing it. The usual causes are overload (too much traffic), a bad deploy (a server update that broke it), or a dependency failure (a database or service it relies on). Recover from a bad deploy by rolling back, from overload by scaling up, and from a dependency failure by recovering the dependency.
Bugnet tracks errors per version, so if the outage followed a server-side release, the timing shows it and points at a bad deploy to roll back. Knowing whether the outage coincided with a deploy (rather than a traffic surge or a dependency failure) tells you the likely cause and the fastest restoration path, which is the difference between rolling back and scaling.
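The triage logic above can be sketched as a small decision function. This is an illustrative Python example, not a Bugnet feature: `likely_cause`, its parameters, the 30-minute deploy window, and the 1.5x traffic threshold are all assumptions, and the order of the checks is a judgment call rather than a rule.

```python
from datetime import datetime, timedelta


def likely_cause(
    outage_start,
    last_deploy,
    traffic_ratio,
    dependency_healthy,
    deploy_window=timedelta(minutes=30),  # assumed: how soon after a deploy counts as "followed it"
    overload_ratio=1.5,                   # assumed: traffic relative to normal that counts as overload
):
    """Return a (cause, action) pair based on what the outage coincided with."""
    # A failed database or backing service must be recovered whatever else is going on.
    if not dependency_healthy:
        return ("dependency failure", "recover the dependency")

    # An outage that starts shortly after a release points at the release.
    if last_deploy is not None:
        since_deploy = outage_start - last_deploy
        if timedelta(0) <= since_deploy <= deploy_window:
            return ("bad deploy", "roll back")

    # Traffic well above normal with no recent deploy suggests overload.
    if traffic_ratio >= overload_ratio:
        return ("overload", "scale up")

    return ("unknown", "keep investigating")
```

For example, an outage that begins ten minutes after a deploy, with normal traffic and healthy dependencies, comes back as a bad deploy to roll back. The same outage with no recent deploy and traffic at twice the normal level comes back as overload to scale for.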
Review the Impact and Prevent a Recurrence
After restoring service, learn from the outage: review the captured client-side errors and per-version data to understand what happened and how widely it affected players, then put in monitoring and safeguards (alerts, capacity, deploy gating) so the next outage is caught faster or prevented. An outage is a lesson in what to monitor.
Bugnet captures the client-side errors and tracks them per version, so after an outage you can review its full impact and whether a deploy caused it, and set up alerting on the client-side error spikes that signal an outage. This turns an outage into prevention: you learn what an outage looks like in your data and monitor for it, so you catch the next one faster.
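As a rough illustration of alerting on error spikes, the Python sketch below compares the current error count with a recent baseline. It does not represent how Bugnet's alerting works: `is_error_spike`, the 3x factor, and the minimum of 20 errors are hypothetical values to tune against what your last outage looked like in the data.

```python
from statistics import mean


def is_error_spike(recent_counts, current_count, factor=3.0, min_errors=20):
    """Flag a spike when the current count clears a floor and a multiple of the recent average.

    recent_counts: error counts from earlier intervals (for example, per minute).
    current_count: error count for the interval being checked.
    """
    baseline = mean(recent_counts) if recent_counts else 0.0
    # The floor keeps a quiet game from alerting on a handful of errors.
    return current_count >= min_errors and current_count > factor * baseline
```

With a baseline of about 5 errors per minute, a jump to 60 is flagged, while 12 is ignored because it falls under the floor. A game that normally sees around 110 errors per minute would not alert at 150, because that is within three times its baseline.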
When your game server goes down, confirm the outage and its scope from monitoring and client-side errors, restore service by addressing the cause (deploy, overload, dependency), then review the impact and add monitoring. Fast detection and restoration limit the damage.