Quick answer: A crash dashboard you check manually is a monitoring tool. An alert that fires when something goes wrong is what actually protects a launch. You need both — but alerting is the one that works at 2am when your game just went live in a new region and no one is watching the dashboard.
Most developers who set up crash reporting think the job is done once the SDK is integrated and crashes are flowing into a dashboard. That is the starting point, not the finish line. A crash dashboard is passive — it shows you what is happening if you are already looking. For a game release, you need an active system that tells you when something is wrong, fast enough that you can respond before thousands of players hit the same wall. This post covers how to build that alerting layer using Bugnet, what thresholds to set, and how to avoid the common mistake of configuring so many alerts that the team learns to ignore them.
Monitoring vs. Alerting: Why the Distinction Matters
Monitoring gives you visibility. You open your Bugnet dashboard and see the crash-free rate over time, the top crash groups by frequency, and the trend graph since your last release. This is valuable information when you are actively investigating a problem.
The limitation is that monitoring requires someone to be looking. In the days after a game release, your team is stretched thin. You are handling launch-day support messages, responding to reviews, preparing hotfix candidates, and probably also trying to sleep occasionally. Checking the crash dashboard every 30 minutes is not realistic, and a problem that appears at 3am while everyone is asleep can affect thousands of players before it is noticed.
Alerting is monitoring with a trigger. Instead of waiting for someone to check the dashboard, the system checks continuously and sends a notification when a metric crosses a threshold that signals a problem. Alerting does not require anyone to be watching. It finds you.
For a small indie team, this distinction is especially important because you do not have a dedicated operations engineer whose job is to watch dashboards. Alerting compensates for the bandwidth limits of a small team and ensures that genuinely serious problems get immediate attention regardless of what time they occur.
The Three Alert Thresholds Every Release Needs
You do not need a complex alerting setup. For most indie game releases, three well-configured thresholds cover the scenarios that actually matter:
1. New crash group detected. Fire an alert the first time a crash signature appears that has never been seen before. This is the most direct signal that your latest build introduced a regression. If you ship an update and immediately see three new crash groups that did not exist before, the update broke something and you need to investigate before more players hit it. In Bugnet, you can configure this alert per build version, so you only get notified about crash groups that are genuinely new — not pre-existing groups that have been occurring for weeks.
2. Crash volume spike above baseline. Fire an alert when the number of crashes in a rolling window (typically 15 or 60 minutes) exceeds a baseline by a set percentage. The baseline is calculated from your normal crash rate during stable periods. A 200% spike above baseline is a meaningful signal that something has gone wrong. A 20% fluctuation is probably noise. Set your spike threshold conservatively at first and tune it down after you have a few weeks of data to understand your game’s normal variance.
3. Crash-free rate drops below threshold. This is the clearest single number for overall game health. The crash-free rate is the percentage of sessions that end without a crash. A stable, well-running game typically has a crash-free rate above 99%. If yours drops below 97%, something is wrong and players are noticing. If it drops below 95%, you have a major problem. Set an alert threshold that reflects your game’s normal baseline and fire it when the rate drops meaningfully below that (a minimal check is sketched after this list). Bugnet’s game health dashboard tracks this metric continuously and can trigger webhook notifications when it crosses your configured threshold.
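To make the thresholds concrete, here is a minimal Python sketch of the crash-free rate check. It only does the arithmetic and the two-tier comparison; in practice Bugnet computes this metric for you, so treat the sketch as a way to reason about the numbers. Where the session counts come from is up to your own pipeline.

```python
# Crash-free rate check using the two tiers described above. Session counts
# would come from your analytics or a Bugnet export; this only does the math.

WARN_THRESHOLD = 0.97   # below 97%: something is wrong, players are noticing
MAJOR_THRESHOLD = 0.95  # below 95%: major problem

def crash_free_rate(total_sessions: int, crashed_sessions: int) -> float:
    """Fraction of sessions that ended without a crash."""
    if total_sessions == 0:
        return 1.0  # no data yet; treat as healthy rather than divide by zero
    return 1.0 - (crashed_sessions / total_sessions)

def health_alert(total_sessions: int, crashed_sessions: int) -> str | None:
    rate = crash_free_rate(total_sessions, crashed_sessions)
    if rate < MAJOR_THRESHOLD:
        return f"MAJOR: crash-free rate at {rate:.2%}"
    if rate < WARN_THRESHOLD:
        return f"WARNING: crash-free rate at {rate:.2%}"
    return None  # healthy

# 10,000 sessions with 350 crashes is a 96.5% rate: fires the warning tier.
print(health_alert(10_000, 350))
```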
Routing Alerts to Discord and Slack via Bugnet
An alert that fires into a system no one checks is no better than no alert at all. Route your crash alerts to wherever your team is already paying attention. For most indie developers, that means Discord, Slack, or both.
Bugnet supports outbound webhooks for alert notifications. The setup takes about five minutes:
- In Discord, go to your server settings, find the Integrations section, and create a new webhook for the channel you want alerts to post to. Name it something descriptive like “Bugnet Crash Alerts” and copy the webhook URL.
- In your Bugnet project settings, navigate to the Integrations tab and add the Discord webhook URL.
- Select which alert types should trigger Discord notifications. At minimum, enable new crash group and crash-free rate drop alerts.
- Configure the notification format to include the alert type, crash group name if applicable, current count, affected build version, and a direct link to the crash in Bugnet (a minimal posting sketch follows this list).
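Once the webhook URL exists, posting to it is a single HTTP request. The sketch below uses Python’s requests library and Discord’s standard webhook payload (a JSON body with a content field). The function and its parameters are illustrative: this is roughly the message you would want Bugnet, or a small relay of your own, to send.

```python
import requests

# URL copied from Discord's Integrations page. Treat it as a secret:
# anyone who has it can post into your channel.
DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def post_discord_alert(alert_type: str, crash_group: str, count: int,
                       build: str, bugnet_link: str) -> None:
    """Post a formatted crash alert to the configured Discord channel."""
    payload = {
        "content": (
            f"**{alert_type}**: {crash_group}\n"
            f"Occurrences: {count} | Build: {build}\n"
            f"{bugnet_link}"
        )
    }
    response = requests.post(DISCORD_WEBHOOK_URL, json=payload, timeout=10)
    response.raise_for_status()  # surface failures instead of losing alerts
```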
For Slack, the process is identical: create an incoming webhook in your Slack workspace settings and add the URL to Bugnet. If you use both Discord (for your community server) and Slack (for your internal team), route different alert types to each. New crash group alerts are appropriate for both. High-volume crash spikes and crash-free rate drops are better sent to your internal team channel where you can respond without community visibility.
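If you run both channels, the routing rule above is small enough to express directly. A sketch, with illustrative alert-type names (Bugnet’s own configuration may label these differently); the one real difference between the two services is the JSON key, since Discord expects content while Slack’s incoming webhooks expect text.

```python
import requests

DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"  # community
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/<path>"          # internal team

# New crash groups go to both channels; spikes and health drops stay
# internal, per the routing advice above. Alert-type names are illustrative.
ROUTES: dict[str, list[str]] = {
    "new_crash_group":      [DISCORD_WEBHOOK_URL, SLACK_WEBHOOK_URL],
    "crash_spike":          [SLACK_WEBHOOK_URL],
    "crash_free_rate_drop": [SLACK_WEBHOOK_URL],
}

def route_alert(alert_type: str, message: str) -> None:
    for url in ROUTES.get(alert_type, []):
        # Discord and Slack use different keys for the message body.
        key = "content" if "discord.com" in url else "text"
        requests.post(url, json={key: message}, timeout=10).raise_for_status()
```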
Spike Detection vs. Sustained Rate Alerts
Crash alerts should distinguish two patterns: sudden spikes and sustained elevated rates. They indicate different problems and call for different responses.
A spike is a rapid increase in crash volume that begins at a specific point in time. Spikes typically indicate a single cause: a bad update, a server-side change that affects your game, a platform update that broke compatibility, or a sudden influx of players from a sale or storefront feature all hitting the same device-specific crash. Spikes often resolve naturally as the triggering condition stabilizes, but they still require investigation to understand the cause and determine whether a fix is needed.
Configure spike alerts to fire when crash volume increases by more than 150% above the rolling 24-hour average within any 15-minute window. This is sensitive enough to catch real problems but tolerant enough to ignore the natural noise of player traffic fluctuating throughout the day.
A sustained rate alert fires when crash volume is elevated over a longer period: not a sudden spike, but a persistent elevation that does not return to baseline. This pattern often indicates a crash that affects only a subset of hardware configurations, so it never causes a dramatic spike but quietly hits a meaningful portion of your players for days. Configure a sustained rate alert to fire when the 4-hour crash rate exceeds your baseline by 50% or more, with a delay so it fires only if the elevation persists for at least 30 minutes rather than for a momentary fluctuation.
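Both detectors reduce to a few lines of logic. The sketch below hard-codes the numbers from this section: more than 150% above baseline (2.5x) in a 15-minute window for spikes, and 50% above baseline (1.5x) held for 30 minutes for sustained elevation. Where the crash counts and baselines come from is left to your pipeline.

```python
import time

def spike_detected(crashes_last_15m: int, baseline_15m: float) -> bool:
    """Spike: the last 15-minute window exceeds the rolling 24-hour average
    (expressed per 15 minutes) by more than 150%, i.e. more than 2.5x."""
    return crashes_last_15m > baseline_15m * 2.5

class SustainedRateDetector:
    """Sustained elevation: 4-hour crash volume at least 50% above baseline,
    persisting for 30 minutes before the alert fires."""

    def __init__(self, baseline_4h: float):
        self.baseline_4h = baseline_4h
        self.elevated_since: float | None = None

    def check(self, crashes_last_4h: int, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        if crashes_last_4h >= self.baseline_4h * 1.5:
            if self.elevated_since is None:
                self.elevated_since = now          # elevation just began
            return now - self.elevated_since >= 30 * 60  # held for 30 min?
        self.elevated_since = None                 # back to baseline; reset
        return False
```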
Avoiding Alert Fatigue
Alert fatigue is what happens when you receive too many alerts that turn out to be unimportant. The team reads the first few, investigates, finds nothing serious, and gradually starts ignoring the notifications. Then a real problem fires the same alert and no one responds because the signal has been trained out of them.
This is not a hypothetical. It is one of the most common failure modes in operational monitoring, and it affects indie teams just as much as large studios. The solution is to configure fewer, higher-quality alerts rather than comprehensive coverage of every possible metric.
A few specific practices that help:
- Set minimum occurrence thresholds. A new crash group that has occurred once is not necessarily worth an alert. Set a minimum of 3 to 5 occurrences before a new group triggers a notification, to filter out one-off environmental issues (see the sketch after this list).
- Suppress alerts during known maintenance. If you are deploying an update, suppress crash alerts for 15 minutes after deployment. The act of deploying often temporarily elevates crash rates as players on the boundary of old and new versions create edge-case sessions.
- Group related alerts. If three alert conditions fire within a two-minute window, send a single notification that summarizes all three rather than three separate messages. Bugnet’s alert grouping helps prevent a single incident from generating a flood of notifications.
- Review and tune thresholds monthly. As your game’s player base and crash baseline evolve, alert thresholds that were well-calibrated at launch may become too sensitive or too lenient. Schedule a brief review of alert performance after each major update.
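If your alerts pass through a relay of your own before reaching chat, the first three practices can be enforced there too. A client-side sketch using the thresholds suggested in this list; Bugnet’s own settings are where you would normally configure the equivalents.

```python
import time

class AlertFilter:
    """Minimum-occurrence, deploy-suppression, and grouping rules applied
    before anything is posted to chat."""

    MIN_OCCURRENCES = 3          # ignore new groups seen fewer times than this
    DEPLOY_SUPPRESS_S = 15 * 60  # quiet period after each deployment
    GROUP_WINDOW_S = 2 * 60      # batch alerts that fire within two minutes

    def __init__(self) -> None:
        self.last_deploy_at = float("-inf")
        self.pending: list[str] = []
        self.batch_started_at: float | None = None

    def record_deploy(self, now: float) -> None:
        self.last_deploy_at = now

    def accept(self, occurrences: int, now: float) -> bool:
        if occurrences < self.MIN_OCCURRENCES:
            return False  # likely a one-off environmental issue
        if now - self.last_deploy_at < self.DEPLOY_SUPPRESS_S:
            return False  # expected churn during the deploy window
        return True

    def enqueue(self, message: str, now: float) -> list[str] | None:
        """Collect alerts; return one combined batch once the window closes.
        A real relay would also flush on a timer, not only on arrival."""
        self.pending.append(message)
        if self.batch_started_at is None:
            self.batch_started_at = now
        if now - self.batch_started_at >= self.GROUP_WINDOW_S:
            batch, self.pending = self.pending, []
            self.batch_started_at = None
            return batch  # send as a single summary notification
        return None
```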
The First Two Hours After Launch
The two hours immediately following a game launch or major update are the highest-value alerting period. Player volume is at its peak, new hardware and software configurations are hitting your game for the first time, and any issues that affect more than a small fraction of players will become visible within this window.
During this period, it is worth temporarily lowering your alert thresholds to be more sensitive. A new crash group that appears in the first two hours of a launch should fire an alert after 2 to 3 occurrences rather than your normal threshold of 5. A crash-free rate drop of 2% warrants immediate attention at launch, even if the same drop during a quiet week would fall within the acceptable range.
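The sensitivity change does not have to be remembered by a human at 1am. A small sketch, assuming you record the release timestamp somewhere your alerting code can read it:

```python
import time

LAUNCH_WINDOW_S = 2 * 60 * 60  # heightened sensitivity for the first two hours

def new_group_threshold(released_at: float, now: float | None = None) -> int:
    """Occurrences required before a new crash group alerts."""
    now = time.time() if now is None else now
    if now - released_at < LAUNCH_WINDOW_S:
        return 2   # launch window: fire after 2 to 3 occurrences
    return 5       # steady state: the normal threshold from earlier
```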
Designate one person on the team as the on-call developer for the first two hours after every release. Their job is to watch the Bugnet dashboard actively, respond to any alerts immediately, and make the call on whether an issue requires an emergency hotfix or can wait for investigation. After the two-hour window, revert to your normal alert thresholds and standard response process.
“The first two hours of a launch are when alerts earn their keep. Everything after that is important, but the window when you can catch a serious crash before it affects most of your players is very short.”
Building a Runbook for Each Alert Type
A runbook is a short document that tells the on-call developer exactly what to do when a specific alert fires. Without a runbook, every alert requires the developer to figure out the investigation process from scratch, which takes time and leads to inconsistent responses.
For each of your three core alert types, write a brief runbook entry that covers:
- What the alert means and what caused it in the past
- The first three things to check in the Bugnet dashboard
- The criteria for escalating to a hotfix decision vs. logging it for the next regular update
- Who to notify if the issue is confirmed serious
- Where to post a public update if players are affected
Keep the runbook in a shared document your team can access quickly — not buried in a private notes app. During a launch-day incident, the developer responding to the alert may not be the one who set up the alerting system, and the runbook needs to be findable in under 60 seconds.
The investment in writing runbooks is small. The payoff during an actual incident is significant: a developer who knows exactly what to check and exactly what threshold triggers a hotfix decision can respond in 10 minutes instead of 45.
Set up your alerts before launch day, not during it. The five minutes it takes to configure a Discord webhook and three thresholds in Bugnet is the most cost-effective thing you can do to protect a release.