Quick answer: A hotfix procedure documents the exact steps for shipping an emergency fix: when to trigger it, who approves it, how to build and test it quickly, how to deploy it, how to roll it back if it fails, and what to do afterward. Write the procedure before you need it. A team that follows a documented process under pressure makes fewer mistakes than a team that improvises.
A hotfix is the most stressful deployment you will ever do. The game is broken, players are angry, the clock is ticking, and every minute of downtime costs goodwill and revenue. Under that pressure, the last thing you want is ambiguity: “can I skip the code review?” “do I deploy to staging first or go straight to production?” “who needs to approve this?” An effective hotfix procedure answers every one of these questions in advance, so the engineer on call can execute the fix instead of debating process.
Trigger Criteria: When Is It a Hotfix?
Not every bug deserves a hotfix. Hotfixes bypass your normal quality gates — reduced testing, expedited review, off-hours deployment — and that bypass carries risk. A hotfix that introduces a new bug is worse than the original problem because it damages player trust twice: once for the original issue and once for the botched fix. Define clear trigger criteria so the team agrees on what justifies the risk.
Effective trigger criteria are measurable, or at least unambiguous to verify. Typical examples: the crash-free session rate drops below 95% (or whatever threshold you derive from your own baseline); a data-loss bug is confirmed, with players losing saves, inventory, or currency; a security vulnerability is discovered that could expose player data or enable exploits; a complete feature outage occurs, such as login being broken, matchmaking being down, or the store being non-functional; an economy exploit is being actively used to generate unlimited currency or duplicate items.
Anything that does not meet these criteria goes into the next regular patch. A cosmetic glitch, a typo in dialogue, a minor balance issue, a UI element slightly misaligned — these are real bugs that deserve real fixes, but they do not justify the risk and cost of a hotfix deployment. Document the boundary explicitly so that the decision is objective, not emotional.
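One way to keep the decision objective is to encode the criteria as a small check that anyone on the team can run against the current incident. The sketch below is illustrative only: the `IncidentSignals` fields, the `hotfix_reasons` function, and the 0.95 threshold are assumptions chosen to mirror the criteria above, not part of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IncidentSignals:
    """Snapshot of the signals the trigger criteria refer to (hypothetical fields)."""
    crash_free_rate: float        # fraction of sessions without a crash, 0.0 to 1.0
    data_loss_confirmed: bool     # saves, inventory, or currency are being lost
    security_vulnerability: bool  # could expose player data or enable exploits
    feature_outage: bool          # login, matchmaking, or store completely down
    economy_exploit_active: bool  # exploit is in active use


# Assumed threshold; replace it with a value derived from your own baseline.
CRASH_FREE_THRESHOLD = 0.95


def hotfix_reasons(signals: IncidentSignals) -> list[str]:
    """Return the trigger criteria that are met; an empty list means 'next regular patch'."""
    reasons = []
    if signals.crash_free_rate < CRASH_FREE_THRESHOLD:
        reasons.append("crash-free session rate below threshold")
    if signals.data_loss_confirmed:
        reasons.append("confirmed data loss")
    if signals.security_vulnerability:
        reasons.append("security vulnerability")
    if signals.feature_outage:
        reasons.append("complete feature outage")
    if signals.economy_exploit_active:
        reasons.append("active economy exploit")
    return reasons
```

A misaligned UI element produces an empty list and waits for the regular patch; a crash-rate drop combined with a login outage returns both reasons, which can be pasted straight into the approval request.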
The Approval Chain
Every hotfix needs approval, but the approval process must be faster than the normal release process. Define the minimum approval requirements: who can approve, how many approvals are needed, and what communication channel to use. For a small indie team, one engineer reviewing the diff in a Slack thread might be sufficient. For a larger team, require sign-off from the on-call engineer and the team lead.
The approver’s job is not to do a full code review. It is to verify three things: the fix addresses the reported issue, the fix is minimal (no refactoring, no feature additions, no “while I am here” changes), and the fix does not obviously introduce new problems. A hotfix diff should be small enough to review in five minutes. If the diff is large, the fix is too ambitious for a hotfix — simplify it or find a mitigation (disable the broken feature, revert the commit that introduced the bug) that is smaller.
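The "small enough to review in five minutes" rule can be backed by a rough size check on the diff. The sketch below parses the output of `git diff --numstat` (tab-separated added lines, deleted lines, and path, with `-` for binary files) and compares it against limits. The limits of 3 files and 50 changed lines, and the function names, are assumptions for illustration; pick numbers that match what your team can actually review in five minutes.

```python
from __future__ import annotations

# Assumed limits for a "five-minute" review; tune them for your team.
MAX_FILES = 3
MAX_LINES = 50


def summarize_numstat(numstat: str) -> tuple[int, int]:
    """Return (files_changed, lines_changed) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        # Binary files are reported as "-"; count the file but not its lines.
        if added != "-":
            lines += int(added)
        if deleted != "-":
            lines += int(deleted)
    return files, lines


def is_reviewable_hotfix(numstat: str) -> bool:
    """True if the diff stays within the assumed file and line limits."""
    files, lines = summarize_numstat(numstat)
    return files <= MAX_FILES and lines <= MAX_LINES
```

A check like this does not replace the approver's judgment. It only flags diffs that have probably grown beyond a hotfix, which is the cue to look for a smaller mitigation.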
Fast-Track CI: The Minimum Viable Test Suite
Your normal CI pipeline might take 30 minutes or more: full test suite, multi-platform builds, performance benchmarks, lint checks. A hotfix cannot wait that long. Define a fast-track CI suite that runs only the essential checks and completes in under 10 minutes.
# .github/workflows/hotfix.yml
name: Hotfix Fast Track
on:
  push:
    branches: ['hotfix/**']
jobs:
  fast-track:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build-release
      - name: Smoke tests only
        run: make test-smoke
      - name: Critical path tests
        run: make test-critical
      # Skip: full suite, perf tests, secondary platforms
      # These run post-deploy to verify, not block
The fast-track suite should include: a build check (does it compile?), smoke tests (does it boot?), and the critical path tests (does the main gameplay loop work?). Optionally include the specific regression test for the bug being fixed, if one exists. Everything else — full test suite, secondary platform builds, performance benchmarks — runs after deployment as a post-deploy verification, not as a deployment blocker.
Document which tests are in the fast-track suite and why. When someone asks “can we skip even the smoke tests?” the answer is in the document: no, because these are the minimum checks that prevent a hotfix from being worse than the bug it fixes.
Deployment Steps and Monitoring
The deployment section of your hotfix procedure should be a numbered list of exact commands with expected outputs. Do not describe what to do abstractly (“deploy to staging”); write the command (“run make deploy-staging and wait for the output Deployment complete”). During a crisis, engineers are stressed and sleep-deprived. Exact commands that can be copied and pasted greatly reduce the risk of typos, wrong targets, and missed steps.
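If you want the "command plus expected output" pairing to be checked rather than eyeballed, a small helper can run each step and confirm the output. This is a minimal sketch, assuming each step is a command whose combined output should contain a known string; `run_step` is a hypothetical helper name, and the 10-minute default timeout is an arbitrary choice.

```python
from __future__ import annotations

import subprocess


def run_step(command: list[str], expected: str, timeout_s: int = 600) -> bool:
    """Run one runbook command and report whether it succeeded.

    A step passes only if the command exits with status 0 and its output
    contains the expected text. subprocess.TimeoutExpired is raised if the
    command runs longer than timeout_s seconds.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    output = result.stdout + result.stderr
    print(output, end="")  # keep the raw output visible to the engineer
    return result.returncode == 0 and expected in output
```

With the example from the text, the staging step would be `run_step(["make", "deploy-staging"], "Deployment complete")`. Stop the procedure at the first step that returns `False` instead of continuing to the next command.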
After deploying to production, monitor for at least 15 minutes before declaring the hotfix successful. Watch the crash-free session rate, error log volume, and any metrics related to the fixed issue. If the crash rate increases or new errors appear, execute the rollback plan immediately. Do not wait to see if it stabilizes. A hotfix that makes things worse must be reverted within minutes, not hours.
Define what “success” looks like in measurable terms. The crash rate returns to baseline within 15 minutes. The error log for the fixed issue drops to zero. The feature that was down is functional again. These are your exit criteria for the hotfix deployment. If they are met, the hotfix is successful. If they are not met within the defined timeframe, escalate or roll back.
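The monitoring rules and exit criteria can be written down as a single decision function, which removes any debate about what the numbers mean at 3 a.m. The sketch below is one possible encoding, not a standard: the `Metrics` fields, the `evaluate_hotfix` name, and the four outcome strings are assumptions, and the 15-minute window comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Post-deploy readings (hypothetical fields; map them to your own dashboards)."""
    crash_free_rate: float     # current crash-free session rate, 0.0 to 1.0
    fixed_issue_errors: int    # errors logged for the fixed issue in the latest window
    feature_functional: bool   # the feature that was down works again
    new_error_types: int       # error signatures not seen before the deploy


def evaluate_hotfix(current: Metrics, baseline: float, pre_deploy: float,
                    minutes_since_deploy: int, window_minutes: int = 15) -> str:
    """Return 'rollback', 'monitoring', 'success', or 'escalate'."""
    # Worse than before the hotfix, or new errors appeared: revert immediately.
    if current.crash_free_rate < pre_deploy or current.new_error_types > 0:
        return "rollback"
    # Not worse, but the monitoring window has not elapsed yet.
    if minutes_since_deploy < window_minutes:
        return "monitoring"
    # Window elapsed: success only if every exit criterion is met.
    criteria_met = (current.crash_free_rate >= baseline
                    and current.fixed_issue_errors == 0
                    and current.feature_functional)
    return "success" if criteria_met else "escalate"
```

Here `baseline` is the normal crash-free rate and `pre_deploy` is the degraded rate measured just before the hotfix went out. Whether "escalate" ends in a rollback or a second fix is a decision for the approver, not for the script.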
The Rollback Plan
Every hotfix must have a rollback plan written before the hotfix is deployed. The rollback plan is a set of steps that revert the game to the previous version. For server-side games, this means redeploying the previous build. For client-side games, where you cannot instantly replace the build on players’ devices, this may mean switching a feature flag back, reverting a configuration change, or in the worst case, pushing a second hotfix that reverts the first.
Test the rollback plan before you need it. Deploy to staging, apply the hotfix, then execute the rollback and verify that the previous version is running correctly. An untested rollback plan is a bet that the revert process works. That bet fails often enough that testing it is not optional.
“The goal of a hotfix is not to ship perfect code. It is to stop the bleeding. Perfection comes in the follow-up patch. The hotfix just needs to be better than the current state and not worse than the previous state.”
Post-Hotfix Review
Within 48 hours of a hotfix, conduct a post-hotfix review. This is a short meeting or document that answers four questions. What went wrong? Why was it not caught before release? What was the fix? What process changes would prevent recurrence? The review is blameless — the goal is to improve the system, not to punish the person who introduced the bug.
Common process improvements from post-hotfix reviews include: adding a regression test for the fixed bug, adding the failure scenario to the pre-release checklist, improving monitoring to detect the class of issue faster, and extending the fast-track CI suite with a new test category. Each review should produce at least one concrete action item that gets tracked to completion. A review without action items is a discussion, not an improvement.
Finally, merge the hotfix branch back into the main development branch immediately after the review. A hotfix that lives only on the production branch will be overwritten by the next regular release unless it is merged back. This step is easy to forget in the relief of resolving the incident. Put it in the procedure.
Related Issues
For writing the player-facing communication that accompanies a hotfix, see how to write hotfix patch notes that build player trust. For building a broader incident response process that includes hotfixes, read how to build a live ops runbook.
A hotfix procedure that lives in someone’s head is not a procedure. Write it down, review it quarterly, and keep it one click away from the on-call engineer.