Quick answer: Enable core dumps with ulimit -c unlimited, ship with a symbol file uploaded to your crash tool, run the server under systemd with automatic restart, and stream structured logs to a central store. A crash that would take hours to diagnose with a bare binary becomes a ten-minute investigation with these four pieces in place.
Your game launches with a fleet of dedicated servers. Players connect, play a match, and then report that the server died mid-match, disconnecting everyone. You SSH into the host and find nothing useful — the process is gone, there is no log file, and systemd has already restarted it. The only evidence the crash happened is a metric spike and a wave of angry tweets. Debugging dedicated server crashes is a different beast from debugging client crashes because the environment is less cooperative. Here is the setup that makes it tractable.
Step 1: Enable Core Dumps
A core dump is a snapshot of your process's memory at the moment it crashed. With a core dump and your debug symbols, you can reconstruct the full stack trace, inspect variables, and walk backward through the code path that led to the crash. Without a core dump, you have nothing.
On the Linux host running your dedicated server:
# Set per-process limit (in the server's systemd unit file)
LimitCORE=infinity
# Configure kernel core pattern (one-time host setup; writing /proc does
# not survive a reboot — add the same setting to /etc/sysctl.d to persist)
echo "/var/crashdumps/core-%e-%p-%t" | sudo tee /proc/sys/kernel/core_pattern
sudo mkdir -p /var/crashdumps
sudo chmod 1777 /var/crashdumps
The pattern core-%e-%p-%t produces filenames containing the executable name, PID, and Unix timestamp. Mode 1777 makes the directory world-writable with the sticky bit set, so every server process can write a dump but only a file's owner can delete it.
Verify the setup by sending a SIGSEGV to a test process and confirming the dump appears in /var/crashdumps. If it does not, check the kernel log (dmesg or journalctl -k) for the reason — common failures are a full disk, a core size limit of zero, or a core_pattern pointing at a directory that does not exist.
Step 2: Keep Debug Symbols Separate
Never ship your dedicated server with debug symbols embedded. A stripped binary is smaller, faster to deploy, and gives attackers less information if they obtain it. But you must keep the symbols somewhere — without them, a core dump is just addresses.
# Build: separate symbols into a .sym file
gcc -g -O2 -o gameserver gameserver.c
objcopy --only-keep-debug gameserver gameserver.sym
strip --strip-debug --strip-unneeded gameserver
objcopy --add-gnu-debuglink=gameserver.sym gameserver
# Upload the .sym file to your crash reporting service
curl -F "symfile=@gameserver.sym" \
     -F "build=1.4.2-release" \
     https://crashes.example.com/api/symbols
Keep the .sym file for every build you have ever deployed, indexed by build ID. When you get a crash from version 1.4.2-release two months later, you can still symbolicate it.
Step 3: Run Under a Supervisor
A dedicated server that crashes and stays dead is worse than one that crashes and restarts. Run the server under systemd:
# /etc/systemd/system/gameserver@.service
[Unit]
Description=Game Server Instance %i
After=network.target
[Service]
Type=simple
User=gameserver
Group=gameserver
ExecStart=/opt/gameserver/gameserver --instance=%i --port=270%i
Restart=always
RestartSec=10
LimitCORE=infinity
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
The @ template syntax, combined with the %i instance specifier, lets you start many instances from a single unit file: systemctl start gameserver@1, gameserver@2, and so on. Each instance gets its own logs and its own port, and restarts independently.
RestartSec=10 is important — without a backoff, a server that crashes on startup will restart instantly in a tight loop, burning CPU and filling your logs.
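Restart=always with a fixed RestartSec slows a crash loop but never stops it: a build that segfaults during startup will keep restarting every ten seconds forever. systemd's start-rate limiting can park the unit instead — a sketch using standard directives in the [Unit] section (the thresholds here are illustrative):

```ini
[Unit]
# Give up if the unit is started more than 5 times within 300 seconds;
# the unit then enters the failed state until manually reset
StartLimitIntervalSec=300
StartLimitBurst=5
```

A unit parked this way stays down until you run systemctl reset-failed, which is usually what you want for a build that is hopelessly broken on boot.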
Step 4: Emit Structured Logs
Text logs are useful when you are SSH'd into one host debugging one crash. Structured logs are essential when you have 100 hosts and need to query across all of them. Use JSON lines format:
{
  "ts": "2026-04-09T14:23:01.123Z",
  "level": "error",
  "instance": "gs-42",
  "match_id": "abc123",
  "build": "1.4.2-release",
  "tick": 38291,
  "msg": "AI pathfinder returned nil path for target",
  "player_count": 10,
  "memory_mb": 823
}
Pipe these to a central log store (Loki, ClickHouse, or cloud-native equivalents) and index the important fields. When a crash happens, you can query for the last N lines from the crashing instance in milliseconds, which lets you see what the server was doing right before it died.
Step 5: Detect Hangs, Not Just Crashes
A crash is the easy case — the process exits and leaves a dump. A hang is worse: the process is still running, it is not serving players, and nothing looks wrong to the supervisor. Add a watchdog that the server pings every few seconds:
package main

import (
	"log/slog"
	"os"
	"sync/atomic"
	"time"
)

type Server struct {
	lastHeartbeat atomic.Int64 // UnixNano of the last main-loop tick
}

// beat is called from the main game loop every tick; if that loop
// hangs, the timestamp stops advancing and the watchdog fires.
func (s *Server) beat() {
	s.lastHeartbeat.Store(time.Now().UnixNano())
}

// watchdog runs on its own goroutine and exits the process when the
// heartbeat goes stale, so the supervisor restarts the server.
func (s *Server) watchdog() {
	ticker := time.NewTicker(5 * time.Second)
	for range ticker.C {
		last := time.Unix(0, s.lastHeartbeat.Load())
		if time.Since(last) > 15*time.Second {
			slog.Error("main loop hung", "last_beat", last)
			os.Exit(1) // non-zero exit triggers Restart=always
		}
	}
}
The watchdog runs on its own goroutine (or thread) and exits the process if the main loop stops updating its heartbeat. Note that os.Exit(1) terminates immediately — deferred functions do not run — and the non-zero status is what causes the supervisor to restart the server. Logging before the exit captures evidence of the hang for later investigation.
Step 6: Correlate Crashes With Player Actions
Once you have dumps, logs, and restarts in place, the next step is to correlate crashes with what was happening in the game. Every crash should come with answers to:
- What was the match mode and map?
- How many players were connected?
- What tick were we on?
- What was the last input processed?
- Who owned the last object touched before the crash?
Write these into a "last known state" file on disk every tick (or every N ticks). On crash, the file is your first clue. A surprising number of dedicated server crashes turn out to be "when player count exceeds 12 in map X" or "when a specific weapon is fired in a specific direction" once you have the state file.
Step 7: Build a Staging Fleet
Production crashes are hard to reproduce because you cannot attach a debugger to a live match. Run a parallel staging fleet with the same build, a subset of real traffic (via a feature flag or canary routing), and no debug stripping. When a crash happens on staging, you get the full picture with symbols, an unstripped core dump, and the freedom to attach GDB without kicking players off their game.
"Dedicated server crash debugging is 10% programming and 90% having the right infrastructure set up before the crash happens. Do the infrastructure work on day one, not after the first outage."
Related Issues
For packet loss issues that can cause server-side state corruption, see how to debug network packet loss. For lag compensation bugs, which often show up alongside crashes, read how to debug multiplayer lag compensation bugs.
Core dumps. Symbols. Supervisor. Structured logs. The four horsemen of live server sanity.