Quick answer: Floating-point non-determinism across platforms is caused by differences in register widths, compiler instruction selection, and rounding behavior. Fix it by disabling fast-math, preventing fused multiply-add, forcing SSE on x86, and flushing denormals to zero. For code paths that must be deterministic (multiplayer simulation, replays), replace floats with fixed-point integer arithmetic.

You write a physics simulation. It runs perfectly on your development machine. You build for a different platform and the same simulation with the same initial conditions produces a different result. Not a dramatically different result — the first divergence is in the fifteenth decimal place. But that fifteenth-decimal-place error compounds over thousands of frames until one player sees a projectile hit and the other sees it miss. This is the floating-point determinism problem, and it has ruined more multiplayer games than any network bug.

Why IEEE 754 Does Not Guarantee Determinism

The IEEE 754 standard specifies exact results for the five basic operations: addition, subtraction, multiplication, division, and square root. Given the same operands, rounding mode, and precision, these operations must produce the same result on every conforming implementation. This sounds like a guarantee of determinism, but it is not, because the standard says nothing about how those operations are composed into larger expressions.

Consider the expression a * b + c. A compiler may evaluate this as two separate operations (multiply, then add) or as a single fused multiply-add (FMA) instruction. The FMA instruction produces a more accurate result because it performs the multiply and add with a single rounding step instead of two. But “more accurate” means “different,” and “different” means non-deterministic if one platform uses FMA and the other does not.

The x87 floating-point unit on older x86 CPUs uses 80-bit extended precision internally, even when operating on 64-bit doubles. The result is rounded to 64 bits when stored to memory, but intermediate values in registers retain 80-bit precision. This means the same sequence of operations can produce different results depending on whether the compiler spills an intermediate value to memory (rounding to 64 bits) or keeps it in a register (retaining 80 bits). The ARM NEON unit, by contrast, uses strict 32-bit or 64-bit precision. Same source code, same compiler, different CPUs, different results.

Compiler Flags That Matter

The first line of defense is compiler flags that constrain the compiler’s freedom to rearrange floating-point operations. The most important flags vary by compiler, but the goal is the same on every toolchain: prevent instruction fusion, prevent precision widening, and enforce a consistent rounding mode.

# GCC / Clang flags for floating-point determinism
CFLAGS += -ffp-contract=off     # Prevent fused multiply-add
CFLAGS += -fno-fast-math        # Disable all unsafe FP optimizations
CFLAGS += -msse2 -mfpmath=sse   # Force SSE instead of x87 (x86 only)
CFLAGS += -fno-associative-math  # Prevent reordering of FP operations
CFLAGS += -fno-reciprocal-math   # Prevent x/y -> x * (1/y) substitution

# MSVC flags
CL_FLAGS += /fp:strict           # Strict IEEE 754 semantics; also disables FMA contraction

# Intel compilers (ICC/ICX) on Windows; MSVC itself has no /Qfma flag
ICL_FLAGS += /Qfma-              # Disable FMA generation

The -ffp-contract=off flag is the single most impactful change. It prevents the compiler from fusing a * b + c into an FMA instruction, which is the most common source of single-ULP differences between platforms. The performance cost is measurable but small — typically 2-5% in physics-heavy code — and the determinism gain is enormous.

Disabling fast-math is critical because fast-math enables a collection of optimizations that individually seem harmless but collectively destroy determinism. It allows the compiler to assume that NaN and infinity never occur, to reorder additions as if floating-point addition were associative (it is not), and to replace divisions with reciprocal multiplications. Any of these transformations can produce results that differ by one or more ULPs from the strict IEEE result.
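The reordering hazard is visible even in isolation; the helper names below are mine:

```c
/* Floating-point addition is not associative: these two groupings
   round differently, so a fast-math compiler that reassociates sums
   silently changes results. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 0.1, b = 0.2, c = 0.3, the two groupings differ in the last ULP — harmless in a single expression, fatal when one platform's compiler reorders and another's does not.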

Denormals: The Hidden Trap

Denormalized numbers (also called subnormals) are extremely small floating-point values near zero. They are valid IEEE 754 values, but processing them is dramatically slower on most CPUs — often 10-100x slower than normal floats. Some platforms flush denormals to zero by default for performance (notably some ARM implementations and some console SDKs), while others preserve them. This means the same calculation that produces a tiny nonzero result on your PC produces exactly zero on a console.

The fix is to flush denormals to zero on every platform at startup. On x86, set the DAZ (Denormals Are Zero) and FTZ (Flush To Zero) bits in the MXCSR register. On ARM, set the FZ bit in the FPCR register. In C/C++, most compilers provide intrinsics or pragmas for this. Ensure you set these flags on every thread that runs simulation code, not just the main thread.
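On x86 with SSE, this takes a few lines using standard intrinsics. The sketch below covers the x86 half only (the function name is mine; ARM needs the equivalent FPCR write instead):

```c
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE)       */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3)  */

/* Call once on every thread that runs simulation code. MXCSR is
   per-thread state, so setting it on the main thread is not enough.
   FTZ flushes denormal RESULTS to zero; DAZ treats denormal INPUTS
   as zero. Set both for consistent behavior. */
void enable_ftz_daz(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```

A thread-pool job system should call this from each worker's startup hook, since worker threads inherit whatever MXCSR state the OS gives them.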

Diagnosing Divergence: The Binary Search Approach

When you know that two platforms produce different results but you do not know which operation diverges first, use a binary search through your simulation. Log the state at the midpoint of the simulation. If the states match at the midpoint, the divergence is in the second half. If they differ, it is in the first half. Repeat until you narrow it down to a single frame, then a single function, then a single line.
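If you record a per-frame state checksum on both platforms, the search can be automated. The function below is an illustrative sketch (names are mine) that assumes divergence, once it appears, persists in every later checksum — which holds for a deterministic simulation:

```c
#include <stdint.h>

/* Returns the index of the first frame whose checksums differ,
   or n if the two runs never diverge. */
long first_divergent_frame(const uint64_t *a, const uint64_t *b, long n) {
    long lo = 0, hi = n;
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (a[mid] == b[mid])
            lo = mid + 1;    /* states still match: divergence is later */
        else
            hi = mid;        /* states differ: divergence is here or earlier */
    }
    return lo;
}
```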

For each candidate line, log the exact operands and result on both platforms. Compare them bit-for-bit using the hex representation of the float (not the decimal representation, which hides ULP differences). When you find the first operation that produces a different result, you have found the root cause. It will almost always be one of: an FMA fusion, an x87 precision widening, a denormal handling difference, or a transcendental function (sin, cos, atan2) that is not required to be correctly rounded by IEEE 754.
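In C, bit-for-bit comparison means reinterpreting the double's bytes rather than comparing printed decimals; a small helper (the name is mine) makes this explicit:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double's bits as an integer without undefined
   behavior; memcpy is the portable type-pun, and compilers lower it
   to a single register move. Print the result with %016llx:
   adjacent doubles differ by exactly 1 in this representation, so
   single-ULP divergences are obvious at a glance. */
uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```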

Fixed-Point Arithmetic: The Nuclear Option

If compiler flags are not enough — and for cross-platform games that span x86, ARM, and console-specific architectures, they often are not — the only guaranteed solution is to eliminate floats from your deterministic code paths entirely. Fixed-point arithmetic represents numbers as scaled integers: a 16.16 fixed-point number stores 16 bits of integer and 16 bits of fraction, giving you a range of roughly -32768 to 32767 with precision to 1/65536.

Integer addition, subtraction, and multiplication are deterministic on every platform and every compiler. There is no rounding mode, no register width variation, no FMA fusion. The result of a + b in integers is the same on x86, ARM, MIPS, and any future architecture. This is why so many deterministic multiplayer games — from Age of Empires to GGPO-based fighting games — use fixed-point for simulation.

The cost of fixed-point is ergonomic, not computational. You need to be explicit about precision, handle overflow manually, and implement your own math functions (sqrt, sin, cos) using lookup tables or polynomial approximations. Division requires special care to avoid precision loss. But the payoff is absolute determinism across every platform you will ever ship on, with no compiler flags, no platform-specific workarounds, and no late-night debugging sessions chasing single-ULP differences.
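A minimal 16.16 core (the type and function names are mine) is only a few lines; a production version would add saturation and overflow assertions:

```c
#include <stdint.h>

typedef int32_t fix16;                /* 16.16 signed fixed point */
#define FIX_ONE ((fix16)1 << 16)      /* 1.0 in 16.16             */

static inline fix16 fix_from_int(int32_t i) {
    return (fix16)(i * FIX_ONE);      /* caller ensures -32768 <= i <= 32767 */
}

/* Multiply in 64 bits, then drop the extra 16 fraction bits.
   Without the widening cast, the intermediate product overflows. */
static inline fix16 fix_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}

/* Scale the dividend up first so the quotient keeps its fraction bits. */
static inline fix16 fix_div(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * FIX_ONE) / b);
}
```

Note how every operation widens to 64 bits before narrowing back: that explicitness is exactly the ergonomic cost described above, and exactly what buys the determinism.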

“The question is not whether floating-point math is deterministic. It can be made deterministic on a single platform with enough compiler flags and discipline. The question is whether the engineering cost of maintaining that determinism across five platforms and three compilers is higher than the cost of switching to fixed-point. For most multiplayer games, it is.”

Testing Determinism in CI

Determinism is not something you verify once and forget. It must be tested on every commit, because any new code that touches the simulation can introduce a non-deterministic operation. Record a set of reference replays — input sequences with known final states — and replay them on every target platform in CI. Compare the final state checksums against the reference values. If they diverge, the commit introduced non-determinism and should not be merged.
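The checksum itself must be computed identically everywhere, so use a simple platform-independent hash over the serialized state. The sketch below uses the published 64-bit FNV-1a constants (the function name is mine):

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit hash: deterministic, dependency-free, and fast
   enough to run on every frame's serialized state. */
uint64_t state_checksum(const void *state, size_t len) {
    const unsigned char *p = (const unsigned char *)state;
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV offset basis */
    while (len--) {
        h ^= *p++;
        h *= 0x100000001b3ULL;             /* FNV prime        */
    }
    return h;
}
```

Hash a fixed serialization of the state, never the raw structs — padding bytes and field layout vary across compilers and would poison the comparison.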

Cross-compile your CI to run on at least two different architectures (x86 and ARM are the most common pairing). A test that passes on x86-only CI does not prove cross-platform determinism. You need at least two architectures in the matrix to catch the most common divergence sources. If you target consoles, add a console build to the matrix as well — console CPUs have their own floating-point quirks that differ from desktop CPUs.

Related Issues

For debugging desync issues in lockstep multiplayer caused by float divergence, see how to debug desync in deterministic lockstep games. For building the CI infrastructure to run cross-platform tests, read how to build automated smoke tests for game builds.

Floating-point determinism is not a math problem. It is an engineering discipline problem. The math is well-understood. The discipline is what takes work.