Why does my Unity Burst job not get faster with more cores?

A batch size that is too small adds high scheduling overhead per item; one that is too large limits parallelism. Aim for batches that take 30–100 microseconds each. Also avoid writing to the same cache line from multiple threads (false sharing).

How do I tune batch size for IJobParallelFor?

Profile with the Unity Profiler. If the timeline shows many tiny job slices, raise the batch size. If a few batches run long while other workers idle, lower it. The sweet spot is where worker threads stay busy without per-batch overhead dominating.

What is false sharing and how do I avoid it?

When two threads write to nearby memory addresses (within the same 64-byte cache line), each write invalidates the other core's copy of the line, and the line ping-pongs between caches. Pad your per-thread data to 64-byte boundaries, or accumulate per-thread results in separate buffers and combine them afterwards.

Fix: Unity Burst Jobs No Speedup at Higher Thread Count

Quick answer: Tune batch size so each batch takes 30–100 µs. Avoid false sharing by padding per-thread data or writing per-thread to separate arrays. Profile to confirm worker threads are saturated.

Here is how to fix a Unity IJobParallelFor job that does not scale beyond 2–3 cores even on a 16-core CPU. Batch size and false sharing are the two most common culprits.

The Symptom

The job runs in 5 ms on 4 worker threads. Raising JobsUtility.JobWorkerCount to 8 gives 4.8 ms; 16 threads gives 4.7 ms. The extra cores buy almost nothing despite plenty of CPU being available.
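To reproduce a scaling curve like the one above, one option is to sweep the worker count and time the same job at each setting. The following is a minimal sketch, not a definitive benchmark: the WorkerCountSweep class and the runJob callback are hypothetical names, and runJob is assumed to schedule your job and call Complete() on it.

```csharp
using Unity.Jobs.LowLevel.Unsafe;
using UnityEngine;

public static class WorkerCountSweep
{
    // runJob is a hypothetical callback that schedules the job under test
    // and blocks until it completes.
    public static void Run(System.Action runJob)
    {
        int max = JobsUtility.JobWorkerMaximumCount;

        foreach (int requested in new[] { 1, 2, 4, 8, 16 })
        {
            // Clamp to what this machine supports.
            int workers = Mathf.Min(requested, max);
            JobsUtility.JobWorkerCount = workers;

            runJob(); // warm-up run, not timed

            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            runJob();
            stopwatch.Stop();

            Debug.Log($"{workers} workers: {stopwatch.Elapsed.TotalMilliseconds:F3} ms");
        }

        // Restore Unity's default worker count.
        JobsUtility.ResetJobWorkerCount();
    }
}
```

A single timed run is noisy; averaging several runs per setting gives a more trustworthy curve.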

What Causes This

Wrong batch size. If batches are too small, per-batch scheduling cost dominates the work and threads spend more time fetching work than doing it. If batches are too large, there are too few of them to keep every worker busy.

False sharing. Multiple threads writing to nearby memory invalidate each other’s cache lines.

Memory bandwidth bound. The job moves more data than the memory subsystem can supply, so adding cores cannot help.

The Fix

Step 1: Tune batch size.

// Try several batch sizes (for example 16, 64, 256, 1024) and compare timings.
// The second argument is innerloopBatchCount: the number of indices per batch.
JobHandle h = job.Schedule(arr.Length, 64);

Aim for batch durations of 30–100 µs. Measure them in the Unity Profiler's Timeline view (CPU Usage module), where each job worker thread has its own row.
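The batch-size comparison can also be scripted. This is a rough sketch under stated assumptions: SquareJob and BatchSizeSweep are hypothetical names invented for illustration, the array size is arbitrary, and wall-clock timing with Stopwatch only approximates what the Profiler shows.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job used only for this sketch: squares every element in place.
[BurstCompile]
struct SquareJob : IJobParallelFor
{
    public NativeArray<float> Values;

    public void Execute(int index)
    {
        Values[index] = Values[index] * Values[index];
    }
}

public class BatchSizeSweep : MonoBehaviour
{
    void Start()
    {
        var values = new NativeArray<float>(1_000_000, Allocator.TempJob);
        int[] batchSizes = { 16, 64, 256, 1024 };

        // Warm-up run so one-time costs do not land in the first measurement.
        new SquareJob { Values = values }.Schedule(values.Length, 64).Complete();

        foreach (int batchSize in batchSizes)
        {
            var job = new SquareJob { Values = values };

            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            // Second argument is the batch size (innerloopBatchCount).
            job.Schedule(values.Length, batchSize).Complete();
            stopwatch.Stop();

            Debug.Log($"batch {batchSize}: {stopwatch.Elapsed.TotalMilliseconds:F3} ms");
        }

        values.Dispose();
    }
}
```

Swap SquareJob for the job you are actually tuning; the best batch size depends on how much work each Execute call does.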

Step 2: Avoid false sharing.

// BAD: per-thread counters packed into adjacent ints (16 counters share one 64-byte line)
NativeArray<int> counters = new NativeArray<int>(threadCount, Allocator.TempJob);

// BETTER: give each counter a 64-byte stride
// (requires: using System.Runtime.InteropServices;)
[StructLayout(LayoutKind.Sequential, Size = 64)]
struct PaddedInt { public int Value; }
NativeArray<PaddedInt> paddedCounters = new NativeArray<PaddedInt>(threadCount, Allocator.TempJob);

Each Value field now sits 64 bytes away from its neighbours, so no two counters share a cache line and no thread's writes invalidate another thread's counter.
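One way to use padded counters from a job is to index them by the executing thread and sum them once the job has finished. The sketch below assumes that approach; CountPositivesJob and CountPositivesExample are hypothetical names, and sizing the array with JobsUtility.MaxJobThreadCount is an assumption that may need adjusting for your Unity version.

```csharp
using System.Runtime.InteropServices;
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Jobs.LowLevel.Unsafe;

[StructLayout(LayoutKind.Sequential, Size = 64)]
struct PaddedInt { public int Value; }

// Hypothetical job: counts positive inputs into one padded counter per thread.
[BurstCompile]
struct CountPositivesJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> Input;

    // Indexed by thread rather than by element, so the parallel-for
    // index restriction is disabled for this array.
    [NativeDisableParallelForRestriction] public NativeArray<PaddedInt> Counters;

    // Set by the job system to the index of the thread running Execute.
    [NativeSetThreadIndex] int threadIndex;

    public void Execute(int index)
    {
        if (Input[index] > 0f)
        {
            PaddedInt slot = Counters[threadIndex];
            slot.Value++;
            Counters[threadIndex] = slot;
        }
    }
}

static class CountPositivesExample
{
    public static int Run(NativeArray<float> input)
    {
        // Assumption: one slot per possible job thread index.
        var counters = new NativeArray<PaddedInt>(
            JobsUtility.MaxJobThreadCount, Allocator.TempJob);

        new CountPositivesJob { Input = input, Counters = counters }
            .Schedule(input.Length, 256)
            .Complete();

        // Combine the per-thread partial counts on the calling thread.
        int total = 0;
        for (int i = 0; i < counters.Length; i++)
            total += counters[i].Value;

        counters.Dispose();
        return total;
    }
}
```

Because each thread only touches its own slot, no atomics are needed, and the final sum runs once on a single thread.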

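Step 3: Profile the job timeline. Open Window → Analysis → Profiler, select the CPU Usage module, and switch it to Timeline view. Work spread evenly across all job worker threads indicates good parallelism; one long bar on a single thread while the other workers sit idle indicates poor parallelism.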

Step 4: Reduce shared writes. Use NativeQueue<T>.ParallelWriter or NativeStream.Writer to accumulate results from each thread without contention, then read them back after the job completes.
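As an illustration of the queue-based option, the sketch below collects the indices of negative inputs through a NativeQueue parallel writer. It assumes the Collections package is installed; CollectNegativesJob and CollectNegativesExample are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: records the index of every negative input.
[BurstCompile]
struct CollectNegativesJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> Input;

    // Parallel writer handle that is safe to use from multiple worker threads.
    public NativeQueue<int>.ParallelWriter Output;

    public void Execute(int index)
    {
        if (Input[index] < 0f)
            Output.Enqueue(index);
    }
}

static class CollectNegativesExample
{
    // The caller owns the returned array and must dispose it.
    public static NativeArray<int> Run(NativeArray<float> input)
    {
        var queue = new NativeQueue<int>(Allocator.TempJob);

        new CollectNegativesJob { Input = input, Output = queue.AsParallelWriter() }
            .Schedule(input.Length, 256)
            .Complete();

        NativeArray<int> result = queue.ToArray(Allocator.TempJob);
        queue.Dispose();
        return result;
    }
}
```

The order of the collected indices depends on how batches were distributed across threads, so sort the result if the order matters.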

Step 5: For memory-bound work, process the data in smaller tiles. Size each batch so that its working set fits in L2 cache.
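A back-of-the-envelope way to pick such a batch size is to divide a cache budget by the bytes each element touches. The helper below is only a sketch: BatchSizing is a hypothetical name, and the 512 KB L2 size is a placeholder assumption, not a value from any particular CPU.

```csharp
static class BatchSizing
{
    // Assumed per-core L2 size. This is a placeholder; check the target CPU.
    const int AssumedL2Bytes = 512 * 1024;

    // bytesPerElement: total bytes read and written per index by the job.
    public static int BatchSizeForL2(int bytesPerElement)
    {
        // Budget half of L2 so other data still has room in the cache.
        int budget = AssumedL2Bytes / 2;
        return System.Math.Max(1, budget / bytesPerElement);
    }
}
```

With these assumptions, a job touching 16 bytes per index gets an upper bound of 262144 / 16 = 16384 indices per batch. Treat the result as a ceiling and still check it against the 30–100 µs target from Step 1.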

Understanding the issue

This bug class falls into a pattern that's worth understanding beyond the specific case. In Unity, the underlying behavior is shaped by how the engine layers its abstractions - the public API you call, the runtime systems that respond, and the platform-specific implementations underneath. A bug at any layer can produce symptoms that look like they originate at a different layer. Triaging effectively means recognizing which layer the symptom belongs to, even when the gameplay code is what's visible.

The specific bug described above is the kind that surfaces during integration rather than unit testing. It depends on a combination of factors: the asset configuration, the runtime state, the platform's specific behavior. In isolation, each piece looks correct; in combination, the bug emerges. This is why thorough integration testing - playing the actual game in realistic conditions - catches things that automated tests miss.

Why this happens

This bug class disproportionately affects late-stage development. The work to surface it is interactive testing in realistic conditions, which only really happens after the gameplay is in place and assets are populated. Catching it early requires deliberate testing of conditions that look unimportant.

At the engine level, the behavior comes from a deliberate design decision in Unity. The engine team chose a particular trade-off - usually performance versus convenience, or generality versus specificity - and that trade-off has consequences when you push against it. Understanding the trade-off is what turns 'this bug is mysterious' into 'this bug is the expected consequence of this design'.

Verifying the fix

For shipping games, the safest verification is a staged rollout. Apply the fix to 1% of players for 24 hours; watch the affected metric; expand if green. Skipping the staged rollout means the verification is the entire player base, which is too high a risk for most fixes.

Reproducibility is the prerequisite for verification. If you can't reliably reproduce the bug pre-fix, you can't reliably verify it post-fix. Spend time getting a clean reproduction before you write any fix code. The fix is fast once you understand the reproduction; the reproduction is the slow part.

Variations to watch for

There's almost always a less obvious case where the same problem applies. The reported case is the one a player hit; the related cases hide because they're rarer or affect fewer players. After fixing the reported case, search the codebase for the pattern - one fix often unlocks several.

Adjacent bugs often share a root cause. After fixing the case you've found, spend an hour searching the codebase for similar patterns. What's the same call with different arguments? The same data flow with a different entity type? The same lifecycle issue in a sibling system? Each match is a candidate for the same fix, or a related fix that prevents future bugs of the same class.

In production

In shipping builds, this issue may interact with other production-only behavior. Stripping, encryption, asset bundling, and platform-specific code paths can each modify the symptoms. When players report a related issue, capture build SHA, platform, and any feature flags - those three fields cover most of the production-only variations.

When triaging a similar issue in production, prioritize gathering data over hypothesizing causes. A player report describes a symptom; what you need is a build SHA, a session timestamp, and ideally a screen recording or session replay. With those, the bug becomes tractable. Without them, you're guessing at hypothetical reproductions that may not match what the player actually hit.

Performance considerations

If this issue manifests under high load (many actors, many particles, many network connections), profile the post-fix code path with realistic counts. The original cost was a bug; the new cost is real work, and real work has a budget.

Diagnostic approach

Before applying any fix, gather enough context to be confident you're addressing the actual cause and not a similar-looking symptom. The cheapest diagnostic step is reproducing the bug deterministically - if you can't get the same failure twice in a row, your fix attempts will be hard to evaluate. Lock down the reproduction first.

For Unity-specific diagnostics, the editor's profiler is the canonical starting point. Capture a representative frame with the symptom present; compare against a frame without the symptom; the diff often points directly at the cause. If the symptom is non-deterministic, capture multiple frames and look for the pattern - the cause is usually a state transition or a specific input value rather than a continuous effect.

Tooling and ecosystem

The tooling around this bug class matters as much as the fix itself. Good logging, accessible profilers, and clear error messages turn 30-minute investigations into 5-minute ones. If your project doesn't have visibility into this code path, the first fix should add the visibility - the second fix uses it.

Within Unity, the relevant diagnostic surfaces include the standard frame debugger, memory profiler, and engine-specific debug overlays. Each one shows a different facet of what's happening. The frame debugger reveals draw call ordering and state transitions; the memory profiler shows allocation patterns; the debug overlay reveals per-system state. Bugs that resist one tool usually surrender to another - the trick is knowing which tool to reach for first.

Edge cases and pitfalls

Platform-specific edge cases are worth enumerating explicitly. iOS handles backgrounding differently than Android; Windows handles focus changes differently than macOS. A fix that works on the development platform may not work on every target. Test on each shipping platform deliberately.

When writing a regression test for this fix, focus on the boundary conditions that surfaced the original bug. Tests that exercise the happy path catch obvious regressions; tests that exercise the boundary catch the subtler regressions that look like new bugs but are really the original returning. The latter are the tests that earn their keep over the long life of the project.

Team communication

When this bug class affects multiple teams (often the case for cross-system issues), early communication prevents duplicate work. The team that owns the symptom may not own the cause. A 15-minute conversation at the start of triage often saves hours of independent investigation.

If this fix touches a system several engineers work in, a short writeup in the team's engineering channel helps. Not a full design doc - a paragraph explaining what was wrong, what's fixed, and what to watch for. Future engineers encountering similar symptoms will search for the fix; making it findable is a small investment that pays back later.

“Right batch size + no false sharing + memory-aware. Then scaling kicks in.”

Related Issues

For Burst compile errors, see Burst Compile. For NativeArray dispose, see Handle Dispose.

30–100 µs batches. Padded per-thread state. Profile saturation. Cores actually help.