Quick answer: Schedule with batch 32–128. Too small = scheduling overhead dominates. Too large = poor distribution. Profile both ends to find the sweet spot for your workload.

10,000 items scheduled via IJobParallelFor with batch size 1 = 10,000 work units queued. The scheduler spends more time queuing than doing the work.
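For contrast, the pathological schedule looks like this (the job variable is hypothetical; assume a trivial per-item kernel):

```csharp
// Anti-pattern: batch size 1 creates one work unit per item.
// 10,000 Execute ranges get queued and stolen individually;
// the scheduler's bookkeeping dwarfs the per-item work.
var handle = job.Schedule(10000, 1);   // DON'T: batch = 1
handle.Complete();
```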

The Symptom

Parallel job is slower than the serial equivalent. Profiler shows minimal time per Execute and large scheduling overhead.

The Fix

var handle = job.Schedule(N, 64);   // batch 64
handle.Complete();

64 items per batch gives ≈157 batches for 10k items — well above typical worker counts, so work distributes evenly.
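A minimal sketch of the surrounding job, assuming a simple doubling kernel (the `DoubleJob` name and data are illustrative):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct DoubleJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> src;
    public NativeArray<float> dst;

    // Called once per index; the scheduler hands workers
    // contiguous ranges of 64 indices at a time.
    public void Execute(int i) => dst[i] = src[i] * 2f;
}

// 10,000 items, 64 per batch → ≈157 batches to distribute.
var job = new DoubleJob { src = src, dst = dst };
var handle = job.Schedule(10000, 64);
handle.Complete();
```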

IJobParallelForBatch

For SIMD-friendly work:

using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
struct MyJob : IJobParallelForBatch
{
    [ReadOnly] public NativeArray<float> src;
    public NativeArray<float> dst;

    // One call covers a whole batch; the contiguous
    // inner loop is what Burst auto-vectorizes.
    public void Execute(int startIndex, int count)
    {
        for (int i = startIndex; i < startIndex + count; i++)
            dst[i] = src[i] * 2f;
    }
}

var handle = job.ScheduleBatch(N, 128);   // batch 128
handle.Complete();

Inner loop in one Execute call — Burst can vectorize across the batch.

Verifying

Profile serial vs parallel. Parallel should approach a workerCount× speedup. Tune batch size up or down by 2× and re-profile until throughput stops improving.
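A rough timing sketch (the Unity Profiler or ProfilerMarker is the better tool; `DoubleJob` is a hypothetical IJobParallelFor, and `Run` executes it serially on the main thread):

```csharp
var sw = System.Diagnostics.Stopwatch.StartNew();
new DoubleJob { src = src, dst = dst }.Run(10000);   // serial baseline
var serialMs = sw.Elapsed.TotalMilliseconds;

sw.Restart();
new DoubleJob { src = src, dst = dst }.Schedule(10000, 64).Complete();
var parallelMs = sw.Elapsed.TotalMilliseconds;

// With the batch sized right, expect roughly:
// parallelMs ≈ serialMs / JobsUtility.JobWorkerCount
```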

“Batch 32–128. Profile. Parallel beats serial.”

Related Issues

For vectorizing IJobChunk with Burst, see SIMD. For job data races, see data race.

Batch sized right. Parallel pays off.