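Quick answer: Use Unity.Mathematics types (float3, float4). Replace branches in the hot loop with math.select. Check the Burst Inspector output for packed SIMD instructions (on x86, mulps or addps) rather than scalar ones (mulss, addss). Iterate over per-archetype chunk data so memory access stays linear.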

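Your IJobChunk compiles with Burst but runs only about 1.5x faster than the managed C# version. The Burst Inspector shows scalar code where you expected SIMD. The usual culprits are branches in the hot loop and an array-of-structures (AoS) data layout, both of which get in the way of vectorization.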

The Symptom

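The Burst-compiled job runs, but performance is mediocre. In the Burst Inspector, the hot loop is dominated by scalar instructions (on x86, names ending in ss, such as mulss and addss) instead of packed ones (ending in ps, such as mulps and addps). Scalar float code also uses the xmm registers, so the register name alone does not tell you whether the code is vectorized. The hot loop contains if/else patterns.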

The Fix

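using Unity.Burst;
using Unity.Burst.Intrinsics; // v128
using Unity.Collections;      // [ReadOnly]
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;       // LocalTransform

// Velocity is assumed to be a user-defined IComponentData with a float3 field named value.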

[BurstCompile]
struct MoveJob : IJobChunk
{
    public ComponentTypeHandle<LocalTransform> transformH;
    [ReadOnly] public ComponentTypeHandle<Velocity> velocityH;
    public float dt;

    public void Execute(in ArchetypeChunk chunk, int idx, bool useEnabled, in v128 mask)
    {
        var transforms = chunk.GetNativeArray(ref transformH);
        var velocities = chunk.GetNativeArray(ref velocityH);

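        // Assumes the query has no enableable components (useEnabled is false),
        // so every entity in the chunk is processed.
        for (int i = 0; i < chunk.Count; i++)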
        {
            var t = transforms[i];
            var v = velocities[i].value;

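            // Branchless speed clamp: both candidates are computed, then
            // math.select(a, b, c) returns b when c is true and a otherwise.
            // At speed == 0 the scaled candidate is NaN, but it is never selected.
            var speed = math.length(v);
            var clamped = math.select(v, v * (10f / speed), speed > 10f);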

            t.Position += clamped * dt;
            transforms[i] = t;
        }
    }
}

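math.select replaces the if/else. Both candidate values are computed and one is picked without a jump, so the loop body has no divergent control flow. That makes it much easier for Burst to emit SIMD code for the whole loop body.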
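To make the equivalence concrete, here is a minimal sketch of the same speed clamp written both ways. The class and method names (SpeedClamp, Branchy, Branchless) and the maxSpeed parameter are illustrative and not part of the job above; only math.length and math.select come from Unity.Mathematics. It assumes maxSpeed is greater than 0.

```csharp
using Unity.Mathematics;

// Hypothetical helper for illustration only; assumes maxSpeed > 0.
public static class SpeedClamp
{
    // Branchy form: the result depends on a conditional return.
    public static float3 Branchy(float3 v, float maxSpeed)
    {
        float speed = math.length(v);
        if (speed > maxSpeed)
        {
            return v * (maxSpeed / speed);
        }
        return v;
    }

    // Branchless form: compute both candidates, then pick one with math.select.
    public static float3 Branchless(float3 v, float maxSpeed)
    {
        float speed = math.length(v);
        // NaN when speed == 0, but the condition below is false in that case,
        // so the original vector is selected instead.
        float3 scaled = v * (maxSpeed / speed);
        return math.select(v, scaled, speed > maxSpeed);
    }
}
```

Both methods return the same value for the same input. The difference is in the generated code, which is worth confirming in the Burst Inspector rather than assuming.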

Inspect

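Jobs → Burst → Open Inspector. Find your job in the list, then view the assembly or the LLVM IR. Look for packed SIMD instructions (on x86, mulps and addps, or their v-prefixed AVX forms) rather than register names alone, since scalar code uses xmm registers too. Compare before and after the rewrite.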

Verifying

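Profile cycles (or time) per element. float3 math fills three of the four lanes in a 128-bit register, so 3x to 4x over scalar is a theoretical ceiling for the arithmetic itself, and memory-bound loops will see less. Measure with Profiler markers or the job's timing in the Profiler.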
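As a rough way to get a per-element number outside the Profiler, the sketch below times a delegate with System.Diagnostics.Stopwatch and divides by the element count. JobTiming and NanosecondsPerElement are hypothetical names, not a Unity or Burst API. The result is wall-clock time, not CPU cycles, and it includes scheduling overhead.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only; not part of Unity or Burst.
public static class JobTiming
{
    // Runs `work` once as a warm-up, then `iterations` more times, and returns
    // the best observed wall-clock time divided by elementCount, in nanoseconds.
    public static double NanosecondsPerElement(Action work, int elementCount, int iterations = 10)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (elementCount <= 0) throw new ArgumentOutOfRangeException(nameof(elementCount));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        work(); // warm-up run, excluded from the measurement

        long bestTicks = long.MaxValue;
        var stopwatch = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            work();
            stopwatch.Stop();
            bestTicks = Math.Min(bestTicks, stopwatch.ElapsedTicks);
        }

        // Stopwatch ticks are not nanoseconds; convert using the timer frequency.
        double nanoseconds = bestTicks * (1e9 / Stopwatch.Frequency);
        return nanoseconds / elementCount;
    }
}
```

In a test you would pass a lambda that schedules the job and calls Complete() on the returned handle, with elementCount set to the number of entities the query matches. In the Editor, Burst may compile asynchronously, so the first runs can still execute the managed fallback. Measuring in a player build, or with synchronous compilation enabled, avoids comparing the wrong code.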

“Mathematics types. Branchless. Linear memory. Burst vectorizes.”

Related Issues

For Burst BC1006, see BC1006. For Job data race, see data race.

math.select. float3. Wide registers.