Quick answer: Use Unity.Mathematics types (float3, float4). Replace branches with math.select. In the Burst Inspector, check the assembly for packed SIMD instructions (mulps, addps) rather than scalar ones (mulss, addss). Split data by archetype so memory access stays linear.
An IJobChunk compiles with Burst but runs only about 1.5x faster than the managed C# version. The Burst Inspector shows scalar code where you expected SIMD. Branches in the hot loop and an array-of-structs (AoS) layout prevented vectorization.
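To make the AoS point concrete, here is a plain .NET sketch (no Unity types) that contrasts the two layouts. PositionAoS, PositionsSoA, and LayoutDemo are hypothetical names. The .NET JIT will not auto-vectorize these loops. The sketch only shows the memory layout that an auto-vectorizer such as Burst's LLVM backend prefers: with SoA, each field is a unit-stride array that fills whole SIMD registers.

```csharp
using System;

// Hypothetical AoS layout: X, Y, Z of one element sit together, so the
// same field of consecutive elements is 12 bytes apart in memory.
public struct PositionAoS
{
    public float X, Y, Z;
}

// Hypothetical SoA layout: each field lives in its own contiguous array,
// so one 4-wide or 8-wide load grabs the same field of 4 or 8 elements.
public sealed class PositionsSoA
{
    public readonly float[] X, Y, Z;

    public PositionsSoA(int count)
    {
        X = new float[count];
        Y = new float[count];
        Z = new float[count];
    }
}

public static class LayoutDemo
{
    // AoS update: every iteration touches one interleaved 12-byte struct.
    public static void MoveAoS(PositionAoS[] p, float vx, float vy, float vz, float dt)
    {
        for (int i = 0; i < p.Length; i++)
        {
            p[i].X += vx * dt;
            p[i].Y += vy * dt;
            p[i].Z += vz * dt;
        }
    }

    // SoA update: three unit-stride loops, the shape a vectorizing
    // compiler can map directly onto packed SIMD instructions.
    public static void MoveSoA(PositionsSoA p, float vx, float vy, float vz, float dt)
    {
        for (int i = 0; i < p.X.Length; i++) p.X[i] += vx * dt;
        for (int i = 0; i < p.Y.Length; i++) p.Y[i] += vy * dt;
        for (int i = 0; i < p.Z.Length; i++) p.Z[i] += vz * dt;
    }
}
```

Both functions compute identical results; only the memory access pattern differs. In ECS, each component type is already its own column within a chunk, but the fields inside a component such as LocalTransform are still interleaved.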
The Symptom
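The Burst-compiled job runs, but performance is mediocre. The Burst Inspector assembly shows scalar instructions such as mulss and addss, which operate on only the lowest lane of an xmm register, instead of packed ones such as mulps and addps (or, with AVX, vmulps on ymm registers). The hot loop contains if/else branches.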
The Fix
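using Unity.Burst;
using Unity.Burst.Intrinsics;   // v128
using Unity.Collections;        // [ReadOnly]
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;         // LocalTransform

// Assumed user-defined component holding a per-entity velocity.
public struct Velocity : IComponentData
{
    public float3 value;
}

[BurstCompile]
struct MoveJob : IJobChunk
{
    public ComponentTypeHandle<LocalTransform> transformH;
    [ReadOnly] public ComponentTypeHandle<Velocity> velocityH;
    public float dt;

    public void Execute(in ArchetypeChunk chunk, int idx, bool useEnabled, in v128 mask)
    {
        // Each array is one contiguous column of this chunk.
        var transforms = chunk.GetNativeArray(ref transformH);
        var velocities = chunk.GetNativeArray(ref velocityH);
        // This loop ignores useEnabled/mask; if the query includes
        // enableable components, iterate with ChunkEntityEnumerator instead.
        for (int i = 0; i < chunk.Count; i++)
        {
            var t = transforms[i];
            var v = velocities[i].value;
            // Branchless clamp to a max speed of 10: both candidates are
            // computed and math.select keeps one. When speed == 0 the unused
            // candidate is NaN (0 * inf), but it is discarded.
            var speed = math.length(v);
            var clamped = math.select(v, v * (10f / speed), speed > 10f);
            t.Position += clamped * dt;
            transforms[i] = t;
        }
    }
}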
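math.select(a, b, c) returns b when c is true and a otherwise, so it replaces the if/else without any control flow. With no divergent branches in the loop body, the most common obstacle to Burst vectorizing the loop is gone, although the interleaved (AoS) layout of LocalTransform can still limit how much it packs.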
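The following plain .NET model helps check the select semantics outside Unity. It uses System.Numerics.Vector3 as a stand-in for float3, and Select is a hypothetical helper that mirrors math.select(a, b, c). Because it is written with a ternary, the .NET JIT may still emit a branch; the sketch models what is computed, not the code Burst generates.

```csharp
using System;
using System.Numerics;

public static class SelectModel
{
    // Mirrors math.select(a, b, c): b when c is true, otherwise a.
    public static Vector3 Select(Vector3 a, Vector3 b, bool c) => c ? b : a;

    // If/else form, equivalent to the loop body before the rewrite.
    public static Vector3 ClampBranchy(Vector3 v, float maxSpeed)
    {
        float speed = v.Length();
        if (speed > maxSpeed)
            return v * (maxSpeed / speed);
        return v;
    }

    // Select form: both candidates are computed unconditionally, then one
    // is kept. For v == 0 the scaled candidate is 0 * infinity = NaN, but
    // it is discarded because speed > maxSpeed is false.
    public static Vector3 ClampSelect(Vector3 v, float maxSpeed)
    {
        float speed = v.Length();
        Vector3 scaled = v * (maxSpeed / speed);
        return Select(v, scaled, speed > maxSpeed);
    }
}
```

For example, ClampSelect(new Vector3(30, 40, 0), 10f) scales a speed of 50 down to 10, while a zero vector passes through unchanged instead of turning into NaN.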
Inspect
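Open Jobs → Burst → Open Inspector. Find your job and view its assembly (or LLVM IR). Look for packed instructions (ps suffix, e.g. mulps, vaddps) rather than scalar ones (ss suffix, e.g. mulss, addss); register names alone don't settle it, because scalar SSE code also uses xmm registers. Compare the output before and after the rewrite.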
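To make the before/after comparison less eyeball-driven, you could paste the assembly copied from the Burst Inspector into a small counter like the one below. SimdCounter is a hypothetical helper and a heuristic: it counts only single-precision arithmetic (including FMA forms) and ignores loads, shuffles, and double-precision ops. Packed ops can also appear when a single float3 is handled in one register without the loop being vectorized across elements, so treat the ratio as a hint, not proof.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimdCounter
{
    // Matches SSE/AVX float arithmetic such as mulss, addps, vmulps,
    // vsqrtss, and FMA forms such as vfmadd231ps. Group 1 is ps or ss.
    private static readonly Regex FloatOp = new Regex(
        @"\b(?:v?(?:add|sub|mul|div|sqrt|min|max)|vfn?m(?:add|sub)\d{3})(ps|ss)\b",
        RegexOptions.IgnoreCase);

    // Returns how many packed (ps) and scalar (ss) float ops appear.
    public static (int Packed, int Scalar) Count(string assembly)
    {
        int packed = 0, scalar = 0;
        foreach (Match m in FloatOp.Matches(assembly))
        {
            if (m.Groups[1].Value.Equals("ps", StringComparison.OrdinalIgnoreCase))
                packed++;
            else
                scalar++;
        }
        return (packed, scalar);
    }
}
```

A loop body dominated by scalar counts before the rewrite and packed counts after it is the pattern you are hoping to see.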
Verifying
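Profile cycles per element. With 4-wide SSE, packed float math tops out at roughly 4x scalar (8x with 8-wide AVX2) in ideal cases; a float3 fills only 3 of 4 lanes, and memory bandwidth often caps the gain further. Measure with the Unity Profiler (job timings in the Timeline view) or ProfilerMarker scopes.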
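Turning a measured time into cycles per element is simple arithmetic: seconds × clock rate ÷ element count. The sketch below assumes you supply the clock rate yourself (nominal or observed); turbo and frequency scaling make the result approximate. Throughput is a hypothetical helper name. If the job runs in parallel, use the time of a single worker thread together with the number of elements that thread processed.

```csharp
using System;

public static class Throughput
{
    // cycles per element = (elapsedMs / 1000) * clockGHz * 1e9 / elements
    public static double CyclesPerElement(double elapsedMs, long elements, double clockGHz)
    {
        if (elements <= 0) throw new ArgumentOutOfRangeException(nameof(elements));
        double seconds = elapsedMs / 1000.0;
        return seconds * clockGHz * 1e9 / elements;
    }

    // Speedup of an optimized run relative to a baseline run.
    public static double Speedup(double baselineMs, double optimizedMs) => baselineMs / optimizedMs;
}
```

For example, 1,000,000 elements in 0.5 ms on a 3 GHz core works out to 1.5 cycles per element; a drop from 6 ms to 1.5 ms is a 4x speedup.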
“Mathematics types. Branchless. Linear memory. Burst vectorizes.”
Related Issues
For Burst error BC1006, see the BC1006 article. For job data races, see the data race article.
math.select. float3. Wide registers.