Can you unit test a shader?

Yes. Compile the shader, dispatch it over a small known input (a 4x4 texture or a handful of vertices), read back the result, and compare to an expected output. Pure functions inside the shader can also be extracted to compute shaders for easier testing in isolation.

What is golden image testing?

Render a known scene with the shader, save the image to disk, and on subsequent runs compare pixel-by-pixel against the saved version. Any mismatch above a tolerance threshold fails the test. It catches regressions humans would miss and works without making assertions about specific pixels.

How do I handle floating-point tolerance in shader tests?

Use a per-channel epsilon (often 1 or 2 out of 255) and a percentage-of-pixels tolerance (allow up to 0.1% of pixels to exceed epsilon). Different GPUs produce slightly different results even for the same shader, so strict equality will fail on CI.

How to Set Up Test Coverage for Shaders

Quick answer: Shaders are testable. Write a headless render harness that dispatches the shader over fixture inputs, reads back the result, and compares to expected output. Add golden image tests for whole-scene regressions. Run the same tests against every backend (Direct3D, Vulkan, Metal) to catch compiler divergence before users do.

Your new rim-light shader looks great on your RTX 4080. It ships. Within an hour, three players on Intel integrated GPUs report their characters are rendered as solid white boxes. You never tested the Metal backend. You never tested on a low-end Vulkan driver. The shader compiled fine on every platform — but it doesn’t produce the same image on every platform. That’s what shader tests catch.

Three Kinds of Shader Test

1. Unit tests. Pure functions inside the shader — a tone-mapping curve, a noise function, an SDF primitive — can be tested in isolation. Extract the function to a compute shader that runs over an input buffer and writes to an output buffer. Assert that each output equals the expected value within tolerance.

2. Integration tests. Render a fixture scene with the shader applied and read back the resulting image. Compare against a golden image on disk. This catches regressions in the way the shader interacts with the rest of the pipeline (uniforms, vertex data, blend modes) rather than just the shader math.

3. Cross-platform tests. Run the same tests on every target backend. The same HLSL compiled to Direct3D, Vulkan (via SPIR-V), and Metal can produce subtly different images because the intermediate representations and driver compilers all differ. CI must cover every backend you ship.

Writing a Headless Harness

You need a way to run a shader and read pixels without a window. On Unity, enable Application.isBatchMode and use a RenderTexture with ReadPixels. On Unreal, use the offscreen render path from FSceneRenderer. On your own engine, create a headless surface (Vulkan’s VK_KHR_surfaceless or an offscreen swapchain) and render to a FBO.

// Minimal compute shader unit test in C#
[Test]
public void TonemapCurve_MatchesReference() {
    var inputs = new float[] { 0.0f, 0.5f, 1.0f, 2.0f, 10.0f };
    var expected = new float[] { 0.0f, 0.377f, 0.627f, 0.839f, 0.973f };

    var outputs = DispatchCompute("Tonemap.compute", inputs);

    for (int i = 0; i < expected.Length; i++)
        Assert.That(outputs[i], Is.EqualTo(expected[i]).Within(0.001f));
}

Golden Image Testing

Golden images catch the bugs you can’t describe in asserts — a normal map sampled with the wrong UV, a lighting term dropped at compile time. The workflow:

Render a fixture scene (a sphere, a cube, a single lit character) with fixed camera and lighting.
Save the resulting image to tests/golden/tonemap_sphere.png. Commit it.
On subsequent test runs, render the scene again and compare pixel-by-pixel.
A mismatch above tolerance fails the test and writes both the actual and a diff image for inspection.

Tolerance matters. A strict equality check fails on every minor driver update. Use two thresholds: a per-channel epsilon (1–2 out of 255 for sRGB, 0.004 for linear) and a percentage-of-pixels-exceeding threshold (usually 0.1%). Anti-aliased edges account for most near-threshold differences and are not regressions.

Picking Fixture Inputs

Good fixtures are minimal and stable. For a lighting shader, use:

A flat plane at varying angles to the light (tests N · L).
A sphere (tests curvature and highlight falloff).
A scene with three primary-colored lights (tests color math).
An emissive material (tests the bypass path).

Avoid animated fixtures — any time-dependent input is a source of flakiness. Freeze time at a known value. Disable jitter. Disable temporal anti-aliasing unless you’re specifically testing it.

Cross-Platform Verification

Shader compilers differ. The same HLSL passed through the Direct3D compiler, then cross-compiled to SPIR-V via DXC, then translated to Metal via SPIRV-Cross can produce different binaries with different rounding behavior. Compile-time constant folding may reorder operations. A 1.0 / 0.0 that DXC treats as undefined may produce +Inf on Metal and -Inf on Vulkan.

Run your shader tests on CI runners with each backend. GitHub Actions now has macOS runners for Metal, Windows for Direct3D, and Linux for Vulkan. Use MoltenVK on macOS if you don’t want separate tests for Metal and Vulkan. Store a per-backend golden image if needed — some differences are real but acceptable.

Update Golden Images Safely

Sometimes you genuinely want to change the output. A new tone-mapping curve, a bug fix in the lighting model. Add a CLI flag to the harness: --update-goldens. Run the tests locally, verify the new images are correct (diff against the old ones, eyeball the result), commit the updated golden images with a clear changelog entry.

Require code review for every golden image change. Reviewers should see the before/after and understand why it changed. A silent golden-image update is how subtle rendering regressions slip in.

Run Fast, Run Often

A shader test suite that takes 20 minutes runs once per release. A shader test suite that takes 30 seconds runs on every pull request. Keep fixtures small (256x256 is enough for most tests), keep the fixture set under 50 scenes, and parallelize across CPU cores. The fast-path cost is proportional to total test runs. Make it cheap.

“A shader that hasn’t been rendered on a player’s hardware has never actually been tested. The CI farm is the cheapest player hardware you’ll ever own.”

Related Issues

For root-causing shader issues across hardware, see how to reproduce a GPU driver crash locally. For shader-adjacent lighting bugs, see debugging Unity URP lighting issues.

Shader code is code. If you wouldn’t ship a C# function untested, don’t ship an HLSL function untested either.