Quick answer: Use Godot's built-in Visual Profiler for a per-frame GPU time breakdown, enable the Overdraw debug view to see where expensive pixels are being redrawn, and attach RenderDoc for nanosecond-level per-draw-call shader timing. Enable the pipeline cache to stop shader compile stutter.

Godot 4's renderer gives you a lot more graphical power than Godot 3 did, and it also gives you a lot more ways to accidentally write a shader that tanks your frame rate. The Vulkan backend is fast but unforgiving: a single bad fragment shader on a fullscreen quad can cost you 8ms of frame time, which is the entire budget for a 120 Hz game. Here is how to find those shaders before your players do.

Why Shader Profiling Matters

CPU profiling is straightforward because GDScript and C# both have timeline tools built into the Godot debugger. Shader profiling is harder because the GPU runs hundreds of thousands of fragment invocations per frame in parallel, and "is this shader expensive" depends on how many pixels the rasterizer covers, how many texture samples it makes, and what branches each invocation takes. A shader that is fine on a 100x100 quad is ruinous on a fullscreen post-process pass.

The goal of shader profiling is to answer three questions: (1) Which draw calls are eating my GPU frame budget? (2) Within those draw calls, which shader operations are the expensive ones? (3) Are my shaders compiling at runtime and causing hitches?

Step 1: The Built-In Visual Profiler

Open your project in Godot 4, hit Play, and switch to the Debugger → Visual Profiler tab. This shows a stacked area chart of where GPU frame time is going.

The categories you care about are:

If any single category exceeds 30–40% of your frame budget, that is your first suspect. Watch it change as you move the camera around the scene — categories that spike on specific views point you to a specific material or node.

Step 2: Overdraw Debug View

Turn on the Overdraw debug view (top-right in the 3D viewport or via RenderingServer.set_debug_generate_wireframes for 2D). Every pixel is tinted based on how many times it was drawn during the current frame.

If you see red anywhere, you are running your fragment shader multiple times on the same pixel. Transparent particles are the most common culprit: a dense particle system can easily hit 10–20x overdraw. Fixes include smaller particles, alpha-to-coverage instead of blending, reducing particle count, or using a simpler fragment shader on the particle material.

Step 3: RenderDoc for Per-Call Profiling

RenderDoc is a free, open-source frame debugger that captures a single frame from your running game and lets you inspect every draw call individually. It supports Vulkan out of the box, which is Godot 4's default backend.

Install RenderDoc, launch it, and point it at your Godot editor executable (or your exported game binary):

# Linux / macOS
renderdoccmd capture /path/to/godot --args "/path/to/project.godot"

# Windows: launch from the RenderDoc GUI
# File → Launch Application → select godot.exe

Press F12 while your game is running to capture a frame. RenderDoc opens the capture in a new window with a timeline of every draw call, each annotated with its GPU time in microseconds.

Sort the event browser by Duration descending. The top entries are your expensive shaders. Click one to see:

This is the deepest level of shader profiling available to Godot developers, and it is completely free.

Step 4: Audit Expensive Shader Operations

Once you know which shader is slow, look for these red flags in the code:

Branches in fragment shaders. GPUs execute fragments in wavefronts (groups of 32 or 64 pixels). If any fragment in a wavefront takes a branch, all fragments in the wavefront take both branches. Keep fragment shader logic branch-free where possible; use mix and step instead of if.

Loop unrolling. Loops with unknown iteration counts are expensive. If you can fix the count at compile time, use #define and let the compiler unroll.

Texture samples per pixel. Each texture sample is a cache hit or miss, and misses are slow. A post-process shader sampling the framebuffer four times is four times the cost of one sample. Minimize the number of samples and prefer nearby pixels (which share the cache line).

Transcendentals. sin, cos, pow, exp, and log are 2–10x more expensive than basic arithmetic on most GPUs. Precompute them into lookup textures when possible.

Per-pixel normalization. normalize(v) hides a square root. If you are normalizing the same vector on every pixel of a surface, compute it once in the vertex shader and pass it as a varying.

Step 5: Fix Shader Compile Stutter

Godot 4 compiles shaders lazily. The first time a material is seen, its shader is compiled, and the main thread stalls for 5–500 ms depending on complexity. This produces a visible hitch the first time each new effect appears.

The fix is the Vulkan pipeline cache:

# project.godot
[rendering]
rendering_device/pipeline_cache/enable=true
rendering_device/pipeline_cache/save_chunk_size_mb=3.0

With the cache enabled, the first launch still compiles shaders lazily but saves the compiled pipelines to user://pipeline_cache. Subsequent launches reuse the cache with no compile cost. This is the single biggest shader-related quality-of-life improvement in Godot 4.

For even better results, pre-warm shaders during a loading screen by instantiating every material once off-screen before gameplay begins. This forces compilation during a moment when the player already expects to wait.

Example: Profiling a Post-Process Vignette

Say your vignette shader looks fine visually but the Visual Profiler shows Post-Process costing 5 ms of your 16 ms budget. You open RenderDoc, capture a frame, and find this fragment shader:

// expensive version
shader_type canvas_item;

void fragment() {
    vec2 uv = SCREEN_UV;
    float dist = length(uv - vec2(0.5));
    float vignette = pow(1.0 - dist, 2.5);
    vec4 base = texture(SCREEN_TEXTURE, uv);
    COLOR = base * vignette;
}

The pow(1.0 - dist, 2.5) is the killer. pow with a non-integer exponent expands into an exp and a log, which are slow. Replacing it with (1.0 - dist) * (1.0 - dist) * (1.0 - dist) (an integer cube approximation) reduces the cost by 4–5x on a typical mobile GPU:

// cheaper version
void fragment() {
    vec2 uv = SCREEN_UV;
    float dist = length(uv - vec2(0.5));
    float inv = 1.0 - dist;
    float vignette = inv * inv * inv;
    COLOR = texture(SCREEN_TEXTURE, uv) * vignette;
}

One profiling session, one-line change, 3 ms per frame reclaimed.

"If your game drops frames and you have not opened RenderDoc, you do not yet know why."

Related Issues

For general Godot performance debugging see Godot performance profiling guide. If your shaders are compiling at runtime and causing stutter, also read Godot shader not compiling.

The GPU is not magic. It runs code. If your code is slow, the GPU will be slow. Profile it.