Why do localized UIs break when English works?

Other languages are often longer than English: German words can be 2x wider, Russian needs more vertical space, Thai and Arabic use complex shaping that breaks naive layout code. Buttons designed for English overflow in German, labels clip in Russian, and right-to-left languages mirror in ways your code may not expect.

How do I automate screenshots for every language?

Script a batch-mode run that iterates every supported language, sets the locale, loads each key screen, and saves a PNG. Commit the screenshots as baselines. On every build, re-run and diff against the baseline; any difference beyond your tolerance is a potential bug worth reviewing.

What tools should I use for visual diffing?

odiff, pixelmatch, and Resemble.js are popular open-source image diffing libraries. For Unity, the ImageAssert helper from Unity's Graphics Test Framework package works. For self-hosted CI, odiff is a fast option. Allow a small tolerance (1-3%) to account for GPU driver variance in font rendering.

How to Set Up Automated Screenshot Testing for Multiple Languages

Quick answer: Build a batch-mode screenshot runner that iterates every supported language, captures PNGs of every key screen, and compares them against committed baselines with odiff or similar. Fail CI on unexpected differences above a 1–3% tolerance. This catches overflow, clipping, and RTL mirroring bugs before they ship.

Your English UI is pixel-perfect. You hand the project off for German localization, and when you see it for the first time, every button is overflowing, every label is clipped, and the character’s name has crashed into the HP bar. The bug is not in your code — it is in the assumption that UI that fits one language will fit all of them. Automated screenshot testing per locale catches these before players do.

Why This Matters

Length and shape vary wildly across languages. A few approximate ratios (text length relative to English):

German: 1.4x to 2x. “Settings” → “Einstellungen”.
Russian: 1.3x. Longer words, Cyrillic often requires slightly more vertical space.
French: 1.2x. Moderate expansion.
Japanese/Chinese: 0.5x. Much shorter horizontally, but glyphs are bigger.
Arabic/Hebrew: 1x length, right-to-left, requires layout mirroring.
Thai: no spaces between words, so line breaking needs dictionary-based segmentation; diacritics stack above and below the base glyphs.

A button sized for English “OK” does not fit German “Bestätigen”. A label aligned left breaks in RTL. A text box with a fixed height clips Thai diacritics. None of these are caught by unit tests; all of them are caught the moment a screenshot diff shows the wrong pixels.

Step 1: Script a Screenshot Runner

Build a test script that can run in batch mode (no interactive input) and cycle through screens and languages. The exact API depends on your engine.

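// Unity batch mode screenshot runner
// Attach to a GameObject in a bootstrap scene. The static RunFromCI entry point used by
// -executeMethod belongs in a separate editor script and is not shown here.
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.SceneManagement;

public class LocalizationScreenshotRunner : MonoBehaviour
{
    private readonly string[] _languages = { "en", "de", "fr", "ru", "ja", "zh-cn", "ar", "th" };
    private readonly string[] _screens = { "main_menu", "settings", "inventory", "pause", "shop", "credits" };

    IEnumerator Start()
    {
        // Survive the scene loads below; otherwise this coroutine dies with the first scene.
        DontDestroyOnLoad(gameObject);
        // Make sure the output folder exists before capturing.
        Directory.CreateDirectory("screenshots");

        foreach (var lang in _languages)
        {
            // LocalizationManager is the project's own localization wrapper, not a Unity API.
            LocalizationManager.SetLanguage(lang);
            yield return new WaitForSeconds(0.5f);

            foreach (var screen in _screens)
            {
                // Wait for the load to complete, then give layout a moment to settle.
                yield return SceneManager.LoadSceneAsync(screen);
                yield return new WaitForSeconds(1.0f);
                ScreenCapture.CaptureScreenshot($"screenshots/{lang}_{screen}.png");
                // The capture happens at the end of the frame; give it time to be written.
                yield return new WaitForSeconds(0.5f);
            }
        }
        // Application.Quit is ignored in the Editor, so exit the Editor process explicitly.
#if UNITY_EDITOR
        UnityEditor.EditorApplication.Exit(0);
#else
        Application.Quit(0);
#endif
    }
}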

Invoke from CI:

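Unity -batchmode \
  -projectPath . \
  -executeMethod LocalizationScreenshotRunner.RunFromCI \
  -logFile unity.log
# Do not pass -nographics: capturing screenshots needs an initialized graphics device.
# -executeMethod requires RunFromCI to be a static method; it should open the scene
# that hosts the runner and enter Play mode.
# After the run, the screenshots/ directory contains all the PNGs.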

Step 2: Commit Baselines

Run the script once. Manually review every screenshot — yes, every one. This is the only time you look at each pixel on purpose. Fix anything that is obviously wrong. When every language is acceptable, commit the screenshots directory as screenshots/baseline/.

Baselines are the source of truth. Never re-commit them without a human review.
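
A baseline set is only trustworthy if it is complete: a screen that silently failed to capture leaves no file behind, so nothing ever gets compared for it. Below is a minimal sketch of a completeness check, assuming the {lang}_{screen}.png naming used by the runner. The function name missing_baselines and its space-separated list arguments are illustrative, not part of any tool.

```shell
# Print every expected "<lang>_<screen>.png" that is absent from a directory.
# Usage: missing_baselines DIR "lang1 lang2 ..." "screen1 screen2 ..."
# Returns 0 when the set is complete, 1 when at least one file is missing.
missing_baselines() {
  local dir="$1" langs="$2" screens="$3" rc=0
  local lang screen
  for lang in $langs; do            # unquoted on purpose: split the list on spaces
    for screen in $screens; do
      if [ ! -f "$dir/${lang}_${screen}.png" ]; then
        echo "${lang}_${screen}.png"
        rc=1
      fi
    done
  done
  return $rc
}

# Example call (lists copied from the runner):
#   missing_baselines screenshots/baseline \
#     "en de fr ru ja zh-cn ar th" \
#     "main_menu settings inventory pause shop credits" || exit 1
```

Running the same check against the freshly captured directory before diffing tells you whether a failure is a real visual change or just a screenshot that was never taken.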

Step 3: Diff on Every Build

On every CI build, run the runner again and output to screenshots/current/. Diff each file against the baseline:

# Install odiff
npm install -g odiff-bin

# Diff every baseline against its current
mkdir -p screenshots/diff
fail=0
for f in screenshots/baseline/*.png; do
  name=$(basename "$f")
  odiff "$f" "screenshots/current/$name" "screenshots/diff/$name" \
    --threshold 0.01 \
    --antialiasing || fail=1
done
exit $fail

If any image differs beyond the per-pixel threshold, odiff exits non-zero and the CI step fails. Configure the job to upload screenshots/diff/ as artifacts so a human can decide whether the change is intended.
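
The loop above treats any reported difference as a failure. odiff's --threshold is a per-pixel color sensitivity, not a share of the image, so a percentage budget has to be applied separately. Below is a minimal sketch of such a gate. It assumes odiff's --parsable-stdout flag prints a report of the form "diffCount;diffPercentage" with the percentage on a 0-100 scale; verify that against your installed version. The helper name within_tolerance is hypothetical.

```shell
# Decide whether a diff report is within a percentage-of-pixels budget.
# ASSUMPTION: $1 looks like "<diffCount>;<diffPercentage>" (percentage on a 0-100 scale).
# Usage: within_tolerance REPORT MAX_PERCENT
# Returns 0 if within budget, 1 if over budget, 2 if the report cannot be parsed.
within_tolerance() {
  local report="$1" max_percent="$2"
  case "$report" in
    *";"*) ;;                       # has the expected separator
    *) return 2 ;;                  # e.g. a layout mismatch or a missing file
  esac
  local percent="${report#*;}"      # keep everything after the first ';'
  # Shell arithmetic is integer-only, so compare the decimals in awk.
  awk -v p="$percent" -v m="$max_percent" 'BEGIN { exit !(p + 0 <= m + 0) }'
}

# Possible use inside the loop, replacing the plain "|| fail=1":
#   report=$(odiff "$f" "screenshots/current/$name" "screenshots/diff/$name" \
#              --threshold 0.01 --antialiasing --parsable-stdout) \
#     || within_tolerance "$report" 1.0 || fail=1
```

The helper is only consulted when odiff has already exited non-zero, so identical images never reach it, and an unparseable report counts as a failure rather than a pass.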

Step 4: Update Baselines Deliberately

When a UI change is intentional, update the baselines in the same commit:

# After making the UI change and running locally
cp screenshots/current/*.png screenshots/baseline/
git add screenshots/baseline/
git commit -m "Update screenshot baselines for redesigned shop"

A reviewer on the PR should inspect the new baselines by eye and confirm they look right. Treat baseline updates as seriously as code changes.

Handling Flaky Diffs

Screenshot tests can flake for reasons that are not your fault:

GPU driver updates change anti-aliasing.
Font rendering differs slightly across OS versions.
Particle systems have randomized positions.
Animation timings catch different frames.

Mitigate with:

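A small tolerance: odiff’s --threshold sets the per-pixel color sensitivity, and a 1–3% budget for differing pixels can be enforced on the diff percentage odiff reports.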
Disabling animations during tests (force to a specific frame).
Hiding dynamic content (timestamps, random seeds) with stable placeholder values.
Running on a fixed CI image so the OS and drivers never change.

The goal is “every diff flagged is worth looking at,” not “zero diffs ever.”

What to Capture

Start small. Main menu, pause menu, settings, one gameplay screen. These are the most linguistically dense and the most visited. Expand over time to:

Character creator (long name fields, title tooltips)
Inventory (item names, descriptions, tooltips)
Dialogue box (multi-line text with portraits)
Achievement notification (short titles, long descriptions)
Error dialogs (wrapped multi-paragraph text)

Eight to twelve screens per language is usually enough to catch 90% of localization UI bugs.

“Localization bugs do not appear until you look. An automated screenshot diff is the cheapest way to look at every language, every build, without asking a human to stare at fifty images.”

Related Resources

For broader visual regression testing, see how to use visual snapshot testing for game UI regressions. For broader localization bugs, see game localization testing common bugs and how to track and fix localization bugs in your game.

Include a pseudo-locale (replace every character with an accented version and pad by 30%) in your test matrix. It catches nearly every overflow bug without needing a real translation.
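
Below is a minimal sketch of a pseudo-localizer along those lines, assuming plain ASCII source strings with no format placeholders. It is a simplification: only vowels are accented rather than every character. The function name pseudo_localize and the "~" padding character are illustrative choices. A real pipeline would also need to leave tokens such as {0} untouched.

```shell
# Pseudo-localize one ASCII string: accent the vowels, then pad by about 30%.
# Usage: pseudo_localize "Settings"
pseudo_localize() {
  local text="$1"
  # Measure the ASCII input, before any multi-byte characters are introduced.
  local pad=$(( (${#text} * 3 + 9) / 10 ))    # ceil(length * 0.3)
  local accented
  accented=$(printf '%s' "$text" | sed \
    -e 's/a/á/g' -e 's/e/é/g' -e 's/i/í/g' -e 's/o/ó/g' -e 's/u/ú/g' \
    -e 's/A/Á/g' -e 's/E/É/g' -e 's/I/Í/g' -e 's/O/Ó/g' -e 's/U/Ú/g')
  # Append the padding so overflow shows up even for short strings.
  while [ "$pad" -gt 0 ]; do
    accented="$accented~"
    pad=$((pad - 1))
  done
  printf '%s\n' "$accented"
}
```

With this mapping, pseudo_localize "Settings" prints Séttíngs~~~ and pseudo_localize "OK" prints ÓK~. The accents confirm that a string actually went through the localization layer, and the padding approximates German-style expansion.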