Why not just use boolean toggles?

Boolean toggles cannot express rollouts (10 percent of players), cohort targeting (PlayStation 5 users in the EU), or A/B splits. A rule-based evaluator replaces the ad-hoc if-statements that accumulate around every flag and gives you safe gradual rollout and experimentation in one abstraction.

How do you keep flag evaluation fast in the hot path?

Evaluate once per frame, not per call. Cache the evaluated value keyed by flag name in a thread-local map, and invalidate the cache when the evaluation context changes (level loaded, player moved regions). Evaluation cost then drops to a hash-map lookup.

They should. Tag each flag with an owner and an expiry date. Any flag older than 90 days past expiry becomes a release blocker until it is either removed or renewed. Without this, your codebase accumulates dead flags that become load-bearing over time.

How to Build a Feature Flag Evaluation Framework

Quick answer: Replace boolean flags with a rule-based evaluator that takes an evaluation context (user ID, country, platform, hardware tier, client version) and returns a variant. Each flag has an ordered rule list, the first match wins, and evaluation is cached per session to stay cheap. A debug overlay showing which rule matched for every flag removes most of the mystery when players report inconsistent behavior.

Feature flags start as a reasonable idea — a boolean that lets you turn on a feature without redeploying — and accumulate into a maintenance nightmare where nobody knows which flags do what or who owns them. The fix is to stop treating flags as booleans and start treating them as tiny programs. A flag evaluation framework gives you rollouts, cohort targeting, A/B tests, and kill switches with the same mental model, and lets you build the debugging tools once rather than ad hoc per flag.

The Evaluation Context

Every flag evaluation needs the same inputs. Define them once in a struct and pass it to every evaluator.

type EvalContext struct {
    UserID         string
    AccountAgeDays int
    Country        string
    Platform       string  // "win","mac","linux","ps5","xsx","switch"
    ClientVersion  string  // semver
    HardwareTier   string  // "low","mid","high"
    CohortTags     []string // ["beta","ranked","early_access"]
    RandomBucket   int     // hash(UserID) % 100
}

Populate the context once per session at login. Everything a flag could reasonably depend on lives here. The RandomBucket is pre-computed as a stable hash of the user ID so rollouts (“show this to 10 percent of players”) are deterministic per user and don’t flicker between sessions.

Rules as Ordered Match Lists

A flag is a list of rules plus a default. Each rule has a condition and a variant. The evaluator runs the rules top to bottom and returns the first variant whose condition matches. If nothing matches, return the default.

{
  "flag": "new_lighting_model",
  "default": "off",
  "rules": [
    { "when": { "cohort": "beta" },              "variant": "on" },
    { "when": { "hardware_tier": "low" },        "variant": "off" },
    { "when": { "platform_in": ["ps5","xsx"] },  "variant": "on" },
    { "when": { "random_bucket_lt": 10 },       "variant": "on" }
  ]
}

This single structure covers kill switches (put a blanket “off” rule at the top), rollouts (use random_bucket), cohort targeting (cohort tags), region restrictions (country), hardware-appropriate features (hardware_tier), and platform gating. Any new use case is a new predicate in the matcher, not a new framework.

A Tiny Matcher

Keep the rule DSL small. Ten predicates is plenty: equals, not-equals, in-list, less-than, greater-than, semver-at-least, contains, and-of, or-of, not. Resist the temptation to add full scripting. A flag is a decision, not a program; complicated logic belongs in code.

func Match(cond Condition, ctx EvalContext) bool {
    switch cond.Op {
    case "eq":
        return fieldOf(ctx, cond.Field) == cond.Value
    case "in":
        return slices.Contains(cond.Values, fieldOf(ctx, cond.Field))
    case "lt":
        return fieldInt(ctx, cond.Field) < cond.IntValue
    case "semver_ge":
        return semver.Compare(ctx.ClientVersion, cond.Value) >= 0
    case "and":
        for _, c := range cond.Children {
            if !Match(c, ctx) { return false }
        }
        return true
    }
    return false
}

Caching for the Hot Path

Even a simple matcher costs a map lookup plus a string compare per rule. If you check is_new_ui_enabled once per frame from your UI rendering code, that cost adds up. Cache every evaluated result in a per-session map keyed by flag name. Invalidate the cache only when the evaluation context changes materially: player moved region, joined a new cohort, installed a client update.

class FlagCache:
    def __init__(self, ctx):
        self.ctx = ctx
        self.values = {}

    def get(self, name):
        if name not in self.values:
            self.values[name] = evaluator.evaluate(name, self.ctx)
        return self.values[name]

    def invalidate(self):
        self.values.clear()

After caching, flag checks are a hash-map lookup and are cheap enough to use anywhere, including 60-times-a-second render code.

The Debug Overlay Is Not Optional

When a player reports that a feature behaves differently on their machine, you need to know which variant they are getting and why. Build a debug overlay accessible via a developer console that dumps every currently evaluated flag: name, variant, the index of the rule that matched, and the context snapshot used for evaluation.

Most bugs in flag systems come from unexpected rule ordering or context mismatches. An overlay that says “flag new_lighting evaluated to off because rule 1 matched (cohort=beta but hardware_tier=low was checked first)” solves these in seconds.

Hot Reload Without Restart

The whole point of flags is to change behavior without a deploy. Fetch rule definitions from a server on a 60-second poll (or push via websocket) and atomically swap the rule set. Invalidate the flag cache on swap. Your game must survive rule changes mid-session without crashing.

Keep the current rule set in a versioned blob. If a fetch fails, stick with the previous version and log the failure; never fall back to “no rules” because that turns every flag into its default, which is usually the off state.

Expire Flags Aggressively

Flags are debt. A flag that has been 100 percent rolled out for six months is dead weight — the code path it guards should be unconditional or deleted. Tag every flag at creation with an owner, a purpose, and an expected removal date. Any flag still alive 90 days past its removal date becomes a release blocker until someone either removes it or extends the date with justification.

“Our old flag system was just if-statements scattered through the codebase. We had 120 of them and no idea which ones still did anything. Three weeks of work to migrate to a rule evaluator with expiry dates, and we cleaned out 80 dead flags in the process. The codebase is noticeably simpler now.”

Related Issues

For related release-management topics, see how to track and reduce crash rate over releases. For controlled chaos and canary scenarios where flags shine, read how to set up chaos testing for game servers.

Pick one of your existing boolean flags and rewrite it as a rule this week. You will immediately see how to express the rollout, hardware gate, or cohort targeting you’ve been faking with stacked if-statements.