Quick answer: Replace boolean flags with a rule-based evaluator that takes an evaluation context (user ID, country, platform, hardware tier, client version) and returns a variant. Each flag has an ordered rule list, the first match wins, and evaluation is cached per session to stay cheap. A debug overlay showing which rule matched for every flag removes most of the mystery when players report inconsistent behavior.
Feature flags start as a reasonable idea — a boolean that lets you turn on a feature without redeploying — and accumulate into a maintenance nightmare where nobody knows which flags do what or who owns them. The fix is to stop treating flags as booleans and start treating them as tiny programs. A flag evaluation framework gives you rollouts, cohort targeting, A/B tests, and kill switches with the same mental model, and lets you build the debugging tools once rather than ad hoc per flag.
The Evaluation Context
Every flag evaluation needs the same inputs. Define them once in a struct and pass it to every evaluator.
type EvalContext struct {
UserID string
AccountAgeDays int
Country string
Platform string // "win","mac","linux","ps5","xsx","switch"
ClientVersion string // semver
HardwareTier string // "low","mid","high"
CohortTags []string // ["beta","ranked","early_access"]
RandomBucket int // hash(UserID) % 100
}
Populate the context once per session at login. Everything a flag could reasonably depend on lives here. The RandomBucket is pre-computed as a stable hash of the user ID so rollouts (“show this to 10 percent of players”) are deterministic per user and don’t flicker between sessions.
Rules as Ordered Match Lists
A flag is a list of rules plus a default. Each rule has a condition and a variant. The evaluator runs the rules top to bottom and returns the first variant whose condition matches. If nothing matches, return the default.
{
"flag": "new_lighting_model",
"default": "off",
"rules": [
{ "when": { "cohort": "beta" }, "variant": "on" },
{ "when": { "hardware_tier": "low" }, "variant": "off" },
{ "when": { "platform_in": ["ps5","xsx"] }, "variant": "on" },
{ "when": { "random_bucket_lt": 10 }, "variant": "on" }
]
}
This single structure covers kill switches (put a blanket “off” rule at the top), rollouts (use random_bucket), cohort targeting (cohort tags), region restrictions (country), hardware-appropriate features (hardware_tier), and platform gating. Any new use case is a new predicate in the matcher, not a new framework.
A Tiny Matcher
Keep the rule DSL small. Ten predicates is plenty: equals, not-equals, in-list, less-than, greater-than, semver-at-least, contains, and-of, or-of, not. Resist the temptation to add full scripting. A flag is a decision, not a program; complicated logic belongs in code.
func Match(cond Condition, ctx EvalContext) bool {
switch cond.Op {
case "eq":
return fieldOf(ctx, cond.Field) == cond.Value
case "in":
return slices.Contains(cond.Values, fieldOf(ctx, cond.Field))
case "lt":
return fieldInt(ctx, cond.Field) < cond.IntValue
case "semver_ge":
return semver.Compare(ctx.ClientVersion, cond.Value) >= 0
case "and":
for _, c := range cond.Children {
if !Match(c, ctx) { return false }
}
return true
}
return false
}
Caching for the Hot Path
Even a simple matcher costs a map lookup plus a string compare per rule. If you check is_new_ui_enabled once per frame from your UI rendering code, that cost adds up. Cache every evaluated result in a per-session map keyed by flag name. Invalidate the cache only when the evaluation context changes materially: player moved region, joined a new cohort, installed a client update.
class FlagCache:
def __init__(self, ctx):
self.ctx = ctx
self.values = {}
def get(self, name):
if name not in self.values:
self.values[name] = evaluator.evaluate(name, self.ctx)
return self.values[name]
def invalidate(self):
self.values.clear()
After caching, flag checks are a hash-map lookup and are cheap enough to use anywhere, including 60-times-a-second render code.
The Debug Overlay Is Not Optional
When a player reports that a feature behaves differently on their machine, you need to know which variant they are getting and why. Build a debug overlay accessible via a developer console that dumps every currently evaluated flag: name, variant, the index of the rule that matched, and the context snapshot used for evaluation.
Most bugs in flag systems come from unexpected rule ordering or context mismatches. An overlay that says “flag new_lighting evaluated to off because rule 1 matched (cohort=beta but hardware_tier=low was checked first)” solves these in seconds.
Hot Reload Without Restart
The whole point of flags is to change behavior without a deploy. Fetch rule definitions from a server on a 60-second poll (or push via websocket) and atomically swap the rule set. Invalidate the flag cache on swap. Your game must survive rule changes mid-session without crashing.
Keep the current rule set in a versioned blob. If a fetch fails, stick with the previous version and log the failure; never fall back to “no rules” because that turns every flag into its default, which is usually the off state.
Expire Flags Aggressively
Flags are debt. A flag that has been 100 percent rolled out for six months is dead weight — the code path it guards should be unconditional or deleted. Tag every flag at creation with an owner, a purpose, and an expected removal date. Any flag still alive 90 days past its removal date becomes a release blocker until someone either removes it or extends the date with justification.
“Our old flag system was just if-statements scattered through the codebase. We had 120 of them and no idea which ones still did anything. Three weeks of work to migrate to a rule evaluator with expiry dates, and we cleaned out 80 dead flags in the process. The codebase is noticeably simpler now.”
Related Issues
For related release-management topics, see how to track and reduce crash rate over releases. For controlled chaos and canary scenarios where flags shine, read how to set up chaos testing for game servers.
Pick one of your existing boolean flags and rewrite it as a rule this week. You will immediately see how to express the rollout, hardware gate, or cohort targeting you’ve been faking with stacked if-statements.