Feature toggles, feature flags – they’ve been written about a lot already (use a search engine :)), yet I feel like writing a post about them. Why? I’ve been personally involved in two from-scratch implementations, and it may be interesting for folk to read about that.
I say that lots has been written; http://featureflags.io/ (which appears to be a bit of an astroturf site for LaunchDarkly 😉 ) nevertheless has gathered a bunch of links to literature as well as a number of SDKs and the like; there are *other* FFaaS offerings than LaunchDarkly; I have no idea which I would use for my next project at this point – but hopefully you’ll have some tools to reason about that at the end of this piece.
I’m going to entirely skip over the motivation (go read those other pieces), other than to say that the evidence is in, trunk based development is better.
A feature flag is a very simple thing: it is a value controlled outside of your development cycle that in turn controls the behaviour of your code. There are dozens of ways to implement that. hash-defines and compile time flags have been used for a very long time, so long that we don’t even think of them as feature flags, but they are are. So are configuration options in configuration files in the broadest possible sense. The difference is largely in focus, and where the same system meets all parties needs, I think its entirely fine to use just the one system – that is what we did for Launchpad, and it worked quite well I think – as far as I know it hasn’t been changed. Specifically in Launchpad the Zope runtime config is regular ZCML files on disk, and feature flags are complementary to that (but see the profiling example below).
Configuration tends to be thought of as “choosing behaviour after the system is compiled and before the process is started” – e.g. creating files on disk. But this is not always the case – some enterprise systems are notoriously flexible with database managed configuration rulesets which no-one can figure out – and we don’t want to create that situation.
Lets generalise things a little – a flag could be configured over the lifetime of the binary (compile flag), execution (runtime flag/config file or one-time evaluation of some dynamic system), time(dynamically reconfigured from changed config files or some dynamic system), or configured based on user/team/organisation, URL path (of a web request naturally :P), and generally any other thing that could be utilised in making a decision about whether to conditionally perform some code. It can also be useful to be able to randomly bucket some fraction of checks (e.g. 1/3 of all requests will go down this code path).. but do it consistently for the same browser.
Depending on what sort of system you are building, some of those sorts of scopes will be more or less important to you – for instance, if you are shipping on-premise software, you may well want to be turning unreleased features entirely off in the binary. If you are shipping a web API, doing soft launches with population rollouts and feature kill switches may be your priority.
Similarly, if you have an existing microservices architecture, having a feature flags aaS API is probably much more important (so that your different microservices can collaborate on in-progress features!) than if you have a monolithic DB where you put all your data today.
Ultimately you will end up with something that looks roughly like a key-value store: get_flag_value(flagname, context) -> value
. Somewhere separate to your code base you will have a configuration store where you put rules that define how that key-value interface comes to a given value.
There are a few key properties that I consider beneficial in a feature flag systems:
- Graceful degradation
- Permissionless / (or alternatively namespaced)
- Loosely typed
- Observable
- Centralised
- Dynamic
Graceful Degradation
Feature flags will be consulted from all over the place – browser code, templates, DB mapper, data exporters, test harnesses etc. If the flag system itself is degraded, you need the systems behaviour to remain graceful, rather than stopping catastrophically. This often requires multiple different considerations; for instance, having sensible defaults for your flags (choose a default that is ok, change the meaning of defaults as what is ‘ok’ changes over time), having caching layers to deal with internet flakiness or API blips back to your flag store, making sure you have memory limits on local caches to prevent leaks and so forth. Different sorts of flag implementations have different failure modes : an API based flag system will be quite different to one stored in the same DB the rest of your code is using, which will be different to a process-startup command line option flag system.
A second dimension where things can go wrong is dealing with missing or unexpected flags. Remember that your system changes over time: a new flag in the code base won’t exist in the database until after the rollout, and when a flag is deleted from the codebase, it may still be in your database. Worse, if you have multiple instances running of a service, you may have different code all examining the same flag at the same time, so operations like ‘we are changing the meaning of a flag’ won’t take place atomically.
Permissions
Flags have dual audiences; one part is pure dev: make it possible to keep integration costs and risks low by merging fully integrated code on a continual basis without activating not-yet-ready (or released!) codepaths. The second part is pure operations: use flags to control access to dark launches, demo new features, killswitch parts of the site during attack mitigation, target debug features to staff and so forth.
Your developers need some way to add and remove the flags needed in their inner loop of development. Lifetimes of a few days for some flags.
Whoever is doing operations on prod though, may need some stronger guarantees – particularly they may need some controls over who can enable what flags. e.g. if you have a high control environment then team A shouldn’t be able to influence team B’s flags. One way is to namespace the flags and only permit configuration for the namespace a developer’s team(s) owns. Another way is to only have trusted individuals be able to set flags – but this obviously adds friction to processes.
Loose Typing
Some systems model the type of each flag: is it boolean, numeric, string etc. I think this is a poor idea mainly because it tends to interact poorly with the ephemeral nature of each deployment of a code base. If build X defines flag Y as boolean, and build X+1 defines it as string, the configuration store has to interact with both at the same time during rollouts, and do so gracefully. One way is to treat everything as a string and cast it to the desired type just in time, with failures being treated as default.
Observable
Make sure that when a user reports crazy weird behaviour, that you can figure out what value they had for what flags. For instance, in Launchpad we put them in the HTML.
Centralised
Having all your flags in one system lets you write generic tooling – such as ‘what flags are enabled in QA but not production’, or ‘what flags are set but have not been queried in the last month’. It is well worth the effort to build a single centralised system (or consume one such thing) and then use it everywhere. Writing adapters to different runtimes is relatively low overhead compared to rummaging through N different config systems because you can’t remember which one is running which platform.
Scope things in the system with a top level tenant / project style construct (any FFaaS will have this I’m sure :)).
Dynamic
There may be some parts of the system that cannot apply some flags rapidly, but generally speaking the less poking around that needs to be done to make something take effect the better. So build or buy a dynamic system, and if you want a ‘only on process restart’ model for some bits of it, just consult the dynamic system at the relevant time (e.g. during k8s Deployment object creation, or process startup, or …). But then everywhere else, you can react just-in-time; and even make the system itself be script driven.
The Launchpad feature flag system
I was the architect for Launchpad when the flag system was added. Martin Pool wanted to help accelerate feature development on Launchpad, and we’d all become aware of the feature flag style things hip groups like YouTube were doing; so he wrote a LEP: https://dev.launchpad.net/LEP/FeatureFlags , pushed that through our process and then turned it into code and docs (and once the first bits landed folk started using and contributing to it). Here’s a patch I wrote using the system to allow me to turn on Python profiling remotely. Here’s one added by William Grant to allow working around a crash in a packaging tool.
Launchpad has a monolithic data store, with bulk data federated out to various disk stores, but all relational data in one schema; we didn’t see much benefit in pushing for a dedicated API per se at that time – it can always be added later, as the design was deliberately minimal. The flags implementation is all in-process as a result, though there may be a JS thunk at this point – I haven’t gone looking. Permissions are done through trusted staff members, it is loosely typed and has an audit log for tracking changes.
The other one
The other one I was involved in was at VMware a couple years ago now; its in-house, but some interesting anecdotes I can share. The thinking on feature flags when I started the discussion was that they were strictly configuration file settings – I was still finding my feet with the in-house Xenon framework at the time (I think this was week 3 ? 🙂 so I whipped up an API specification and a colleague (Tyler Curtis) turned that into a draft engine; it wasn’t the most beautiful thing but it was still going strong and being enhanced by the team when I left earlier this year. The initial implementation had a REST API and a very basic set of scopes. That lasted about 18 months before tenant based scopes were needed and added. I had designed it with the intent of adding multi-arm bandit selection down the track, but we didn’t make the time to develop that capability, which is a bit sad.
Comparing that API with LaunchDarkly I see that they do support A/B trials but don’t have multivariate tests live yet, which suggests that they are still very limited in that space. I think there is room for some very simple home grown work in this area to pay off nicely for Symphony (the project codename the flags system was written for).
Should you run your own?
It would be very unusual to have PII or customer data in the flag configuration store; and you shouldn’t have access control lists in there either (in LP we did allow turning on code by group, which is somewhat similar). Point is, that the very worst thing that can happen if someone else controls your feature flags and is malicious is actually not very bad. So as far as aaS vendor trust goes, not a lot of trust is needed to be pretty comfortable using one.
But, if you’re in a particularly high-trust environment, or you have no internet access, running your own may be super important, and then yeah, do it :). They aren’t big complex systems, even with multi-arm bandit logic added in (the difficulty there is the logic, not the processing).
Or if you think the prices being charged by the incumbents are ridiculous. Actually, perhaps hit me up and we’ll make a startup and do this right…
Should you build your own?
A trivial flag system + persistence could be as little as a few days work. Less if you grab an existing bolt-on for your framework. If you have multiple services, or teams, or languages.. expect that to become the gift that keeps on giving as you have to consolidate and converge across your organisation indefinitely. If you have the resources – great, not a problem.
I think most people will be better off taking one of the existing open source flag systems – perhaps https://unleash.github.io/ – and using it; even if it is more complex than a system tightly fitted to your needs, the benefit of having one that is a true API from the start will pay for itself the very first time you split a project, or want to report what features are on in dev and off in prod, not to mention multiple existing language bindings etc.