---
title: "Feature flags: differences between backend, frontend and mobile"
date: 2020-10-14
layout: post
lang: en
ref: feature-flags-differences-between-backend-frontent-and-mobile
category: presentation
published: false
---

*This article is derived from a [presentation][presentation] on the same subject.*

When talking about [feature flags][feature-flags-article], I find that their costs and benefits are often well exposed and addressed. However, the weight of those costs and benefits applies differently on backend, frontend and mobile, and those differences aren't covered.

I'll try to make this distinction clear, and close with some best practices I've acquired when using them in production.

[presentation]: {% link _slides/2020-10-14-rollout-feature-flag-experiment-operational-toggle.slides %}
[feature-flags-article]: https://martinfowler.com/articles/feature-toggles.html

## Why feature flags

Feature flags tend to be cited in the context of [continuous deployment][cd]:

> A: With continuous deployment, you deploy to production automatically
>
> B: But how do I handle deployment failures, partial features, *etc.*?
>
> A: With techniques like canary, monitoring and alarms, feature flags, *etc.*

Even though adopting continuous deployment doesn't force you to use feature flags, it creates a demand for them. The inverse is also true: using feature flags in the code points you more clearly towards continuous deployment.

But you should consider feature flags solely by taking into account this distilled trade-off analysis:

> Am I willing to pay with code complexity to get dynamicity?

It is true that you can make the management of feature flags as straightforward as possible, but having no feature flags is simpler than having any. What you get in return is the ability to parameterize the behaviour of the application at runtime, without making any code changes.

Sometimes this added complexity may tilt the balance towards not using a feature flag, and sometimes the flexibility of changing behaviour at runtime is absolutely worth the added complexity. This can vary a lot by code base and by feature, but fundamentally by environment: it's much cheaper to deploy a new version of a service than to release a new version of an app.

[cd]: https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment

## Control over the environment

The key differentiator that makes the trade-offs apply differently is how much control you have over the environment.

When running a **backend** service, you are usually paying for the servers themselves, and can tweak them as you wish. This means you have full control to make code changes as you wish. Not only that, you decide when to do it, and for how long the transition will last.

On the **frontend** you have less control: even though you can choose to make a new version available any time you wish, you can't force[^force] clients to immediately switch to the new version. That means that a) clients could skip upgrades at any time and b) you always have to keep backward and forward compatibility in mind.

Even though I'm mentioning frontend directly, it applies to other environments with similar characteristics: desktop applications, command-line programs, *etc*.

On **mobile** you have even less control: app stores need to allow your app to be updated, which could bite you when least desired. Theoretically you could make your APK available on third-party stores like [F-Droid][f-droid], or even make the APK itself available for direct download, which would give you the same characteristics of a frontend application, but that happens less often.

On iOS you can't even do that. You have to get Apple's blessing on every single update. Even though we have known for over a decade that this is a [bad idea][apple], there isn't a way around it.
This is where you have the least control.

In practice, the amount of control you have will change how much you value dynamicity: the less control you have, the more valuable it is. In other words, having a dynamic flag on the backend may or may not be worth it since you could always update the code immediately after, but on iOS it is basically always worth it.

[f-droid]: https://f-droid.org/
[^force]: Technically you could force a reload with JavaScript using `window.location.reload()`, but that is not only invasive and impolite, it also gives you the illusion that you have control over the client when you actually don't: clients with disabled JavaScript would be immune to such tactics.
[apple]: http://www.paulgraham.com/apple.html

## Rollout

A rollout is used to *roll out* a new version of software.

Rollouts are usually short-lived, being relevant as long as the new code is being deployed. The most common rule is percentages.

On the **backend**, it is common to find it on the deployment infrastructure itself, like canary servers, blue/green deployments, [a Kubernetes deployment rollout][k8s], *etc*. You could do those manually, by having a dynamic control in the code itself, but rollbacks are cheap enough that people usually do a normal deployment and just give some extra attention to the metrics dashboard.

On the **frontend**, CDN propagation delays and people not refreshing their web pages are rollouts by themselves. You could also roll out by geographical region or something similar, if desired.

On **mobile**, the Play Store allows you to perform fine-grained [staged rollouts][staged-rollouts], and the App Store allows you to perform limited [phased releases][phased-releases].
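
If you do end up keeping a percentage rule in the code itself, a minimal sketch could look like the one below. It assumes each client has some stable identifier; `rollout_bucket`, `in_rollout` and the flag names used with them are hypothetical names of mine, not part of any library. Hashing the flag name together with the client identifier keeps a client in the same bucket across requests, so raising the percentage only ever adds clients.

```python
import hashlib


def rollout_bucket(flag_name: str, client_id: str) -> int:
    """Map a client to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{client_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce it to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, client_id: str, percentage: int) -> bool:
    """Return True when the client falls inside the rollout percentage."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # A client in bucket b is included as soon as percentage > b,
    # and stays included for every higher percentage.
    return rollout_bucket(flag_name, client_id) < percentage
```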

[k8s]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment
[staged-rollouts]: https://support.google.com/googleplay/android-developer/answer/6346149?hl=en
[phased-releases]: https://help.apple.com/app-store-connect/#/dev3d65fcee1

## Feature flag

A feature flag is a *flag* that tells the application at runtime to turn a given *feature* on or off. That means that the actual production code will have more than one possible code path to go through, and that a new version of a feature coexists with the old version. The feature flag tells which part of the code to go through.

Feature flags are usually medium-lived, being relevant as long as the new code is being developed. The most common rules are percentages, allow/deny lists, A/B groups and client version.

On the **backend**, those are useful for things that have a long development cycle, or that need to be done in steps. Consider loading the feature flag rules in memory when the application starts, so that you avoid querying a database or an external service when applying a feature flag rule, and avoid flaky results due to intermittent network failures.

Since on the **frontend** you don't control when the client software is updated, you're left with applying the feature flag rule on the server, and exposing the value through an API for maximum dynamicity. For less dynamic scenarios, the rule could live in the frontend code itself, falling back to a "just refresh the page"/"just update to the latest version" strategy.

On **mobile** you can't even rely on a "just update to the latest version" strategy, since the code for the app could be updated with a new feature and still be blocked on the store. Those cases aren't frequent, but you should always assume the store will deny updates at critical moments so you don't find yourself with no cards to play. That means the only control you actually have is via the backend, by parameterizing the runtime of the application using the API.
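
To make the server-side approach above more concrete, here is a rough sketch of rules held in memory and evaluated per client, producing the payload an API could expose. The rule shape, the flag name, the client ids and the version format (dot-separated integers) are all assumptions for illustration, not a description of any particular feature flag service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagRule:
    """Hypothetical rule shape combining lists and a client version check."""
    allow: frozenset = frozenset()   # client ids that always get the feature
    deny: frozenset = frozenset()    # client ids that never get the feature
    min_version: tuple = (0, 0, 0)   # other clients need at least this version


# Assumed to be loaded once at startup and kept in memory, so that
# evaluating a flag never needs a database or network round trip.
RULES = {
    "new-settings-screen": FlagRule(
        allow=frozenset({"internal-tester"}),
        deny=frozenset({"client-with-bug"}),
        min_version=(2, 3, 0),
    ),
}


def parse_version(text: str) -> tuple:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def evaluate_flags(client_id: str, client_version: str) -> dict:
    """Build the flag payload an API endpoint could return to a client."""
    version = parse_version(client_version)
    payload = {}
    for name, rule in RULES.items():
        if client_id in rule.deny:
            payload[name] = False
        elif client_id in rule.allow:
            payload[name] = True
        else:
            payload[name] = version >= rule.min_version
    return payload
```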

In practice, you should always have a feature flag to control any relevant piece of code. There is no such thing as a "code change too small for a feature flag". What you should ask yourself is:

> If the code I'm writing breaks and stays broken for around a month, do I care?

If you're doing an experimental screen, or something that will have a very small impact, you might answer "no" to the above question. For everything else, the answer will be "yes": bug fixes, layout changes, refactoring, new screen, filesystem/database changes, *etc*.

## Experiment

An experiment is a feature flag where you care about the analytical value of the flag, and how it might impact users' behaviour: a feature flag with analytics.

Experiments are also usually medium-lived, being relevant as long as the new code is being developed. The most common rule is A/B test.

On the **backend**, an experiment relies on an analytical environment that will pick the A/B test groups and distributions, which means those can't be held in memory easily. That also means that you'll need a fallback value in case fetching the group for a given customer fails.

On the **frontend** and on **mobile** they are no different from feature flags.

## Operational toggle

An operational toggle is like a system-level manual circuit breaker, where you turn on/off a feature, fail over the load to a different server, *etc*. They are useful switches to have during an incident.

Operational toggles are usually long-lived, being relevant as long as the code is in production. The most common rule is percentages.

They can be feature flags that are promoted to operational toggles on the **backend**, or may be put in place on purpose, either preventively or after a postmortem analysis.

On the **frontend** and on **mobile** they are similar to feature flags, where the "feature" is being turned on and off, and the client interprets this value to show if the "feature" is available or unavailable.
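
As an illustration of the manual circuit breaker idea, below is a small sketch of an in-memory toggle store that an operator could flip during an incident, together with the payload a client could interpret as "available" or "unavailable". The class, the function and the toggle names are hypothetical, and a real system would also need some way to persist and distribute the toggle state, which is left out here.

```python
import threading


class OperationalToggles:
    """In-memory on/off switches, guarded by a lock for concurrent access."""

    def __init__(self, defaults: dict):
        self._lock = threading.Lock()
        self._state = dict(defaults)

    def set(self, name: str, enabled: bool) -> None:
        """Flip a known toggle; unknown names are rejected to catch typos."""
        with self._lock:
            if name not in self._state:
                raise KeyError(name)
            self._state[name] = enabled

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._state[name]


def availability_payload(toggles: OperationalToggles, names: list) -> dict:
    """Translate toggle state into the value a client would interpret."""
    return {
        name: "available" if toggles.is_enabled(name) else "unavailable"
        for name in names
    }
```

During an incident, calling `toggles.set("payments", False)` would make the next payload report `"payments"` as unavailable, without a new deployment or app release.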

## Best practices

### Prefer dynamic content

Even though feature flags give you more dynamicity, they're still somewhat manual: you have to create one for a specific feature and change it by hand.

If you find yourself manually updating a feature flag every other day, or tweaking the percentages frequently, consider making it fully dynamic. Try using a dataset that is generated automatically, or computing the content on the fly.

Say you have a configuration screen with a list of options and sub-options, and you're trying to find how to better structure this list. Instead of using a feature flag for switching between 3 and 5 options, make it fully dynamic. This way you'll be able to perform other tests that you didn't plan, and get more flexibility out of it.

### Use the client version to negotiate feature flags

After effectively finishing a feature, the old code that coexisted with the new one will be deleted, and all traces of the transition will vanish from the code base. However, if you just remove the feature flag from the API, all of the old versions of clients that relied on that value to show the new feature will be downgraded to the old feature.

This means that you should avoid deleting client-facing feature flags, and retire them instead: use the client version to decide when the feature is stable, and return `true` for every client with a version greater than or equal to that. This way you can stop thinking about the feature flag, and you don't break or downgrade clients that didn't upgrade past the transition.

### Beware of many nested feature flags

Nested flags combine exponentially: each additional boolean flag can double the number of code paths you have to reason about and test.

Pick strategic entry points or transitions eligible for feature flags, and beware of their nesting.

### Include feature flags in the development workflow

Add feature flags to the list of things to think about during whiteboarding, and delete or retire each feature flag at the end of development.
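
Retiring a client-facing flag by client version, as described earlier, could be sketched roughly as below. The flag names, the `RETIRED_FLAGS` table and the dot-separated integer version format are assumptions made for the example.

```python
def parse_version(text: str) -> tuple:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical table of retired flags: each maps to the first client
# version in which the feature is considered stable.
RETIRED_FLAGS = {
    "new-settings-screen": (2, 5, 0),
}


def flag_value(name: str, client_version: str, live_flags: dict) -> bool:
    """Resolve a flag for a client, preferring the retirement rule."""
    stable_since = RETIRED_FLAGS.get(name)
    if stable_since is not None:
        # Retired flag: the answer depends only on the client version.
        return parse_version(client_version) >= stable_since
    # Flag still under active management: unknown flags default to off.
    return live_flags.get(name, False)
```

With this in place the flag keeps being served to clients, but nobody has to manage it by hand anymore: `flag_value("new-settings-screen", "2.5.0", {})` is `True`, while a client still on `2.4.9` gets `False`.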

### Always rely on a feature flag on the app

Again, there is no such thing as "too small for a feature flag". Having too many feature flags is a good problem to have; having too few is not.