---
title: "Feature flags: differences between backend, frontend and mobile"
date: 2020-10-19
layout: post
lang: en
ref: feature-flags-differences-between-backend-frontend-and-mobile
category: presentation
---
*This article is derived from a [presentation][presentation] on the same
subject.*
When talking about [feature flags][feature-flags-article], I find that their
costs and benefits are often well exposed and addressed. However, the weight of
those costs and benefits differs between backend, frontend and mobile, and those
differences are rarely covered.
I'll try to make this distinction clear, with some final best practices I've
acquired when using them in production.
[presentation]: {% link _slides/2020-10-19-rollout-feature-flag-experiment-operational-toggle.slides %}
[feature-flags-article]: https://martinfowler.com/articles/feature-toggles.html
## Why feature flags
Feature flags in general tend to be cited in the context of
[continuous deployment][cd]:
> A: With continuous deployment, you deploy to production automatically
> B: But how do I handle deployment failures, partial features, *etc.*?
> A: With techniques like canary, monitoring and alarms, feature flags, *etc.*
Even though adopting continuous deployment doesn't force you to use feature
flags, it creates a demand for them. The inverse is also true: using feature
flags in the code nudges you towards continuous deployment.
But you should consider feature flags solely by taking into account this
distilled trade-off analysis:
> Am I willing to pay with code complexity to get dynamicity?
It is true that you can make the management of feature flags as
straightforward as possible, but having no feature flags is simpler than having
any. What you get in return is the ability to parameterize the behaviour of the
application at runtime, without making any code changes.
Sometimes this added complexity may tilt the balance towards not using a feature
flag, and sometimes the flexibility of changing behaviour at runtime is
absolutely worth the added complexity. This can vary a lot by code base and by
feature, but most fundamentally by environment: it's much cheaper to deploy a
new version of a service than to release a new version of an app.
[cd]: https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment
## Control over the environment
The key differentiator that makes the trade-offs apply differently is how much
control you have over the environment.
When running a **backend** service, you are usually paying for the servers
themselves, and can tweak them as you wish. This means you have full control to
make code changes as you wish. Not only that, you decide when to do it, and
how long the transition will last.
On the **frontend** you have less control: even though you can choose to make a
new version available any time you wish, you can't force[^force] clients to
immediately switch to the new version. That means that a) clients could skip
upgrades at any time and b) you always have to keep backward and forward
compatibility in mind.
Even though I'm mentioning frontend directly, this applies to other environments
with similar characteristics: desktop applications, command-line programs,
*etc*.
On **mobile** you have even less control: app stores need to allow your app to
be updated, which could bite you when least desired. Theoretically you could
make your APK available on third party stores like [F-Droid][f-droid], or even
make the APK itself available for direct download, which would give you the same
characteristics as a frontend application, but that happens less often.
On iOS you can't even do that. You have to get Apple's blessing on every single
update. Even though we have known that this is a [bad idea][apple] for over a
decade now, there isn't a way around it. This is where you have the least
control.
In practice, the amount of control you have will change how much you value
dynamicity: the less control you have, the more valuable it is. In other words,
having a dynamic flag on the backend may or may not be worth it since you could
always update the code immediately after, but on iOS it is basically always
worth it.
[f-droid]: https://f-droid.org/
[^force]: Technically you could force a reload with JavaScript using
`window.location.reload()`, but that is not only invasive and impolite, it
also gives you the illusion that you have control over the client when you
actually don't: clients with disabled JavaScript would be immune to such
tactics.
[apple]: http://www.paulgraham.com/apple.html
## Rollout
A rollout is used to *roll out* a new version of software.
They are usually short-lived, being relevant as long as the new code is being
deployed. The most common rule is percentages.
On the **backend**, it is common to find rollouts in the deployment
infrastructure itself, like canary servers, blue/green deployments,
[a Kubernetes deployment rollout][k8s], *etc*. You could do those manually, by
having a dynamic control in the code itself, but rollbacks are cheap enough that
people usually do a normal deployment and just give some extra attention to the
metrics dashboard.
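
If you do want the percentage rule in the code itself, a minimal sketch could look like the one below. It assumes each request carries some stable identifier (a user or server id). The names `bucket` and `in_rollout` are hypothetical, and hashing is just one common way of keeping the same unit in the same bucket as the percentage grows.

```python
import hashlib


def bucket(flag_name: str, unit_id: str) -> int:
    """Map a (flag, unit) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the modulo gives a bucket number.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, unit_id: str, percentage: int) -> bool:
    """Return True when the unit falls inside the rollout percentage."""
    return bucket(flag_name, unit_id) < percentage
```

Because the bucket depends only on the flag name and the identifier, raising the percentage from 10 to 50 keeps everyone who was already in the rollout, and setting it to 0 acts as an immediate rollback.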
On the **frontend**, CDN propagation delays and people not refreshing their web
pages are rollouts by themselves. You could do this by geographical region or
something similar, if desired.
On **mobile**, the Play Store allows you to perform
fine-grained [staged rollouts][staged-rollouts], and the App Store allows you to
perform limited [phased releases][phased-releases].
[k8s]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment
[staged-rollouts]: https://support.google.com/googleplay/android-developer/answer/6346149?hl=en
[phased-releases]: https://help.apple.com/app-store-connect/#/dev3d65fcee1
## Feature flag
A feature flag is a *flag* that tells the application at runtime to turn on or
off a given *feature*. That means that the actual production code will have more
than one possible code path to go through, and that a new version of a feature
coexists with the old version. The feature flag tells which path of the code to
go through.
They are usually medium-lived, being relevant as long as the new code is being
developed. The most common rules are percentages, allow/deny lists, A/B groups
and client version.
On the **backend**, those are useful for things that have a long development
cycle, or that need to be done in steps. Consider loading the feature flag rules
in memory when the application starts, so that you avoid querying a database
or an external service every time a feature flag rule is applied, and avoid
flaky results due to intermittent network failures.
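
As a rough illustration of loading the rules once at startup, here is a sketch assuming the rules arrive as a JSON document. The `FlagStore` and `Rule` names and the rule format (`enabled`, `allow`, `deny`) are made up for this example, not taken from any particular library.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    enabled: bool = False
    allow: frozenset = field(default_factory=frozenset)
    deny: frozenset = field(default_factory=frozenset)


class FlagStore:
    """Parses the rules once; every later lookup is a pure in-memory check."""

    def __init__(self, raw_rules: str):
        parsed = json.loads(raw_rules)
        self._rules = {
            name: Rule(
                enabled=bool(spec.get("enabled", False)),
                allow=frozenset(spec.get("allow", [])),
                deny=frozenset(spec.get("deny", [])),
            )
            for name, spec in parsed.items()
        }

    def is_enabled(self, name: str, user_id: str) -> bool:
        rule = self._rules.get(name)
        if rule is None:
            return False  # unknown flags are treated as off
        if user_id in rule.deny:
            return False  # the deny list wins over everything else
        if user_id in rule.allow:
            return True
        return rule.enabled
```

The store would be built once when the process boots, for example from a file shipped with the deployment, so evaluating a flag inside a request never touches the network.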
Since on the **frontend** you don't control when the client software is updated,
you're left with applying the feature flag rule on the server, and exposing the
value through an API for maximum dynamicity. For less dynamic scenarios, the
flag could live in the frontend code itself, falling back to a "just refresh
the page"/"just update to the latest version" strategy.
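
On the consuming side, a client reading flag values from such an API might be sketched as below. The payload shape (a JSON object of booleans), the flag names and the choice to default every flag to off are all assumptions made for illustration. The sketch is in Python for brevity; the same logic would apply in whatever language the client is written in.

```python
import json

# Hypothetical defaults compiled into the client: the old behaviour.
DEFAULTS = {"new_checkout": False, "dark_mode": False}


def resolve_flags(response_body, defaults=DEFAULTS):
    """Merge the server's answer over the built-in defaults."""
    flags = dict(defaults)
    if response_body is None:  # request failed or timed out
        return flags
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return flags
    if not isinstance(payload, dict):
        return flags
    for name, value in payload.items():
        # Ignore flags this client version doesn't know about,
        # and values that aren't plain booleans.
        if name in flags and isinstance(value, bool):
            flags[name] = value
    return flags
```

The client never evaluates a rule itself: it only receives the result, so changing the rule on the server changes behaviour without shipping a new client.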
On **mobile** you can't even rely on a "just update to the latest version"
strategy, since the code for the app could be updated to a new feature and be
blocked on the store. Those cases are rare, but you should always assume
the store will deny updates at critical moments so you don't find yourself with
no cards to play. That means the only control you actually have is via
the backend, by parameterizing the runtime of the application using the API. In
practice, you should always have a feature flag to control any relevant piece of
code. There is no such thing as "a code change too small for a feature flag".
What you should ask yourself is:
> If the code I'm writing breaks and stays broken for around a month, do I care?
If you're doing an experimental screen, or something that will have a very small
impact you might answer "no" to the above question. For everything else, the
answer will be "yes": bug fixes, layout changes, refactoring, new screen,
filesystem/database changes, *etc*.
## Experiment
An experiment is a feature flag where you care about the analytical value of the
flag, and how it might impact users' behaviour: a feature flag with analytics.
They are also usually medium-lived, being relevant as long as the new code is
being developed. The most common rule is A/B test.
On the **backend**, an experiment relies on an analytical environment that will
pick the A/B test groups and distributions, which means those can't be held in
memory easily. That also means that you'll need a fallback value in case
fetching the group for a given customer fails.
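
A sketch of that fallback, assuming the analytical environment is reached through some callable that may raise on network errors. `fetch_group`, the group names and the `"control"` default are hypothetical.

```python
from typing import Callable, Iterable


def experiment_group(
    fetch_group: Callable[[str, str], str],
    experiment: str,
    customer_id: str,
    valid_groups: Iterable[str] = ("control", "treatment"),
    fallback: str = "control",
) -> str:
    """Ask the analytics service for the group, falling back on any failure."""
    try:
        group = fetch_group(experiment, customer_id)
    except Exception:
        # Network failure, timeout, service down: serve the fallback group.
        return fallback
    # Also guard against answers this code doesn't know how to handle.
    return group if group in set(valid_groups) else fallback
```

Falling back to the control group means a failure of the analytics service degrades to the existing behaviour. Whether such customers should be excluded from the experiment's results is a separate decision for the analysis.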
On the **frontend** and on **mobile** they are no different from feature flags.
## Operational toggle
An operational toggle is like a system-level manual circuit breaker, where you
turn on/off a feature, fail over the load to a different server, *etc*. They are
useful switches to have during an incident.
They are usually long-lived, being relevant as long as the code is in
production. The most common rule is percentages.
They can be feature flags that are promoted to operational toggles on the
**backend**, or may be purposefully put in place preventively or after a
postmortem analysis.
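
As a small sketch of such a switch on the backend, assume an in-process toggle registry that an operator can change at runtime. The `OperationalToggles` class and the `recommendations` toggle are invented for this example.

```python
class OperationalToggles:
    """In-process switchboard; a real one would be changed by an operator."""

    def __init__(self, initial=None):
        self._state = dict(initial or {})

    def turn(self, name: str, on: bool) -> None:
        self._state[name] = on

    def is_on(self, name: str, default: bool = True) -> bool:
        # Toggles default to "on": the feature works unless switched off.
        return self._state.get(name, default)


def product_page(toggles, fetch_recommendations):
    """Build a page, skipping the expensive dependency when toggled off."""
    page = {"product": "details"}
    if toggles.is_on("recommendations"):
        page["recommendations"] = fetch_recommendations()
    else:
        page["recommendations"] = []  # degraded but still serving
    return page
```

During an incident, calling `toggles.turn("recommendations", False)` sheds the dependency without a deployment, and the same call with `True` restores it.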
On the **frontend** and on **mobile** they are similar to feature flags, where
the "feature" is being turned on and off, and the client interprets this value
to show if the "feature" is available or unavailable.
## Best practices
### Prefer dynamic content
Even though feature flags give you more dynamicity, they're still somewhat
manual: you have to create one for a specific feature and change it by hand.
If you find yourself manually updating a feature flag every other day, or
tweaking the percentages frequently, consider making it fully dynamic. Try
using a dataset that is generated automatically, or computing the content on the
fly.
Say you have a configuration screen with a list of options and sub-options, and
you're trying to find how to better structure this list. Instead of using a
feature flag for switching between 3 and 5 options, make it fully dynamic. This
way you'll be able to perform other tests that you didn't plan, and get more
flexibility out of it.
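
To make the idea concrete, here is a sketch where the client renders whatever list of options the server sends, instead of switching between two hard-coded lists. The payload shape (`label` and `children` keys) is an assumption made for this example.

```python
import json


def render_options(options, depth=0):
    """Flatten a nested list of options into indented lines of text."""
    lines = []
    for option in options:
        lines.append("  " * depth + option["label"])
        # Sub-options are optional and may nest to any depth.
        lines.extend(render_options(option.get("children", []), depth + 1))
    return lines


payload = json.loads(
    '[{"label": "Notifications", "children": [{"label": "Email"}, {"label": "Push"}]},'
    ' {"label": "Privacy"}]'
)
print("\n".join(render_options(payload)))
```

With this structure, trying 3 options, 5 options, or a different grouping is a change to the data the server returns, not to the client code.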
### Use the client version to negotiate feature flags
After effectively finishing a feature, the old code that coexisted with the new
one will be deleted, and all traces of the transition will vanish from the code
base. However, if you just remove the feature flags from the API, all of the old
versions of clients that relied on that value to show the new feature will
downgrade to the old feature.
This means that you should avoid deleting client-facing feature flags, and
retire them instead: use the client version to decide when the feature is
stable, and return `true` for every client with a version greater than or equal
to that. This way you can stop thinking about the feature flag, and you don't
break or downgrade clients that didn't upgrade past the transition.
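
A sketch of retiring a flag this way, assuming clients send a dotted numeric version such as `2.4.0`. The `RETIRED_FLAGS` table, the flag name and the choice to answer `False` to clients older than the stable version are assumptions for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn "2.4" or "2.4.0" into a comparable tuple, padded to 3 parts."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Hypothetical table: flag name -> first client version where it is stable.
RETIRED_FLAGS = {"new_checkout": "2.4.0"}


def flag_for_client(name: str, client_version: str, evaluate_rule) -> bool:
    """Answer retired flags from the client version, others from their rule."""
    stable_since = RETIRED_FLAGS.get(name)
    if stable_since is not None:
        return parse_version(client_version) >= parse_version(stable_since)
    return evaluate_rule(name)
```

Comparing tuples of integers rather than strings matters here: as strings, `"2.10.0"` would sort before `"2.4.0"`.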
### Beware of many nested feature flags
Nested flags combine exponentially.
Pick strategic entry points or transitions eligible for feature flags, and
beware of their nesting.
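
To put a number on it: in the worst case, where every flag can be flipped independently of the others, `n` flags give `2 ** n` configurations to reason about and test. A quick way to enumerate them (the flag names are hypothetical):

```python
from itertools import product


def flag_combinations(flag_names):
    """List every on/off assignment for the given flags."""
    return [
        dict(zip(flag_names, values))
        for values in product([False, True], repeat=len(flag_names))
    ]


combos = flag_combinations(["new_checkout", "new_cart", "new_payment"])
print(len(combos))  # 8
```

Three flags already mean eight configurations, and ten would mean 1024, which is why it pays to keep the number of flags that interact on the same code path small.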
### Include feature flags in the development workflow
Add feature flags to the list of things to think about during whiteboarding, and
add deleting/retiring feature flags to the end of the development cycle.
### Always rely on a feature flag on the app
Again, there is no such thing as "too small for a feature flag". Having too many
feature flags is a good problem to have, unlike having too few. Automate the
process of creating a feature flag to lower its cost.