---

title: "Local-First Software: You Own Your Data, in spite of the Cloud - article review"

date: 2020-11-14

layout: post

lang: en

ref: local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review

eu_categories: presentation,article review

---

*This article is derived from a [presentation][presentation] given at a Papers
We Love meetup on the same subject.*

This is a review of the article
"[Local-First Software: You Own Your Data, in spite of the Cloud][article-pdf]",
by M. Kleppmann, A. Wiggins, P. Van Hardenberg and M. F. McGranaghan.

### Offline-first, local-first

The "local-first" term they use isn't new, and I have used it myself in the past
to refer to this type of application, where the data lives primarily on the
client, and there are conflict resolution algorithms that reconcile data created
on different instances.

Sometimes I see this idea confused with "client-side", "offline-friendly",
"syncable", etc. I have used these terms myself, too.

However, the "offline-first" term already exists, and it conveys almost all of
that meaning. In my view, "local-first" doesn't extend "offline-first" in any
aspect; rather, it gives it a well-defined meaning. I could say that
"local-first" is just "offline-first", but with 7 well-defined ideals instead of
community best practices.

It is a step forward, and given the number of times I've seen the paper shared
around I think there's a chance people will prefer saying "local-first" in
*lieu* of "offline-first" from now on.

[presentation]: {% link _slides/2020-11-14-on-local-first-beyond-the-crdt-silver-bullet.slides %}
[article-pdf]: https://martin.kleppmann.com/papers/local-first.pdf

### Software licenses

In a footnote to the 7th ideal ("You Retain Ultimate Ownership and Control"),
the authors say:

> In our opinion, maintaining control and ownership of data does not mean that
> the software must necessarily be open source. (...) as long as it does not
> artificially restrict what users can do with their files.

They give examples of artificial restrictions. Here is one I've come up with
myself:

```bash
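#!/bin/sh

# Both dates as seconds since the epoch (`date -d` is a GNU extension).
TODAY=$(date +%s)
LICENSE_EXPIRATION=$(date -d 2020-11-15 +%s)

# Refuse to run once the embedded expiration date has passed.
if [ "$TODAY" -ge "$LICENSE_EXPIRATION" ]; then
  echo 'License expired!'
  exit 1
fi

echo $((2 + 2))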
```

Now when using this very useful program:

```bash
# today
$ ./useful-adder.sh
4
# tomorrow
$ ./useful-adder.sh
License expired!
```

This is obviously an intentional restriction, and it goes against the 5th ideal
("The Long Now"). This software would only be useful as long as the embedded
license expiration allowed. Sure, you could change the clock on the computer,
but there are many other ways in which this type of intentional restriction
conflicts with that ideal.

However, what about unintentional restrictions? What if a piece of software had
an equivalent restriction by accident, and stopped working after some days had
passed? Or what if the programmer added a constant to make development simpler,
and this ended up unintentionally restricting the user?

```bash
# today
$ useful-program
# ...useful output...

# tomorrow, with more data
$ useful-program
ERROR: Panic! Stack overflow!
```

Just as easily as I can come up with ways to intentionally restrict users, I can
do the same for unintentional restrictions. A program can stop working for a
variety of reasons.
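
A minimal sketch of the "constant" case, to make it concrete. Everything in it
is hypothetical: the `summarize` function, the `MAX_ENTRIES` limit and its value
were made up for illustration, and stand in for any shortcut that looked
harmless during development:

```bash
#!/bin/sh

# Hypothetical shortcut: a limit that was "big enough" while developing.
MAX_ENTRIES=3

# Count the entries read from standard input, one per line.
# Fails as soon as the input outgrows the hard-coded limit.
summarize() {
  count=0
  while IFS= read -r _entry; do
    count=$((count + 1))
    if [ "$count" -gt "$MAX_ENTRIES" ]; then
      echo 'ERROR: too many entries!' >&2
      return 1
    fi
  done
  echo "$count entries"
}

# today
printf 'a\nb\n' | summarize

# tomorrow, with more data
printf 'a\nb\nc\nd\n' | summarize || echo 'the program gave up'
```

Nobody meant to restrict the user here, but without access to the source the
effect is the same as with the license check: the program stops being useful
once the data grows past what the programmer had in mind.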

If it stops working due to, say, data growth, what are the options? Reverting to
an earlier backup, and making it read-only? That isn't really a "Long Now", but
rather a "Long Now as long as the software keeps working as expected".

The point is: if the software isn't free, "The Long Now" isn't achievable
without a lot of wishful thinking. Maybe the authors were trying to be more
friendly towards businesses that don't like free software, but in doing so
they've introduced a contradiction by trying to reconcile "The Long Now" with
proprietary software.

It isn't the same as saying that any free software achieves that ideal,
either. The license can still be free, but the source code can become
unavailable due to cloud rot. Or maybe the build is undocumented, or the build
tools had specific configuration that one has to guess. A piece of free
software can still fail to achieve "The Long Now". Being free doesn't guarantee
it, just makes it possible.

A colleague has challenged my view, arguing that the software doesn't really
need to be free, as long as there is a specification of the file format. This
way, if the software stops working, the format can still be processed by other
programs. But this doesn't hold in practice: if you have a document that you
write to, and the software stops working, you still want to write to the
document. An external tool that navigates the content and shows it to you won't
allow you to keep writing, and once it does, that tool is starting to
re-implement the software.

An open specification could serve as a blueprint to other implementations,
making the data format more friendly to reverse-engineering. But the
re-implementation still has to exist, at which point the original software failed
to achieve "The Long Now".

It is less bad, but still not quite there yet.

### Denial of existing solutions

When describing "Existing Data Storage and Sharing Models", in a
footnote[^devil] the authors say:

[^devil]: This is the second point I'm picking from a footnote of the article.
    I guess the devil really is in the details.

> In principle it is possible to collaborate without a repository service,
> e.g. by sending patch files by email, but the majority of Git users rely
> on GitHub.

The authors go to great lengths to talk about the usability of cloud apps, and
even point to research they've done on it, but they've missed learning more from
local-first solutions that already exist.

Say the automerge CRDT proves to be even more useful than what everybody
imagined. Say someone builds a local-first repository service using it. How will
it change anything about the Git/GitHub model? What is different about it that
would prevent people in the future from writing a paper saying:

> In principle it is possible to collaborate without a repository service,
> e.g. by using automerge and platform X,
> but the majority of Git users rely on GitHub.

How is this any better?

If it is already [possible][git-local-first] to have a local-first development
workflow, why don't people use it? Is it just fashion, or is there a fundamental
problem with it? If so, what is it, and how can it be avoided?

If sending patches by email is perfectly possible but out of fashion, why even
talk about Git/GitHub? Isn't this a problem people are getting themselves into?
How can CRDTs possibly prevent people from doing that?
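
For reference, the patch-based workflow mentioned in that footnote needs
nothing beyond Git itself. Below is a minimal sketch of it. The repository
names, identities and the `outbox` directory are hypothetical, and a plain file
stands in for the email so that the example runs without a mail setup:

```bash
#!/bin/sh
set -e

# Hypothetical scratch directory holding both "machines".
work=$(mktemp -d)
cd "$work"

# Alice starts a project; Bob gets a copy by any means (here, a local clone).
git init -q alice
(cd alice && echo 'hello' > README && git add README &&
  git -c user.name=Alice -c user.email=alice@example.com \
    commit -q -m 'Add README')
git clone -q alice bob

# Bob commits locally and exports the commit as a patch file.
(cd bob && echo 'world' >> README &&
  git -c user.name=Bob -c user.email=bob@example.com \
    commit -q -a -m 'Extend README' &&
  git format-patch -q -1 -o "$work/outbox")

# The patch travels however it can (email, USB stick); Alice applies it.
(cd alice &&
  git -c user.name=Alice -c user.email=alice@example.com \
    am -q "$work"/outbox/*.patch)

# Alice's history now contains Bob's commit, and no server was involved.
git -C alice log --format=%s
```

In a real setup `git send-email` would usually take care of delivering the
patch, but the exchange itself is only `git format-patch` on one side and
`git am` on the other.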

My impression is that the authors envision a better future, where development is
fully decentralized, unlike today, and somehow CRDTs will make that happen. If
more people think this way, "CRDT" is next in line for the list of buzzwords
that solve everything, like "containers", "blockchain" or "machine learning".

Rather than picturing an imaginary service that could be described as
"GitHub+CRDTs" and that people would adopt, I'd rather better understand why
people don't do it already, since Git is built to work like that.

[git-local-first]: https://drewdevault.com/2018/07/23/Git-is-already-distributed.html

### Ditching of web applications

The authors put web applications in a worse position for building local-first
applications, claiming that:

> (...) the architecture of web apps remains fundamentally server-centric.
> Offline support is an afterthought in most web apps, and the result is
> accordingly fragile.

Well, I disagree.

The problem isn't inherent to the web platform, but lies in how people use it.

I have myself built offline-first applications, leveraging IndexedDB, App Cache,
*etc*. I wanted to build an offline-first application on the web, and so I did.

In fact, many people choose [PouchDB][pouchdb] *because* of that, since it is a
good tool for offline-first web applications. The problem isn't really the
technology, but how much people want their application to be local-first.

Contrast it with Android [Instant Apps][instant-apps], where applications are
sent to the phone in small parts. Since this requires an internet connection to
move from a part of the app bundle to another, a subset of the app isn't
local-first, despite being an app.

The point isn't the technology, but how people are using it. Local-first web
applications are perfectly possible, just like non-local-first native
applications are possible.

[pouchdb]: https://pouchdb.com/
[instant-apps]: https://developer.android.com/topic/google-play-instant

### Costs are underrated

I think the costs of "old-fashioned apps" over "cloud apps" are underrated,
mainly regarding storage, and that these costs can vary a lot by application.

Say a person writes online articles for their personal website, and puts
everything into Git. Since there isn't supposed to be any collaboration, all
of the relevant ideals of local-first are achieved.

Now another person creates videos instead of articles. They could try keeping
everything local, but after some time the storage usage fills the entire disk.
This person's local-first setup would be much more complex, and would cost much
more in maintenance, backup and storage.

Even though both have similar needs, a local-first video repository is much more
demanding. So the local-first thinking here isn't "just keep everything local",
but "how much time and money am I willing to spend to keep everything local".

The convenience of "cloud apps" becomes so attractive that many don't even have
a local copy of their videos, and rely exclusively on service providers to
maintain, backup and store their content.

The dial measuring "cloud apps" and "old-fashioned apps" needs to be specific to
use-cases.

### Real-time collaboration is optional

If I were the one making the list of ideals, I wouldn't focus so much on
real-time collaboration.

Even though seamless collaboration is desired, it being real-time depends on the
network being available for that. But ideal 3 states that
"The Network is Optional", so real-time collaboration is also optional.

The fundamentals of a local-first system should enable real-time collaboration
when the network is available, but shouldn't focus on it.

In many places where offline applications are discussed, it is common for me to
find people saying that their application works
"even on a plane, subway or elevator". That reflects the situations in which
those developers themselves have to deal with the network being unavailable.

But this leaves out a big chunk of the world where the internet connection is
intermittent, or only works every other day or only once a week, or stops
working when it rains, *etc*. For this audience, living without network
connectivity isn't such a discrete moment in time, but part of everyday life. I
like the fact that the authors acknowledge that.

When discussing "working offline", I'd rather keep this type of person in mind;
then the subset of people who are offline when on the elevator will naturally be
included.

### On CRDTs and developer experience

When discussing developer experience, the authors bring up some questions to be
answered further, like:

> For an app developer, how does the use of a CRDT-based data layer compare to
> existing storage layers like a SQL database, a filesystem, or CoreData? Is a
> distributed system harder to write software for?

That is an easy one: yes.

A distributed system *is* harder to write software for, being a distributed
system.

Adding a large layer of data structures and algorithms will naturally make it
more complex to write software for. And trying to make this layer transparent to
the programmer, so they can pretend that it doesn't exist, is a bad idea: RPC
frameworks have tried that, and failed.

See "[A Note on Distributed Computing][note-dist-comp]" for a critique of RPC
frameworks trying to make the network invisible, which I think applies equally
to making the CRDT layer invisible.
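
As a small illustration, here is a sketch of about the simplest CRDT there is:
a grow-only set, where merging two replicas is a set union. Representing each
replica as a file with one element per line, and the `merge` name, are
assumptions made for this example; this is not how automerge or the paper's
prototypes work:

```bash
#!/bin/sh

# merge: union of two replicas, each a file with one element per line.
# Union is commutative, associative and idempotent, so replicas converge
# regardless of the order in which they exchange state.
merge() {
  sort -u "$1" "$2"
}

dir=$(mktemp -d)
printf 'milk\neggs\n' > "$dir/phone"    # hypothetical replica on the phone
printf 'eggs\nbread\n' > "$dir/laptop"  # hypothetical replica on the laptop

# Both replicas agree on the merged state, whoever merges first.
merge "$dir/phone" "$dir/laptop"

# The phone "removes" eggs locally, then syncs with the laptop again.
printf 'milk\n' > "$dir/phone"
merge "$dir/phone" "$dir/laptop"  # eggs is back in the merged set
```

Even in this tiny case the layer isn't invisible: a grow-only set never forgets
an element, so a removal made on one replica is undone by the next merge. The
application developer has to know which operations the data structure supports
and design around them.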

[rmi-wiki]: https://en.wikipedia.org/wiki/Java_remote_method_invocation
[note-dist-comp]: https://web.archive.org/web/20130116163535/http://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf

## Conclusion

I liked the article a lot, as it took the "offline-first" philosophy and ran
with it.

But I think the authors' view that adding CRDTs will make things local-first is
a bit too magical.

This particular area is one that I have great interest in, and I wish to see
more being done in the "local-first" space.