Diffstat (limited to '_articles/2020-11-14-local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.md')
-rw-r--r-- | _articles/2020-11-14-local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.md | 306 |
1 files changed, 0 insertions, 306 deletions
diff --git a/_articles/2020-11-14-local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.md b/_articles/2020-11-14-local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.md
deleted file mode 100644
index 68ae03c..0000000
--- a/_articles/2020-11-14-local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.md
+++ /dev/null
@@ -1,306 +0,0 @@
----
-
-title: "Local-First Software: You Own Your Data, in spite of the Cloud - article review"
-
-date: 2020-11-14
-
-layout: post
-
-lang: en
-
-ref: local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review
-
-eu_categories: presentation,article review
-
----
-
-*This article is derived from a [presentation][presentation] given at a Papers
-We Love meetup on the same subject.*
-
-This is a review of the article
-"[Local-First Software: You Own Your Data, in spite of the Cloud][article-pdf]",
-by M. Kleppmann, A. Wiggins, P. Van Hardenberg and M. F. McGranaghan.
-
-### Offline-first, local-first
-
-The "local-first" term they use isn't new, and I have used it myself in the past
-to refer to this type of application, where the data lives primarily on the
-client, and there are conflict resolution algorithms that reconcile data created
-on different instances.
-
-Sometimes I see confusion between this idea and "client-side",
-"offline-friendly", "syncable", etc. I have used these terms myself, too.
-
-However, the "offline-first" term already exists, and it conveys almost all of
-that meaning. In my view, "local-first" doesn't extend "offline-first" in any
-aspect; rather, it gives it a well-defined meaning. I could say that
-"local-first" is just "offline-first", but with 7 well-defined ideals instead
-of community best practices.
-
-It is a step forward, and given the number of times I've seen the paper shared
-around, I think there's a chance people will prefer saying "local-first" in
-*lieu* of "offline-first" from now on.
-
-[presentation]: {% link _slides/2020-11-14-on-local-first-beyond-the-crdt-silver-bullet.slides %}
-[article-pdf]: https://martin.kleppmann.com/papers/local-first.pdf
-
-### Software licenses
-
-In a footnote of the 7th ideal ("You Retain Ultimate Ownership and Control"),
-the authors say:
-
-> In our opinion, maintaining control and ownership of data does not mean that
-> the software must necessarily be open source. (...) as long as it does not
-> artificially restrict what users can do with their files.
-
-They give examples of artificial restrictions. Here is one such restriction that
-I've come up with:
-
-```bash
-#!/bin/sh
-
-TODAY=$(date +%s)                             # now, in epoch seconds
-LICENSE_EXPIRATION=$(date -d 2020-11-15 +%s)  # `date -d` is a GNU extension
-
-if [ "$TODAY" -ge "$LICENSE_EXPIRATION" ]; then
-    echo 'License expired!'
-    exit 1
-fi
-
-echo $((2 + 2))  # the actual "useful" work
-```
-
-Now, when using this very useful program:
-
-```bash
-# today
-$ ./useful-adder.sh
-4
-# tomorrow
-$ ./useful-adder.sh
-License expired!
-```
-
-This is obviously an intentional restriction, and it goes against the 5th ideal
-("The Long Now"). This software is only useful for as long as the embedded
-license expiration allows. Sure, you could change the clock on the computer, but
-there are many other ways in which this type of intentional restriction
-conflicts with that ideal.
-
-However, what about unintentional restrictions? What if a piece of software had
-an equivalent restriction by accident, and stopped working after some days? Or
-what if the programmer added a constant to make development simpler, and this
-ended up unintentionally restricting the user?
-
-```bash
-# today
-$ useful-program
-# ...useful output...
-
-# tomorrow, with more data
-$ useful-program
-ERROR: Panic! Stack overflow!
-```
-
-Just as easily as I can come up with ways to intentionally restrict users, I can
-do the same for unintentional restrictions. A program can stop working for a
-variety of reasons.
-
-If it stops working due to, say, data growth, what are the options? Reverting to
-an earlier backup, and making it read-only? That isn't really a "Long Now", but
-rather a "Long Now as long as the software keeps working as expected".
-
-The point is: if the software isn't free, "The Long Now" isn't achievable without
-a lot of wishful thinking. Maybe the authors were trying to be friendlier towards
-businesses that don't like free software, but in doing so they've created a
-contradiction by trying to reconcile "The Long Now" with proprietary software.
-
-That isn't the same as saying that any free software achieves that ideal,
-either. The license can still be free, but the source code can become
-unavailable due to cloud rot. Or maybe the build is undocumented, or the build
-tools need specific configuration that one has to guess. A piece of free
-software can still fail to achieve "The Long Now". Being free doesn't guarantee
-it; it just makes it possible.
-
-A colleague has challenged my view, arguing that the software doesn't really
-need to be free, as long as there is a specification of the file format. This
-way, if the software stops working, the format can still be processed by other
-programs. But this doesn't apply in practice: if you have a document that you
-write to, and the software stops working, you still want to write to the
-document. An external tool that navigates the content and shows it to you won't
-allow you to keep writing, and once it does, that tool is starting to
-re-implement the software.
-
-An open specification could serve as a blueprint for other implementations,
-making the data format friendlier to reverse-engineering. But the
-re-implementation still has to exist, at which point the original software has
-failed to achieve "The Long Now".
-
-It is less bad, but still not quite there yet.
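To make the point about unintentional restrictions concrete, here is a minimal sketch of how a development-time constant can end up restricting the user. Everything in it is hypothetical and not taken from the paper or from any real program: the `MAX_ENTRIES` limit and the `count_entries` and `process_entries` functions are made up for illustration.

```bash
#!/bin/sh

# Hypothetical limit, hard-coded only to keep development simple.
MAX_ENTRIES=3

# count_entries FILE: print the number of lines in FILE.
count_entries() {
    wc -l < "$1" | tr -d ' '
}

# process_entries FILE: succeed while the data fits the hard-coded limit,
# and fail as soon as the data outgrows it.
process_entries() {
    entries=$(count_entries "$1")
    if [ "$entries" -gt "$MAX_ENTRIES" ]; then
        echo "ERROR: too many entries ($entries > $MAX_ENTRIES)" >&2
        return 1
    fi
    echo "processed $entries entries"
}
```

With a 2-line file, `process_entries` prints `processed 2 entries`. Once the same file grows past 3 lines the call fails, even though nobody intended to restrict anything and nothing about the program itself has changed.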
-
-### Denial of existing solutions
-
-When describing "Existing Data Storage and Sharing Models", in a
-footnote[^devil] the authors say:
-
-[^devil]: This is the second aspect of the article that I'm picking on from a
-    footnote. I guess the devil really is in the details.
-
-> In principle it is possible to collaborate without a repository service,
-> e.g. by sending patch files by email, but the majority of Git users rely
-> on GitHub.
-
-The authors go to great lengths to talk about the usability of cloud apps, and
-even point to research they've done on it, but they've missed the chance to
-learn more from local-first solutions that already exist.
-
-Say the automerge CRDT proves to be even more useful than everybody imagined.
-Say someone builds a local-first repository service using it. How would that
-change anything about the Git/GitHub model? What is different about it that
-would prevent people in the future from writing a paper saying:
-
-> In principle it is possible to collaborate without a repository service,
-> e.g. by using automerge and platform X,
-> but the majority of Git users rely on GitHub.
-
-How is this any better?
-
-If it is already [possible][git-local-first] to have a local-first development
-workflow, why don't people use it? Is it just fashion, or is there a fundamental
-problem with it? If so, what is it, and how can it be avoided?
-
-If sending patches by email is perfectly possible but out of fashion, why even
-talk about Git/GitHub? Isn't this a problem that people are creating for
-themselves? How can CRDTs possibly prevent people from doing that?
-
-My impression is that the authors envision a better future, where development is
-fully decentralized unlike today, and somehow CRDTs will make that happen. If
-more people think this way, "CRDT" is next in line for the list of buzzwords
-that solve everything, like "containers", "blockchain" or "machine learning".
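For reference, the patch-based collaboration mentioned in the quoted footnote can be sketched with Git's own tooling. This is only an illustration under stated assumptions: two local repositories stand in for two collaborators, a plain directory stands in for the email transport (in real use something like `git send-email` would carry the patch), and the `demo_patch_exchange` function, the names "Alice" and "Bob", and the file contents are all made up.

```bash
#!/bin/sh

# demo_patch_exchange WORKDIR (hypothetical helper)
# WORKDIR must be an absolute path to an empty, writable directory.
demo_patch_exchange() {
    work=$1

    # Alice starts a repository with a single commit.
    git init -q "$work/alice" || return 1
    (
        cd "$work/alice" &&
        git config user.name Alice &&
        git config user.email alice@example.com &&
        echo 'first draft' > article.md &&
        git add article.md &&
        git commit -q -m 'Add article'
    ) || return 1

    # Bob clones it, commits a change, and exports that commit as a patch
    # file in mailbox format, ready to be sent by email.
    git clone -q "$work/alice" "$work/bob" || return 1
    (
        cd "$work/bob" &&
        git config user.name Bob &&
        git config user.email bob@example.com &&
        echo 'a correction' >> article.md &&
        git commit -q -a -m 'Fix article' &&
        git format-patch -1 -o "$work/outbox" > /dev/null
    ) || return 1

    # Alice applies the patch she "received"; no repository service involved.
    (
        cd "$work/alice" &&
        git am -q "$work/outbox"/*.patch
    ) || return 1

    # Print the subject of Alice's latest commit.
    git -C "$work/alice" log -1 --format=%s
}
```

Running `demo_patch_exchange "$(mktemp -d)"` prints `Fix article`: Bob's commit ends up in Alice's history, with its author information preserved, and nothing but files was exchanged.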
-
-Rather than picturing an imaginary service that could be described as
-"GitHub+CRDTs" and that people would adopt, I'd rather understand why people
-don't already work this way, since Git is built to work like that.
-
-[git-local-first]: https://drewdevault.com/2018/07/23/Git-is-already-distributed.html
-
-### Ditching of web applications
-
-The authors put web applications in a worse position for building local-first
-applications, claiming that:
-
-> (...) the architecture of web apps remains fundamentally server-centric.
-> Offline support is an afterthought in most web apps, and the result is
-> accordingly fragile.
-
-Well, I disagree.
-
-The problem isn't inherent to the web platform, but rather how people use it.
-
-I have built offline-first applications myself, leveraging IndexedDB, App Cache,
-*etc*. I wanted to build an offline-first application on the web, and so I did.
-
-In fact, many people choose [PouchDB][pouchdb] *because* of that, since it is a
-good tool for offline-first web applications. The problem isn't really the
-technology, but how much people want their application to be local-first.
-
-Contrast it with Android [Instant Apps][instant-apps], where applications are
-sent to the phone in small parts. Since this requires an internet connection to
-move from one part of the app bundle to another, a subset of the app isn't
-local-first, despite being a native app.
-
-The point isn't the technology, but how people are using it. Local-first web
-applications are perfectly possible, just like non-local-first native
-applications are possible.
-
-[pouchdb]: https://pouchdb.com/
-[instant-apps]: https://developer.android.com/topic/google-play-instant
-
-### Costs are underrated
-
-I think the costs of "old-fashioned apps" over "cloud apps" are underrated,
-mainly regarding storage, and that these costs can vary a lot by application.
-
-Say a person writes online articles for their personal website, and puts
-everything into Git. Since there isn't supposed to be any collaboration, all
-of the relevant ideals of local-first are achieved.
-
-Now another person creates videos instead of articles. They could try keeping
-everything local, but after some time the storage usage fills the entire disk.
-This person's local-first setup would be much more complex, and would cost much
-more in maintenance, backup and storage.
-
-Even though both have similar needs, a local-first video repository is much more
-demanding. So the local-first thinking here isn't "just keep everything local",
-but "how much time and money am I willing to spend to keep everything local".
-
-The convenience of "cloud apps" becomes so attractive that many don't even have
-a local copy of their videos, and rely exclusively on service providers to
-maintain, back up and store their content.
-
-The dial measuring "cloud apps" against "old-fashioned apps" needs to be
-specific to each use case.
-
-### Real-time collaboration is optional
-
-If I were the one making the list of ideals, I wouldn't focus so much on
-real-time collaboration.
-
-Even though seamless collaboration is desirable, whether it is real-time depends
-on the network being available. But ideal 3 states that
-"The Network is Optional", so real-time collaboration is also optional.
-
-The fundamentals of a local-first system should enable real-time collaboration
-when the network is available, but shouldn't focus on it.
-
-In many discussions about applications being offline, it is common for me to
-find people saying that their application works
-"even on a plane, subway or elevator". That reflects the situations in which
-those developers themselves have to deal with the network being unavailable.
-
-But this leaves out a big chunk of the world where the internet connection is
-intermittent, or only works every other day or only once a week, or stops
-working when it rains, *etc*. For this audience, living without network
-connectivity isn't such a discrete moment in time, but part of everyday life. I
-like the fact that the authors acknowledge that.
-
-When discussing "working offline", I'd rather keep this type of person in mind;
-the subset of people who are only offline when in an elevator will then
-naturally be included.
-
-### On CRDTs and developer experience
-
-When discussing developer experience, the authors bring up some questions to be
-answered further, like:
-
-> For an app developer, how does the use of a CRDT-based data layer compare to
-> existing storage layers like a SQL database, a filesystem, or CoreData? Is a
-> distributed system harder to write software for?
-
-That is an easy one: yes.
-
-A distributed system *is* harder to write software for, being a distributed
-system.
-
-Adding a large layer of data structures and algorithms will naturally make it
-more complex to write software for. And trying to make this layer transparent to
-the programmer, so that they can pretend it doesn't exist, is a bad idea: RPC
-frameworks have tried that, and failed.
-
-See "[A Note on Distributed Computing][note-dist-comp]" for a critique of RPC
-frameworks trying to make the network invisible, which I think applies equally
-to making the CRDT layer invisible.
-
-[rmi-wiki]: https://en.wikipedia.org/wiki/Java_remote_method_invocation
-[note-dist-comp]: https://web.archive.org/web/20130116163535/http://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf
-
-## Conclusion
-
-I liked the article a lot, as it took the "offline-first" philosophy and ran
-with it.
-
-But I think the authors' view that adding CRDTs makes things local-first is a
-bit too magical.
-
-This particular area is one that I have a great interest in, and I wish to see
-more being done in the "local-first" space.