From 020c1e77489b772f854bb3288b9c8d2818a6bf9d Mon Sep 17 00:00:00 2001
From: EuAndreh
Date: Fri, 18 Apr 2025 02:17:12 -0300
Subject: git mv src/content/* src/content/en/

---
 .../en/blog/2020/11/14/local-first-review.adoc | 305 +++++++++++++++++++++
 1 file changed, 305 insertions(+)
 create mode 100644 src/content/en/blog/2020/11/14/local-first-review.adoc

(limited to 'src/content/en/blog/2020/11/14')

diff --git a/src/content/en/blog/2020/11/14/local-first-review.adoc b/src/content/en/blog/2020/11/14/local-first-review.adoc
new file mode 100644
index 0000000..f9dd4b0
--- /dev/null
+++ b/src/content/en/blog/2020/11/14/local-first-review.adoc
@@ -0,0 +1,305 @@
+= Local-First Software: article review
+:categories: presentation article-review
+
+:empty:
+:presentation: link:../../../../slides/2020/11/14/local-first.html FIXME
+:reviewed-article: https://martin.kleppmann.com/papers/local-first.pdf
+
+_This article is derived from a {presentation}[presentation] given at a Papers
+We Love meetup on the same subject._
+
+This is a review of the article "{reviewed-article}[Local-First Software: You
+Own Your Data, in spite of the Cloud]", by M. Kleppmann, A. Wiggins, P. van
+Hardenberg and M. F. McGranaghan.
+
+== Offline-first, local-first
+
+The "local-first" term they use isn't new, and I have used it myself in the past
+to refer to this type of application, where the data lives primarily on the
+client, and there are conflict resolution algorithms that reconcile data created
+on different instances.
+
+Sometimes I see this idea confused with "client-side", "offline-friendly",
+"syncable", etc. I have used these terms myself, too.
+
+There exists, however, already the "offline-first" term, which conveys almost
+all of that meaning. In my view, "local-first" doesn't extend "offline-first"
+in any aspect; rather, it gives it a well-defined meaning.
+I could say that "local-first" is just "offline-first", but with 7 well-defined
+ideals instead of community best practices.
+
+It is a step forward, and given the number of times I've seen the paper shared
+around I think there's a chance people will prefer saying "local-first" in
+_lieu_ of "offline-first" from now on.
+
+== Software licenses
+
+In a footnote of the 7th ideal ("You Retain Ultimate Ownership and Control"),
+the authors say:
+
+____
+In our opinion, maintaining control and ownership of data does not mean that the
+software must necessarily be open source. (...) as long as it does not
+artificially restrict what users can do with their files.
+____
+
+They give examples of artificial restrictions. Here is one such restriction
+that I've come up with myself:
+
+[source,sh]
+----
+#!/bin/sh
+
+TODAY=$(date +%s)
+LICENSE_EXPIRATION=$(date -d 2020-11-15 +%s)  # -d is a GNU date extension
+
+if [ "$TODAY" -ge "$LICENSE_EXPIRATION" ]; then
+  echo 'License expired!'
+  exit 1
+fi
+
+echo $((2 + 2))
+----
+
+Now when using this very useful program:
+
+[source,sh]
+----
+# today
+$ ./useful-adder.sh
+4
+# tomorrow
+$ ./useful-adder.sh
+License expired!
+----
+
+This is obviously an intentional restriction, and it goes against the 5th ideal
+("The Long Now"). This software would only be useful as long as the embedded
+license expiration allowed. Sure, you could change the clock on the computer,
+but there are many other ways that this type of intentional restriction is in
+conflict with that ideal.
+
+However, what about unintentional restrictions? What if a piece of software had
+an equal or similar restriction, and stopped working after some days had
+passed? Or what if the programmer added a constant to make development simpler,
+and this led to unintentionally restricting the user?
+
+[source,sh]
+----
+# today
+$ useful-program
+# ...useful output...
+
+# tomorrow, with more data
+$ useful-program
+ERROR: Panic! Stack overflow!
+----
+
+Just as easily as I can come up with ways to intentionally restrict users, I can
+do the same for unintentional restrictions. A program can stop working for a
+variety of reasons.
+
+If it stops working due to, say, data growth, what are the options? Reverting
+to an earlier backup, and making it read-only? That isn't really a "Long Now",
+but rather a "Long Now as long as the software keeps working as expected".
+
+The point is: if the software isn't free, "The Long Now" isn't achievable
+without a lot of wishful thinking. Maybe the authors were trying to be more
+friendly towards businesses that don't like free software, but in doing so
+they've introduced a contradiction by trying to reconcile "The Long Now" with
+proprietary software.
+
+It isn't the same as saying that any free software achieves that ideal, either.
+The license can still be free, but the source code can become unavailable due to
+cloud rot. Or maybe the build is undocumented, or the build tools had specific
+configuration that one has to guess. A piece of free software can still fail to
+achieve "The Long Now". Being free doesn't guarantee it, just makes it
+possible.
+
+A colleague has challenged my view, arguing that the software doesn't really
+need to be free, as long as there is a specification of the file format. This
+way if the software stops working, the format can still be processed by other
+programs. But this doesn't apply in practice: if you have a document that you
+write to, and the software stops working, you still want to write to the
+document. An external tool that navigates the content and shows it to you won't
+allow you to keep writing, and when it does, that tool is now starting to
+re-implement the software.
+
+An open specification could serve as a blueprint to other implementations,
+making the data format more friendly to reverse-engineering. But the
+re-implementation still has to exist, at which point the original software
+failed to achieve "The Long Now".
+
+It is less bad, but still not quite there yet.
+
+== Denial of existing solutions
+
+:distgit: https://drewdevault.com/2018/07/23/Git-is-already-distributed.html
+
+When describing "Existing Data Storage and Sharing Models", in a
+footnote{empty}footnote:devil[
+  This is the second aspect of the article that I'm picking on from a footnote.
+  I guess the devil really is in the details.
+] the authors say:
+
+____
+In principle it is possible to collaborate without a repository service, e.g. by
+sending patch files by email, but the majority of Git users rely on GitHub.
+____
+
+The authors go to great lengths to talk about the usability of cloud apps, and
+even point to research they've done on it, but they've missed learning more from
+local-first solutions that already exist.
+
+Say the automerge CRDT proves to be even more useful than what everybody
+imagined. Say someone builds a local-first repository service using it. How
+will it change anything about the Git/GitHub model? What is different about it
+that prevents people in the future from writing a paper saying:
+
+____
+In principle it is possible to collaborate without a repository service, e.g. by
+using automerge and platform X, but the majority of Git users rely on GitHub.
+____
+
+How is this any better?
+
+If it is already {distgit}[possible] to have a local-first development workflow,
+why don't people use it? Is it just fashion, or is there a fundamental problem
+with it? If so, what is it, and how to avoid it?
+
+If sending patches by email is perfectly possible but out of fashion, why even
+talk about Git/GitHub? Isn't this a problem that people are putting themselves
+into? How can CRDTs possibly prevent people from doing that?
+
+My impression is that the authors envision a better future, where development is
+fully decentralized unlike today, and somehow CRDTs will make that happen.
+If more people think this way, "CRDT" is next in line for the list of buzzwords
+that solve everything, like "containers", "blockchain" or "machine learning".
+
+Instead of picturing an imaginary service that could be described as
+"GitHub+CRDTs" and that people would adopt, I'd rather understand better why
+people don't work this way already, since Git is built to work like that.
+
+== Ditching of web applications
+
+:pouchdb: https://pouchdb.com/
+:instant-apps: https://developer.android.com/topic/google-play-instant
+
+The authors put web applications in a worse position for building local-first
+applications, claiming that:
+
+____
+(...) the architecture of web apps remains fundamentally server-centric.
+Offline support is an afterthought in most web apps, and the result is
+accordingly fragile.
+____
+
+Well, I disagree.
+
+The problem isn't inherent to the web platform, but lies in how people use it.
+
+I have myself built offline-first applications, leveraging IndexedDB, App Cache,
+_etc_. I wanted to build an offline-first application on the web, and so I did.
+
+In fact, many people choose {pouchdb}[PouchDB] _because_ of that, since it is a
+good tool for offline-first web applications. The problem isn't really the
+technology, but how much people want their application to be local-first.
+
+Contrast it with Android {instant-apps}[Instant Apps], where applications are
+sent to the phone in small parts. Since this requires an internet connection to
+move from one part of the app bundle to another, a subset of the app isn't
+local-first, despite being a native app.
+
+The point isn't the technology, but how people are using it. Local-first web
+applications are perfectly possible, just like non-local-first native
+applications are possible.
+
+== Costs are underrated
+
+I think the costs of "old-fashioned apps" over "cloud apps" are underrated,
+mainly regarding storage, and that these costs can vary a lot by application.
+
+Say a person writes online articles for their personal website, and puts
+everything into Git. Since there isn't supposed to be any collaboration, all of
+the relevant ideals of local-first are achieved.
+
+Now another person creates videos instead of articles. They could try keeping
+everything local, but after some time the storage usage fills the entire disk.
+This person's local-first setup would be much more complex, and would cost much
+more in maintenance, backup and storage.
+
+Even though both have similar needs, a local-first video repository is much more
+demanding. So the local-first thinking here isn't "just keep everything local",
+but "how much time and money am I willing to spend to keep everything local".
+
+The convenience of "cloud apps" becomes so attractive that many don't even have
+a local copy of their videos, and rely exclusively on service providers to
+maintain, back up and store their content.
+
+The dial between "cloud apps" and "old-fashioned apps" needs to be specific to
+each use case.
+
+== Real-time collaboration is optional
+
+If I were the one making the list of ideals, I wouldn't focus so much on
+real-time collaboration.
+
+Even though seamless collaboration is desired, it being real-time depends on the
+network being available for that. But ideal 3 states that "The Network is
+Optional", so real-time collaboration is also optional.
+
+The fundamentals of a local-first system should enable real-time collaboration
+when the network is available, but shouldn't focus on it.
+
+In many discussions about applications being offline, I often find people
+saying that their application works "even on a plane, subway or elevator".
+That reflects the situations in which those developers themselves have to deal
+with networks being unavailable.
+
+But this leaves out a big chunk of the world where the internet connection is
+intermittent, or only works every other day or only once a week, or stops
+working when it rains, _etc_.
+For this audience, living without network connectivity isn't a discrete moment
+in time, but part of everyday life. I like that the authors acknowledge that.
+
+When discussing "working offline", I'd rather keep this type of person in mind;
+the subset of people who are offline while in an elevator will then naturally
+be included.
+
+== On CRDTs and developer experience
+
+:archived-article: https://web.archive.org/web/20130116163535/https://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf
+
+When discussing developer experience, the authors bring up some questions to be
+answered further, like:
+
+____
+For an app developer, how does the use of a CRDT-based data layer compare to
+existing storage layers like a SQL database, a filesystem, or CoreData? Is a
+distributed system harder to write software for?
+____
+
+That is an easy one: yes.
+
+A distributed system _is_ harder to write software for, being a distributed
+system.
+
+Adding a large layer of data structures and algorithms will make it more complex
+to write software for, naturally. And trying to make this layer transparent to
+the programmer, so they can pretend that it doesn't exist, is a bad idea: RPC
+frameworks have tried that, and failed.
+
+See "{archived-article}[A Note on Distributed Computing]" for a critique of RPC
+frameworks trying to make the network invisible, which I think applies equally
+to making the CRDT layer invisible.
+
+== Conclusion
+
+I liked the article a lot, as it took the "offline-first" philosophy and ran
+with it.
+
+But I think the authors' view of adding CRDTs and things becoming local-first is
+a bit too magical.
+
+This particular area is one that I have a great interest in, and I wish to see
+more being done in the "local-first" space.
-- 
cgit v1.2.3