From 020c1e77489b772f854bb3288b9c8d2818a6bf9d Mon Sep 17 00:00:00 2001
From: EuAndreh
] + This makes me consider it as a storage backend all by itself. I initially + considered having an SQLite storage backend as one implementation of the POSIX + filesystem storage API that I mentioned. My goal was to rely on it so I could + validate the correctness of the actual implementation, given SQLite's + robustness. +pass:[
] + However it may even better to just use SQLite, and get an ACID backend without + recreating a big part of SQLite from scratch. In fact, both Datomic and + PouchDB didn't create an storage backend for themselves, they just plugged on + what already existed and already worked. I'm beginning to think that it would + be wiser to just do the same, and drop entirely the from scratch + implementation that I mentioned. +pass:[
] + That's not to say that adding an IndexedDB compatibility layer to SQLite would + be enough to make it fit the other requirements I mention on this page. SQLite + still is an implementation of a update-in-place, SQL, table-oriented database. + It is probably true that cherry-picking the relevant parts of SQLite (like + storage access, consistency, crash recovery, parser generator, *etc.*) and + leaving out the unwanted parts (SQL, tables, threading, *etc.*) would be + better than including the full SQLite stack, that's simply an optimization. + Both could even coexist, if desired. +pass:[
] + SQLite would have to be treated similarly to how Datomic treats SQL databases: + instead of having a table for each entities, spread attributes over the + tables, *etc.*, it treats SQL databases as a key-value storage so it doesn't + have to re-implement interacting with the disk that other databases do well. +pass:[
] + The tables would contain blocks of binary data, so there isn't a difference on + how the SQLite storage backend behaves and how the IndexedDB storage backend + behaves, much like how Datomic works the same regardless of the storage + backend, same for PouchDB. +pass:[
] + I welcome corrections on what I said above, too. +]. + +{pouchdb}[*PouchDB*] is another great example: it's a full reimplementation of +{couchdb}[CouchDB] that targets JavaScript environments, mainly the browser and +Node.js. However I want a tool that can be deployed anywhere, and not limit its +applications to places that already have a JavaScript runtime environment, or +force the developer to bundle a JavaScript runtime environment with their +application. This is true for GTK+ applications, command line programs, Android +apps, _etc._ + +{mentat}[*Mentat*] was an interesting project, but its reliance on SQLite makes +it inherit most of the downsides (and benefits too) of SQLite itself. + +Having such a requirement imposes a different approach to storage: we have to +decouple the knowledge about the intricacies of storage from the usage of +storage itself, so that a module (say query processing) can access storage +through an API without needing to know about its implementation. This allows +the database to target a POSIX filesystems storage API and an IndexedDB storage +API, and make the rest of the code agnostic about storage. PouchDB has such +mechanism (called {pouchdb-adapters}[adapters]) and Datomic has them too (called +{datomic-storage-services}[storage services]). + +This would allow the database to adapt to where it is embedded: when targeting +the browser the IndexedDB storage API would provide the persistence layer that +the database requires, and similarly the POSIX filesystem storage API would +provide the persistence layer when targeting POSIX systems (like desktops, +mobile, _etc._). + +But there's also an extra restriction that comes from by being embedded: it +needs to provide and embeddable artifact, most likely a binary library object +that exposes a C compatible FFI, similar to {sqlite-amalgamation}[how SQLite +does]. Bundling a full runtime environment is possible, but doesn't make it a +compelling solution for embedding. This rules out most languages, and leaves +us with C, Rust, Zig, and similar options that can target POSIX systems and +WebAssembly. + +=== Immutable + +:datomic: https://www.datomic.com/ +:day-of-datomic: https://vimeo.com/116315075 +:git: https://git-scm.com/ +:sqlite-limits: https://sqlite.org/limits.html +:datomic-no-history: https://docs.datomic.com/cloud/best.html#nohistory-for-high-churn + +Being immutable means that only new information is added, no in-place update +ever happens, and nothing is ever deleted. + +Having an immutable database presents us with similar trade-offs found in +persistent data structures, like lack of coordination when doing reads, caches +being always coherent, and more usage of space. + +{datomic}[*Datomic*] is the go to database example of this: it will only add +information (datoms) and allows you to query them in a multitude of ways. +Stuart Halloway calls it "accumulate-only" over +"append-only"{empty}footnote:accumulate-only[ + Video "{day-of-datomic}[Day of Datomic Part 2]" on Datomic's information + model, at time 12:28. +]: + +____ +It's accumulate-only, it is not append-only. So append-only, most people when +they say that they're implying something physical about what happens. +____ + +Also a database can be append-only and overwrite existing information with new +information, by doing clean-ups of "stale" data. I prefer to adopt the +"accumulate-only" naming and approach. + +{git}[*Git*] is another example of this: new commits are always added on top of +the previous data, and it grows by adding commits instead of replacing existing +ones. + +Git repositories can only grow in size, and that is not only an acceptable +condition, but also one of the reasons to use it. + +All this means that no in-place updates happens on data, and the database will +be much more concerned about how compact and efficiently it stores data than how +fast it does writes to disk. Being embedded, the storage limitation is either +a) how much storage the device has or b) how much storage was designed for the +application to consume. So even though the database could theoretically operate +with hundreds of TBs, a browser page or mobile application wouldn't have access +to this amount of storage. SQLite even {sqlite-limits}[says] that it does +support approximately 280 TBs of data, but those limits are untested. + +The upside of keeping everything is that you can have historical views of your +data, which is very powerful. This also means that applications should turn +this off when not +relevant{empty}footnote:no-history[ + Similar to {datomic-no-history}[Datomic's `:db/noHistory`]. +]. + +=== Syncable + +:3-way-merge: https://en.wikipedia.org/wiki/Merge_(version_control) +:git-remote-gcrypt: https://spwhitton.name/tech/code/git-remote-gcrypt/ + +This is a frequent topic when talking about offline-first solutions. When +building applications that: + +* can fully work offline, +* stores data, +* propagates that data to other application instances, + +then you'll need a conflict resolution strategy to handle all the situations +where different application instances disagree. Those application instances +could be a desktop and a browser version of the same application, or the same +mobile app in different devices. + +A three-way merge seems to be the best approach, on top of which you could add +application specific conflict resolution functions, like: + +* pick the change with higher timestamp; +* if one change is a delete, pick it; +* present the diff on the screen and allow the user to merge them. + +Some databases try to make this "easy", by choosing a strategy for you, but I've +found that different applications require different conflict resolution +strategies. Instead, the database should leave this up to the user to decide, +and provide tools for them to do it. + +{3-way-merge}[*Three-way merges in version control*] are the best example, +performing automatic merges when possible and asking the user to resolve +conflicts when they appear. + +The unit of conflict for a version control system is a line of text. The +database equivalent would probably be a single attribute, not a full entity or a +full row. + +Making all the conflict resolution logic be local should allow the database to +have encrypted remotes similar to how {git-remote-gcrypt}[git-remote-gcrypt] +adds this functionality to Git. This would enable users to sync the application +data across devices using an untrusted intermediary. + +=== Relational + +:datomic-datalog: https://docs.datomic.com/on-prem/query.html +:datomic-model: https://docs.datomic.com/cloud/whatis/data-model.html#datoms + +I want the power of relational queries on the client applications. + +Most of the arguments against traditional table-oriented relational databases +are related to write performance, but those don't apply here. The bottlenecks +for client applications usually aren't write throughput. Nobody is interested +in differentiating between 1 MB/s or 10 MB/s when you're limited to 500 MB +total. + +The relational model of the database could either be based on SQL and tables +like in SQLite, or maybe {datomic-datalog}[datalog] and {datomic-model}[datoms] +like in Datomic. + +== From aspects to values + +Now let's try to translate the aspects above into values, as suggested by Bryan +Cantrill. + +=== Portability + +Being able to target so many different platforms is a bold goal, and the +embedded nature of the database demands portability to be a core value. + +=== Integrity + +When the local database becomes the source of truth of the application, it must +provide consistency guarantees that enables applications to rely on it. + +=== Expressiveness + +The database should empower applications to slice and dice the data in any way +it wants to. + +== Next steps + +Since I can't find any database that fits these requirements, I've finally come +to terms with doing it myself. + +It's probably going to take me a few years to do it, and making it portable +between POSIX and IndexedDB will probably be the biggest challenge. I got +myself a few books on databases to start. + +I wonder if I'll ever be able to get this done. + +== External links + +:reddit: https://old.reddit.com/r/programming/comments/ijwz5b/the_database_i_wish_i_had/ +:lobsters: https://lobste.rs/s/m9vkg4/database_i_wish_i_had +:hn: https://news.ycombinator.com/item?id=24337244 +:list: https://lists.sr.ht/~euandreh/public-inbox/%3C010101744a592b75-1dce9281-f0b8-4226-9d50-fd2c7901fa72-000000%40us-west-2.amazonses.com%3E + +See discussions on {reddit}[Reddit], {lobsters}[lobsters], {hn}[HN] and {list}[a +lengthy email exchange]. -- cgit v1.2.3