1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
|
#
msgid ""
msgstr ""
msgid ""
"I watched the talk \"[Platform as a Reflection of Values: Joyent, Node.js "
"and beyond](https://vimeo.com/230142234)\" by Bryan Cantrill, and I think he"
" was able to put into words something I already felt for some time: if "
"there's no piece of software out there that reflects your values, it's time "
"for you to build that software[^talk-time]."
msgstr ""
msgid ""
"[^talk-time]: At the very end, at time 29:49. When talking about the draft "
"of this article with a friend, he noted that Bryan O'Sullivan (a different "
"Bryan) says a similar thing on his talk \"[Running a startup on "
"Haskell](https://www.youtube.com/watch?v=ZR3Jirqk6W8)\", at time 4:15."
msgstr ""
msgid ""
"I kind of agree with what he said, because this is already happening to me. "
"I long for a database with a certain set of values, and for a few years I "
"was just waiting for someone to finally write it. After watching his talk, "
"Bryan is saying to me: \"time to stop waiting, and start writing it "
"yourself\"."
msgstr ""
msgid ""
"So let me try to give an overview of such database, and go over its values."
msgstr ""
msgid "Overview"
msgstr ""
msgid ""
"I want a database that allows me to create decentralized client-side "
"applications that can sync data."
msgstr ""
msgid "The best one-line description I can give right now is:"
msgstr ""
msgid "It's sort of like PouchDB, Git, Datomic, SQLite and Mentat."
msgstr ""
msgid "A more descriptive version could be:"
msgstr ""
msgid "An embedded, immutable, syncable relational database."
msgstr ""
msgid "Let's go over what I mean by each of those aspects one by one."
msgstr ""
msgid "Embedded"
msgstr ""
msgid ""
"I think the server-side database landscape is diverse and mature enough for "
"my needs (even though I end up choosing SQLite most of the time), and what "
"I'm after is a database to be embedded on client-side applications itself, "
"be it desktop, browser, mobile, *etc.*"
msgstr ""
msgid ""
"The purpose of such database is not to keep some local cache of data in case"
" of lost connectivity: we have good solutions for that already. It should "
"serve as the source of truth, and allow the application to work on top of "
"it."
msgstr ""
msgid ""
"[**SQLite**](https://sqlite.org/index.html) is a great example of that: it "
"is a very powerful relational database that runs [almost "
"anywhere](https://sqlite.org/whentouse.html). What I miss from it that "
"SQLite doesn't provide is the ability to run it on the browser: even though "
"you could compile it to WebAssembly, ~~it assumes a POSIX filesystem that "
"would have to be emulated~~[^posix-sqlite]."
msgstr ""
msgid ""
"[^posix-sqlite]: It was [pointed out to "
"me](https://news.ycombinator.com/item?id=24338881) that SQLite doesn't "
"assume the existence of a POSIX filesystem, as I wrongly stated. Thanks for "
"the correction."
msgstr ""
msgid ""
"[**PouchDB**](https://pouchdb.com/) is another great example: it's a full "
"reimplementation of [CouchDB](https://couchdb.apache.org/) that targets "
"JavaScript environments, mainly the browser and Node.js. However I want a "
"tool that can be deployed anywhere, and not limit its applications to places"
" that already have a JavaScript runtime environment, or force the developer "
"to bundle a JavaScript runtime environment with their application. This is "
"true for GTK+ applications, command line programs, Android apps, *etc.*"
msgstr ""
msgid ""
"[**Mentat**](https://github.com/mozilla/mentat) was an interesting project, "
"but its reliance on SQLite makes it inherit most of the downsides (and "
"benefits too) of SQLite itself."
msgstr ""
msgid ""
"Having such a requirement imposes a different approach to storage: we have "
"to decouple the knowledge about the intricacies of storage from the usage of"
" storage itself, so that a module (say query processing) can access storage "
"through an API without needing to know about its implementation. This allows"
" the database to target a POSIX filesystems storage API and an IndexedDB "
"storage API, and make the rest of the code agnostic about storage. PouchDB "
"has such mechanism (called [adapters](https://pouchdb.com/adapters.html)) "
"and Datomic has them too (called [storage "
"services](https://docs.datomic.com/on-prem/storage.html))."
msgstr ""
msgid ""
"This would allow the database to adapt to where it is embedded: when "
"targeting the browser the IndexedDB storage API would provide the "
"persistence layer that the database requires, and similarly the POSIX "
"filesystem storage API would provide the persistence layer when targeting "
"POSIX systems (like desktops, mobile, *etc.*)."
msgstr ""
msgid ""
"But there's also an extra restriction that comes from by being embedded: it "
"needs to provide and embeddable artifact, most likely a binary library "
"object that exposes a C compatible FFI, similar to [how SQLite "
"does](https://www.sqlite.org/amalgamation.html). Bundling a full runtime "
"environment is possible, but doesn't make it a compelling solution for "
"embedding. This rules out most languages, and leaves us with C, Rust, Zig, "
"and similar options that can target POSIX systems and WebAssembly."
msgstr ""
msgid "Immutable"
msgstr ""
msgid ""
"Being immutable means that only new information is added, no in-place update"
" ever happens, and nothing is ever deleted."
msgstr ""
msgid ""
"Having an immutable database presents us with similar trade-offs found in "
"persistent data structures, like lack of coordination when doing reads, "
"caches being always coherent, and more usage of space."
msgstr ""
msgid ""
"[**Datomic**](https://www.datomic.com/) is the go to database example of "
"this: it will only add information (datoms) and allows you to query them in "
"a multitude of ways. Stuart Halloway calls it \"accumulate-only\" over "
"\"append-only\"[^accumulate-only](Video \"[Day of Datomic Part "
"2](https://vimeo.com/116315075)\"):"
msgstr ""
msgid ""
"It's accumulate-only, it is not append-only. So append-only, most people "
"when they say that they're implying something physical about what happens."
msgstr ""
msgid "on Datomic's information model, at time 12:28."
msgstr ""
msgid ""
"Also a database can be append-only and overwrite existing information with "
"new information, by doing clean-ups of \"stale\" data. I prefer to adopt the"
" \"accumulate-only\" naming and approach."
msgstr ""
msgid ""
"[**Git**](https://git-scm.com/) is another example of this: new commits are "
"always added on top of the previous data, and it grows by adding commits "
"instead of replacing existing ones."
msgstr ""
msgid ""
"Git repositories can only grow in size, and that is not only an acceptable "
"condition, but also one of the reasons to use it."
msgstr ""
msgid ""
"All this means that no in-place updates happens on data, and the database "
"will be much more concerned about how compact and efficiently it stores data"
" than how fast it does writes to disk. Being embedded, the storage "
"limitation is either a) how much storage the device has or b) how much "
"storage was designed for the application to consume. So even though the "
"database could theoretically operate with hundreds of TBs, a browser page or"
" mobile application wouldn't have access to this amount of storage. SQLite "
"even [says](https://sqlite.org/limits.html) that it does support "
"approximately 280 TBs of data, but those limits are untested."
msgstr ""
msgid ""
"The upside of keeping everything is that you can have historical views of "
"your data, which is very powerful. This also means that applications should "
"turn this off when not relevant[^no-history]."
msgstr ""
msgid ""
"[^no-history]: Similar to [Datomic's "
"`:db/noHistory`](https://docs.datomic.com/cloud/best.html#nohistory-for-"
"high-churn)."
msgstr ""
msgid "Syncable"
msgstr ""
msgid ""
"This is a frequent topic when talking about offline-first solutions. When "
"building applications that:"
msgstr ""
msgid "can fully work offline,"
msgstr ""
msgid "stores data,"
msgstr ""
msgid "propagates that data to other application instances,"
msgstr ""
msgid ""
"then you'll need a conflict resolution strategy to handle all the situations"
" where different application instances disagree. Those application instances"
" could be a desktop and a browser version of the same application, or the "
"same mobile app in different devices."
msgstr ""
msgid ""
"A three-way merge seems to be the best approach, on top of which you could "
"add application specific conflict resolution functions, like:"
msgstr ""
msgid "pick the change with higher timestamp;"
msgstr ""
msgid "if one change is a delete, pick it;"
msgstr ""
msgid "present the diff on the screen and allow the user to merge them."
msgstr ""
msgid ""
"Some databases try to make this \"easy\", by choosing a strategy for you, "
"but I've found that different applications require different conflict "
"resolution strategies. Instead, the database should leave this up to the "
"user to decide, and provide tools for them to do it."
msgstr ""
msgid ""
"[**Three-way merges in version "
"control**](https://en.wikipedia.org/wiki/Merge_(version_control)) are the "
"best example, performing automatic merges when possible and asking the user "
"to resolve conflicts when they appear."
msgstr ""
msgid ""
"The unit of conflict for a version control system is a line of text. The "
"database equivalent would probably be a single attribute, not a full entity "
"or a full row."
msgstr ""
msgid ""
"Making all the conflict resolution logic be local should allow the database "
"to have encrypted remotes similar to how [git-remote-"
"gcrypt](https://spwhitton.name/tech/code/git-remote-gcrypt/) adds this "
"functionality to Git. This would enable users to sync the application data "
"across devices using an untrusted intermediary."
msgstr ""
msgid "Relational"
msgstr ""
msgid "I want the power of relational queries on the client applications."
msgstr ""
msgid ""
"Most of the arguments against traditional table-oriented relational "
"databases are related to write performance, but those don't apply here. The "
"bottlenecks for client applications usually aren't write throughput. Nobody "
"is interested in differentiating between 1 MB/s or 10 MB/s when you're "
"limited to 500 MB total."
msgstr ""
msgid ""
"The relational model of the database could either be based on SQL and tables"
" like in SQLite, or maybe [datalog](https://docs.datomic.com/on-"
"prem/query.html) and [datoms](https://docs.datomic.com/cloud/whatis/data-"
"model.html#datoms) like in Datomic."
msgstr ""
msgid "From aspects to values"
msgstr ""
msgid ""
"Now let's try to translate the aspects above into values, as suggested by "
"Bryan Cantrill."
msgstr ""
msgid "Portability"
msgstr ""
msgid ""
"Being able to target so many different platforms is a bold goal, and the "
"embedded nature of the database demands portability to be a core value."
msgstr ""
msgid "Integrity"
msgstr ""
msgid ""
"When the local database becomes the source of truth of the application, it "
"must provide consistency guarantees that enables applications to rely on it."
msgstr ""
msgid "Expressiveness"
msgstr ""
msgid ""
"The database should empower applications to slice and dice the data in any "
"way it wants to."
msgstr ""
msgid "Next steps"
msgstr ""
msgid ""
"Since I can't find any database that fits these requirements, I've finally "
"come to terms with doing it myself."
msgstr ""
msgid ""
"It's probably going to take me a few years to do it, and making it portable "
"between POSIX and IndexedDB will probably be the biggest challenge. I got "
"myself a few books on databases to start."
msgstr ""
msgid "I wonder if I'll ever be able to get this done."
msgstr ""
msgid "External links"
msgstr ""
msgid ""
"See discussions on "
"[Reddit](https://www.reddit.com/r/programming/comments/ijwz5b/the_database_i_wish_i_had/),"
" [lobsters](https://lobste.rs/s/m9vkg4/database_i_wish_i_had), "
"[HN](https://news.ycombinator.com/item?id=24337244) and [a lengthy email "
"exchange](https://lists.sr.ht/~euandreh/public-"
"inbox/%3C010101744a592b75-1dce9281-f0b8-4226-9d50-fd2c7901fa72-000000%40us-"
"west-2.amazonses.com%3E)."
msgstr ""
msgid ""
"This makes me consider it as a storage backend all by itself. I\n"
"initially considered having an SQLite storage backend as one implementation\n"
"of the POSIX filesystem storage API that I mentioned. My goal was to rely on\n"
"it so I could validate the correctness of the actual implementation, given\n"
"SQLite's robustness.\n"
"\n"
"However it may even better to just use SQLite, and get an ACID backend\n"
"without recreating a big part of SQLite from scratch. In fact, both Datomic\n"
"and PouchDB didn't create an storage backend for themselves, they just\n"
"plugged on what already existed and already worked. I'm beginning to think\n"
"that it would be wiser to just do the same, and drop entirely the from\n"
"scratch implementation that I mentioned.\n"
"\n"
"That's not to say that adding an IndexedDB compatibility layer to SQLite\n"
"would be enough to make it fit the other requirements I mention on this\n"
"page. SQLite still is an implementation of a update-in-place, SQL,\n"
"table-oriented database. It is probably true that cherry-picking the\n"
"relevant parts of SQLite (like storage access, consistency, crash recovery,\n"
"parser generator, *etc.*) and leaving out the unwanted parts (SQL, tables,\n"
"threading, *etc.*) would be better than including the full SQLite stack, but\n"
"that's simply an optimization. Both could even coexist, if desired.\n"
"\n"
"SQLite would have to be treated similarly to how Datomic treats SQL\n"
"databases: instead of having a table for each entities, spread attributes\n"
"over the tables, *etc.*, it treats SQL databases as a key-value storage so it\n"
"doesn't have to re-implement interacting with the disk that other databases\n"
"do well.\n"
"\n"
"The tables would contain blocks of binary data, so there isn't a difference\n"
"on how the SQLite storage backend behaves and how the IndexedDB storage\n"
"backend behaves, much like how Datomic works the same regardless of the\n"
"storage backend, same for PouchDB.\n"
"\n"
"I welcome corrections on what I said above, too.\n"
msgstr ""
msgid ""
"title: The database I wish I had\n"
"date: 2020-08-31\n"
"updated_at: 2020-09-03\n"
"layout: post\n"
"lang: en\n"
"ref: the-database-i-wish-i-had\n"
"eu_categories: mediator"
msgstr ""
#~ msgid ""
#~ "title: The database I wish I had\n"
#~ "date: 2020-08-31\n"
#~ "updated_at: 2020-09-03\n"
#~ "layout: post\n"
#~ "lang: en\n"
#~ "ref: the-database-i-wish-i-had\n"
#~ "category: mediator"
#~ msgstr ""
|