aboutsummaryrefslogtreecommitdiff
path: root/src/content/blog/2020/08
diff options
context:
space:
mode:
Diffstat (limited to 'src/content/blog/2020/08')
-rw-r--r--src/content/blog/2020/08/10/guix-srht.adoc128
-rw-r--r--src/content/blog/2020/08/31/database-i-with-i-had.adoc295
2 files changed, 423 insertions, 0 deletions
diff --git a/src/content/blog/2020/08/10/guix-srht.adoc b/src/content/blog/2020/08/10/guix-srht.adoc
new file mode 100644
index 0000000..4d7e8d5
--- /dev/null
+++ b/src/content/blog/2020/08/10/guix-srht.adoc
@@ -0,0 +1,128 @@
+---
+title: Guix inside sourcehut builds.sr.ht CI
+date: 2020-08-10
+updated_at: 2020-08-19
+layout: post
+lang: en
+ref: guix-inside-sourcehut-builds-sr-ht-ci
+---
+After the release of the [NixOS images in builds.sr.ht][0] and much
+usage of it, I also started looking at [Guix][1] and
+wondered if I could get it on the awesome builds.sr.ht service.
+
+[0]: https://man.sr.ht/builds.sr.ht/compatibility.md#nixos
+[1]: https://guix.gnu.org/
+
+The Guix manual section on the [binary installation][2] is very thorough, and
+even a [shell installer script][3] is provided, but it is built towards someone
+installing Guix on their personal computer, and relies heavily on interactive
+input.
+
+[2]: https://guix.gnu.org/manual/en/guix.html#Binary-Installation
+[3]: https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
+
+I developed the following set of scripts that I have been using for some time to
+run Guix tasks inside builds.sr.ht jobs. First, `install-guix.sh`:
+
+```shell
+#!/usr/bin/env bash
+set -x
+set -Eeuo pipefail
+
+VERSION='1.0.1'
+SYSTEM='x86_64-linux'
+BINARY="guix-binary-${VERSION}.${SYSTEM}.tar.xz"
+
+cd /tmp
+wget "https://ftp.gnu.org/gnu/guix/${BINARY}"
+tar -xf "${BINARY}"
+
+sudo mv var/guix /var/
+sudo mv gnu /
+sudo mkdir -p ~root/.config/guix
+sudo ln -fs /var/guix/profiles/per-user/root/current-guix ~root/.config/guix/current
+
+GUIX_PROFILE="$(echo ~root)/.config/guix/current"
+source "${GUIX_PROFILE}/etc/profile"
+
+groupadd --system guixbuild
+for i in $(seq -w 1 10);
+do
+ useradd -g guixbuild \
+ -G guixbuild \
+ -d /var/empty \
+ -s "$(command -v nologin)" \
+ -c "Guix build user ${i}" --system \
+ "guixbuilder${i}";
+done
+
+mkdir -p /usr/local/bin
+cd /usr/local/bin
+ln -s /var/guix/profiles/per-user/root/current-guix/bin/guix .
+ln -s /var/guix/profiles/per-user/root/current-guix/bin/guix-daemon .
+
+guix archive --authorize < ~root/.config/guix/current/share/guix/ci.guix.gnu.org.pub
+```
+
+Almost all of it is taken directly from the [binary installation][2] section
+from the manual, with the interactive bits stripped out: after downloading and
+extracting the Guix tarball, we create some symlinks, add guixbuild users and
+authorize the `ci.guix.gnu.org.pub` signing key.
+
+After installing Guix, we perform a `guix pull` to update Guix inside `start-guix.sh`:
+```shell
+#!/usr/bin/env bash
+set -x
+set -Eeuo pipefail
+
+sudo guix-daemon --build-users-group=guixbuild &
+guix pull
+guix package -u
+guix --version
+```
+
+Then we can put it all together in a sample `.build.yml` configuration file I'm
+using myself:
+
+```yaml
+image: debian/stable
+packages:
+ - wget
+sources:
+ - https://git.sr.ht/~euandreh/songbooks
+tasks:
+ - install-guix: |
+ cd ./songbooks/
+ ./scripts/install-guix.sh
+ ./scripts/start-guix.sh
+ echo 'sudo guix-daemon --build-users-group=guixbuild &' >> ~/.buildenv
+ echo 'export PATH="${HOME}/.config/guix/current/bin${PATH:+:}$PATH"' >> ~/.buildenv
+ - tests: |
+ cd ./songbooks/
+ guix environment -m build-aux/guix.scm -- make check
+ - docs: |
+ cd ./songbooks/
+ guix environment -m build-aux/guix.scm -- make publish-dist
+```
+
+We have to add the `guix-daemon` to `~/.buildenv` so it can be started on every
+following task run. Also, since we used `wget` inside `install-guix.sh`, we had
+to add it to the images package list.
+
+After the `install-guix` task, you can use Guix to build and test your project,
+or run any `guix environment --ad-hoc my-package -- my script` :)
+
+## Improvements
+
+When I originally created this code I had a reason why to have both a `sudo`
+call for `sudo ./scripts/install-guix.sh` and `sudo` usages inside
+`install-guix.sh` itself. I couldn't figure out why (it feels like my past self
+was a bit smarter 😬), but it feels ugly now. If it is truly required I could
+add an explanation for it, or remove this entirely in favor of a more elegant solution.
+
+I could also contribute the Guix image upstream to builds.sr.ht, but there
+wasn't any build or smoke tests in the original [repository][4], so I wasn't
+inclined to make something that just "works on my machine" or add a maintainence
+burden to the author. I didn't look at it again recently, though.
+
+[4]: https://git.sr.ht/~sircmpwn/builds.sr.ht
diff --git a/src/content/blog/2020/08/31/database-i-with-i-had.adoc b/src/content/blog/2020/08/31/database-i-with-i-had.adoc
new file mode 100644
index 0000000..7d127c1
--- /dev/null
+++ b/src/content/blog/2020/08/31/database-i-with-i-had.adoc
@@ -0,0 +1,295 @@
+---
+title: The database I wish I had
+date: 2020-08-31
+updated_at: 2020-09-03
+layout: post
+lang: en
+ref: the-database-i-wish-i-had
+eu_categories: mediator
+---
+
+I watched the talk
+"[Platform as a Reflection of Values: Joyent, Node.js and beyond][platform-values]"
+by Bryan Cantrill, and I think he was able to put into words something I already
+felt for some time: if there's no piece of software out there that reflects your
+values, it's time for you to build that software[^talk-time].
+
+[platform-values]: https://vimeo.com/230142234
+[^talk-time]: At the very end, at time 29:49. When talking about the draft of
+ this article with a friend, he noted that Bryan O'Sullivan (a different
+ Bryan) says a similar thing on his talk
+ "[Running a startup on Haskell](https://www.youtube.com/watch?v=ZR3Jirqk6W8)",
+ at time 4:15.
+
+I kind of agree with what he said, because this is already happening to me. I
+long for a database with a certain set of values, and for a few years I was just
+waiting for someone to finally write it. After watching his talk, Bryan is
+saying to me: "time to stop waiting, and start writing it yourself".
+
+So let me try to give an overview of such database, and go over its values.
+
+## Overview
+
+I want a database that allows me to create decentralized client-side
+applications that can sync data.
+
+The best one-line description I can give right now is:
+
+> It's sort of like PouchDB, Git, Datomic, SQLite and Mentat.
+
+A more descriptive version could be:
+
+> An embedded, immutable, syncable relational database.
+
+Let's go over what I mean by each of those aspects one by one.
+
+### Embedded
+
+I think the server-side database landscape is diverse and mature enough for
+my needs (even though I end up choosing SQLite most of the time), and what I'm
+after is a database to be embedded on client-side applications itself, be it
+desktop, browser, mobile, *etc.*
+
+The purpose of such database is not to keep some local cache of data in case of
+lost connectivity: we have good solutions for that already. It should serve as
+the source of truth, and allow the application to work on top of it.
+
+[**SQLite**][sqlite] is a great example of that: it is a very powerful
+relational database that runs [almost anywhere][sqlite-whentouse]. What I miss
+from it that SQLite doesn't provide is the ability to run it on the browser:
+even though you could compile it to WebAssembly, ~~it assumes a POSIX filesystem
+that would have to be emulated~~[^posix-sqlite].
+
+[sqlite]: https://sqlite.org/index.html
+[sqlite-whentouse]: https://sqlite.org/whentouse.html
+[^posix-sqlite]: It was [pointed out to me](https://news.ycombinator.com/item?id=24338881)
+ that SQLite doesn't assume the existence of a POSIX filesystem, as I wrongly
+ stated. Thanks for the correction.
+
+ This makes me consider it as a storage backend all by itself. I
+ initially considered having an SQLite storage backend as one implementation
+ of the POSIX filesystem storage API that I mentioned. My goal was to rely on
+ it so I could validate the correctness of the actual implementation, given
+ SQLite's robustness.
+
+ However it may even better to just use SQLite, and get an ACID backend
+ without recreating a big part of SQLite from scratch. In fact, both Datomic
+ and PouchDB didn't create an storage backend for themselves, they just
+ plugged on what already existed and already worked. I'm beginning to think
+ that it would be wiser to just do the same, and drop entirely the from
+ scratch implementation that I mentioned.
+
+ That's not to say that adding an IndexedDB compatibility layer to SQLite
+ would be enough to make it fit the other requirements I mention on this
+ page. SQLite still is an implementation of a update-in-place, SQL,
+ table-oriented database. It is probably true that cherry-picking the
+ relevant parts of SQLite (like storage access, consistency, crash recovery,
+ parser generator, *etc.*) and leaving out the unwanted parts (SQL, tables,
+ threading, *etc.*) would be better than including the full SQLite stack, but
+ that's simply an optimization. Both could even coexist, if desired.
+
+ SQLite would have to be treated similarly to how Datomic treats SQL
+ databases: instead of having a table for each entities, spread attributes
+ over the tables, *etc.*, it treats SQL databases as a key-value storage so it
+ doesn't have to re-implement interacting with the disk that other databases
+ do well.
+
+ The tables would contain blocks of binary data, so there isn't a difference
+ on how the SQLite storage backend behaves and how the IndexedDB storage
+ backend behaves, much like how Datomic works the same regardless of the
+ storage backend, same for PouchDB.
+
+ I welcome corrections on what I said above, too.
+
+[**PouchDB**][pouchdb] is another great example: it's a full reimplementation of
+[CouchDB][couchdb] that targets JavaScript environments, mainly the browser and
+Node.js. However I want a tool that can be deployed anywhere, and not limit its
+applications to places that already have a JavaScript runtime environment, or
+force the developer to bundle a JavaScript runtime environment with their
+application. This is true for GTK+ applications, command line programs, Android
+apps, *etc.*
+
+[pouchdb]: https://pouchdb.com/
+[couchdb]: https://couchdb.apache.org/
+
+[**Mentat**][mentat] was an interesting project, but its reliance on SQLite
+makes it inherit most of the downsides (and benefits too) of SQLite itself.
+
+[mentat]: https://github.com/mozilla/mentat
+
+Having such a requirement imposes a different approach to storage: we have to
+decouple the knowledge about the intricacies of storage from the usage of
+storage itself, so that a module (say query processing) can access storage
+through an API without needing to know about its implementation. This allows
+the database to target a POSIX filesystems storage API and an IndexedDB storage
+API, and make the rest of the code agnostic about storage. PouchDB has such
+mechanism (called [adapters][pouchdb-adapters]) and Datomic has them too (called
+[storage services][datomic-storage-services]).
+
+[pouchdb-adapters]: https://pouchdb.com/adapters.html
+[datomic-storage-services]: https://docs.datomic.com/on-prem/storage.html
+
+This would allow the database to adapt to where it is embedded: when targeting
+the browser the IndexedDB storage API would provide the persistence layer
+that the database requires, and similarly the POSIX filesystem storage API would
+provide the persistence layer when targeting POSIX systems (like desktops,
+mobile, *etc.*).
+
+But there's also an extra restriction that comes from by being embedded: it
+needs to provide and embeddable artifact, most likely a binary library object
+that exposes a C compatible FFI, similar to
+[how SQLite does][sqlite-amalgamation]. Bundling a full runtime environment is
+possible, but doesn't make it a compelling solution for embedding. This rules
+out most languages, and leaves us with C, Rust, Zig, and similar options that
+can target POSIX systems and WebAssembly.
+
+[sqlite-amalgamation]: https://www.sqlite.org/amalgamation.html
+
+### Immutable
+
+Being immutable means that only new information is added, no in-place update
+ever happens, and nothing is ever deleted.
+
+Having an immutable database presents us with similar trade-offs found in
+persistent data structures, like lack of coordination when doing reads, caches
+being always coherent, and more usage of space.
+
+[**Datomic**][datomic] is the go to database example of this: it will only add
+information (datoms) and allows you to query them in a multitude of ways. Stuart
+Halloway calls it "accumulate-only" over "append-only"[^accumulate-only]:
+
+> It's accumulate-only, it is not append-only. So append-only, most people when
+> they say that they're implying something physical about what happens.
+
+[datomic]: https://www.datomic.com/
+[^accumulate-only]: Video "[Day of Datomic Part 2](https://vimeo.com/116315075)"
+ on Datomic's information model, at time 12:28.
+
+Also a database can be append-only and overwrite existing information with new
+information, by doing clean-ups of "stale" data. I prefer to adopt the
+"accumulate-only" naming and approach.
+
+[**Git**][git] is another example of this: new commits are always added on top
+of the previous data, and it grows by adding commits instead of replacing
+existing ones.
+
+[git]: https://git-scm.com/
+
+Git repositories can only grow in size, and that is not only an acceptable
+condition, but also one of the reasons to use it.
+
+All this means that no in-place updates happens on data, and the database will
+be much more concerned about how compact and efficiently it stores data than how
+fast it does writes to disk. Being embedded, the storage limitation is either a)
+how much storage the device has or b) how much storage was designed for the
+application to consume. So even though the database could theoretically operate
+with hundreds of TBs, a browser page or mobile application wouldn't have access
+to this amount of storage. SQLite even [says][sqlite-limits] that it does
+support approximately 280 TBs of data, but those limits are untested.
+
+The upside of keeping everything is that you can have historical views of your
+data, which is very powerful. This also means that applications should turn this
+off when not relevant[^no-history].
+
+[sqlite-limits]: https://sqlite.org/limits.html
+[^no-history]: Similar to
+ [Datomic's `:db/noHistory`](https://docs.datomic.com/cloud/best.html#nohistory-for-high-churn).
+
+### Syncable
+
+This is a frequent topic when talking about offline-first solutions. When
+building applications that:
+
+- can fully work offline,
+- stores data,
+- propagates that data to other application instances,
+
+then you'll need a conflict resolution strategy to handle all the situations
+where different application instances disagree. Those application instances
+could be a desktop and a browser version of the same application, or the same
+mobile app in different devices.
+
+A three-way merge seems to be the best approach, on top of which you could add
+application specific conflict resolution functions, like:
+
+- pick the change with higher timestamp;
+- if one change is a delete, pick it;
+- present the diff on the screen and allow the user to merge them.
+
+Some databases try to make this "easy", by choosing a strategy for you, but I've
+found that different applications require different conflict resolution
+strategies. Instead, the database should leave this up to the user to decide,
+and provide tools for them to do it.
+
+[**Three-way merges in version control**][3-way-merge] are the best example,
+performing automatic merges when possible and asking the user to resolve
+conflicts when they appear.
+
+The unit of conflict for a version control system is a line of text. The
+database equivalent would probably be a single attribute, not a full entity or a
+full row.
+
+Making all the conflict resolution logic be local should allow the database to
+have encrypted remotes similar to how [git-remote-gcrypt][git-remote-gcrypt]
+adds this functionality to Git. This would enable users to sync the application
+data across devices using an untrusted intermediary.
+
+[3-way-merge]: https://en.wikipedia.org/wiki/Merge_(version_control)
+[git-remote-gcrypt]: https://spwhitton.name/tech/code/git-remote-gcrypt/
+
+### Relational
+
+I want the power of relational queries on the client applications.
+
+Most of the arguments against traditional table-oriented relational databases
+are related to write performance, but those don't apply here. The bottlenecks
+for client applications usually aren't write throughput. Nobody is interested in
+differentiating between 1 MB/s or 10 MB/s when you're limited to 500 MB total.
+
+The relational model of the database could either be based on SQL and tables
+like in SQLite, or maybe [datalog][datalog] and [datoms][datoms] like in
+Datomic.
+
+[datalog]: https://docs.datomic.com/on-prem/query.html
+[datoms]: https://docs.datomic.com/cloud/whatis/data-model.html#datoms
+
+## From aspects to values
+
+Now let's try to translate the aspects above into values, as suggested by Bryan
+Cantrill.
+
+### Portability
+
+Being able to target so many different platforms is a bold goal, and the
+embedded nature of the database demands portability to be a core value.
+
+### Integrity
+
+When the local database becomes the source of truth of the application, it must
+provide consistency guarantees that enables applications to rely on it.
+
+### Expressiveness
+
+The database should empower applications to slice and dice the data in any way
+it wants to.
+
+## Next steps
+
+Since I can't find any database that fits these requirements, I've finally come
+to terms with doing it myself.
+
+It's probably going to take me a few years to do it, and making it portable
+between POSIX and IndexedDB will probably be the biggest challenge. I got myself
+a few books on databases to start.
+
+I wonder if I'll ever be able to get this done.
+
+## External links
+
+See discussions on [Reddit][reddit], [lobsters][lobsters], [HN][hn] and
+[a lengthy email exchange][lengthy-email].
+
+[reddit]: https://www.reddit.com/r/programming/comments/ijwz5b/the_database_i_wish_i_had/
+[lobsters]: https://lobste.rs/s/m9vkg4/database_i_wish_i_had
+[hn]: https://news.ycombinator.com/item?id=24337244
+[lengthy-email]: https://lists.sr.ht/~euandreh/public-inbox/%3C010101744a592b75-1dce9281-f0b8-4226-9d50-fd2c7901fa72-000000%40us-west-2.amazonses.com%3E