6 files changed, 1272 insertions, 0 deletions
diff --git a/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.md b/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.md
new file mode 100644
index 0000000..d613a28
--- /dev/null
+++ b/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.md
@@ -0,0 +1,231 @@
+---
+
+title: Durable persistent trees and parser combinators - building a database
+
+date: 2020-11-12
+
+layout: post
+
+lang: en
+
+ref: durable-persistent-trees-and-parser-combinators-building-a-database
+
+category: mediator
+
+---
+
+I've received with certain frequency messages from people wanting to know if
+I've made any progress on the database project
+[I've written about]({% link _articles/2020-08-31-the-database-i-wish-i-had.md %}).
+
+There are a few areas where I've made progress, and here's a public post on it.
+
+## Proof-of-concept: DAG log
+
+The main thing I wanted to validate with a concrete implementation was the
+concept of modeling a DAG on a sequence of datoms.
+
+The notion of a *datom* is a rip-off from Datomic, which models data with time
+aware *facts*, which come from RDF. RDF's fact is a triple of
+subject-predicate-object, and Datomic's datoms add a time component to it:
+subject-predicate-object-time, A.K.A. entity-attribute-value-transaction:
+
+```clojure
+[[person :likes "pizza" 0 true]
+ [person :likes "bread" 1 true]
+ [person :likes "pizza" 1 false]]
+```
+
+The above datoms say:
+- at time 0, `person` like pizza;
+- at time 1, `person` stopped liking pizza, and started to like bread.
+
+Datomic ensures total consistency of this ever growing log by having a single
+writer, the transactor, that will enforce it when writing.
+
+In order to support disconnected clients, I needed a way to allow multiple
+writers, and I chose to do it by making the log not a list, but a
+directed acyclic graph (DAG):
+
+```clojure
+[[person :likes  "pizza" 0 true]
+ [0      :parent null    0 true]
+ [person :likes  "bread" 1 true]
+ [person :likes  "pizza" 1 false]
+ [1      :parent 0       1 true]]
+```
+
+The extra datoms above add more information to build the directionality to the
+log, and instead of a single consistent log, the DAG could have multiple leaves
+that coexist, much like how different Git branches can have different "latest"
+commits.
+
+In order to validate this idea, I started with a Clojure implementation. The
+goal was not to write the actual final code, but to make a proof-of-concept that
+would allow me to test and stretch the idea itself.
+
+This code [already exists][clj-poc], but is yet fairly incomplete:
+
+- the building of the index isn't done yet (with some
+  [commented code][clj-poc-index] on the next step to be implemented)
+- the indexing is extremely inefficient, with [more][clj-poc-o2-0]
+  [than][clj-poc-o2-1] [one][clj-poc-o2-2] occurrence of `O²` functions;
+- no query support yet.
+
+[clj-poc]: https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1
+[clj-poc-index]: https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295
+[clj-poc-o2-0]: https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130
+[clj-poc-o2-1]: https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146
+[clj-poc-o2-2]: https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253
+
+## Top-down *and* bottom-up
+
+However, as time passed and I started looking at what the final implementation
+would look like, I started to consider keeping the PoC around.
+
+The top-down approach (Clojure PoC) was in fact helping guide me with the
+bottom-up, and I now have "promoted" the Clojure PoC into a "reference
+implementation". It should now be a finished implementation that says what the
+expected behaviour is, and the actual code should match the behaviour.
+
+The good thing about a reference implementation is that it has no performance of
+resources boundary, so if it ends up being 1000x slower and using 500× more
+memory, it should be find. The code can be also 10x or 100x simpler, too.
+
+## Top-down: durable persistent trees
+
+In promoting the PoC into a reference implementation, this top-down approach now
+needs to go beyond doing everything in memory, and the index data structure now
+needs to be disk-based.
+
+Roughly speaking, most storage engines out there are based either on B-Trees or
+LSM Trees, or some variations of those.
+
+But when building an immutable database, update-in-place B-Trees aren't an
+option, as it doesn't accommodate keeping historical views of the tree. LSM Trees
+may seem a better alternative, but duplication on the files with compaction are
+also ways to delete old data which is indeed useful for a historical view.
+
+I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would keep
+historical versions with the write IO cost amortization of memtables of LSM
+Trees. I don't know of any B-Tree variant out there that resembles this, so I'll
+call it "Flushing Copy-on-Write B-Tree".
+
+I haven't written any code for this yet, so all I have is a high-level view of
+what it will look like:
+
+1. like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and
+  building a new path from root to the leaf. The upside is that writes a lock
+  free, and no coordination is needed between readers and writers, ever;
+
+1. the downside is that a single leaf update means at least `H` new nodes that
+  will have to be flushed to disk, where `H` is the height of the tree. To avoid
+  that, the writer creates these nodes exclusively on the in-memory memtable, to
+  avoid flushing to disk on every leaf update;
+
+1. a background job will consolidate the memtable data every time it hits X MB,
+  and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;
+
+1. readers than will have the extra job of getting the latest relevant
+  disk-resident value and merge it with the memtable data.
+
+The key difference to existing Copy-on-Write B-Trees is that the new trees
+are only periodically written to disk, and the intermediate values are kept in
+memory. Since no node is ever updated, the page utilization is maximum as it
+doesn't need to keep space for future inserts and updates.
+
+And the key difference to existing LSM Trees is that no compaction is run:
+intermediate values are still relevant as the database grows. So this leaves out
+tombstones and value duplication done for write performance.
+
+One can delete intermediate index values to reclaim space, but no data is lost
+on the process, only old B-Tree values. And if the database ever comes back to
+that point (like when doing a historical query), the B-Tree will have to be
+rebuilt from a previous value. After all, the database *is* a set of datoms, and
+everything else is just derived data.
+
+Right now I'm still reading about other data structures that storage engines
+use, and I'll start implementing the "Flushing Copy-on-Write B-Tree" as I learn
+more[^learn-more-db] and mature it more.
+
+[^learn-more-db]: If you are interested in learning more about this too, the
+    very best two resources on this subject are Andy Pavlo's
+    "[Intro to Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi)"
+    course and Alex Petrov's "[Database Internals](https://www.databass.dev/)" book.
+
+## Bottom-up: parser combinators and FFI
+
+I chose Rust as it has the best WebAssembly tooling support.
+
+My goal is not to build a Rust database, but a database that happens to be in
+Rust. In order to reach client platforms, the primary API is the FFI one.
+
+I'm not very happy with current tools for exposing Rust code via FFI to the
+external world: they either mix C with C++, which I don't want to do, or provide
+no access to the intermediate representation of the FFI, which would be useful
+for generating binding for any language that speaks FFI.
+
+I like better the path that the author of [cbindgen][cbindgen-crate]
+crate [proposes][rust-ffi]: emitting an data representation of the Rust C API
+(the author calls is a `ffi.json` file), and than building transformers from the
+data representation to the target language. This way you could generate a C API
+*and* the node-ffi bindings for JavaScript automatically from the Rust code.
+
+So the first thing to be done before moving on is an FFI exporter that doesn't
+mix C and C++, and generates said `ffi.json`, and than build a few transformers
+that take this `ffi.json` and generate the language bindings, be it C, C++,
+JavaScript, TypeScript, Kotlin, Swift, Dart, *etc*[^ffi-langs].
+
+[^ffi-langs]: Those are, specifically, the languages I'm more interested on. My
+    goal is supporting client applications, and those languages are the most
+    relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for
+    Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart
+    for Flutter.
+
+I think the best way to get there is by taking the existing code for cbindgen,
+which uses the [syn][syn-crate] crate to parse the Rust code[^rust-syn], and
+adapt it to emit the metadata.
+
+[^rust-syn]: The fact that syn is an external crate to the Rust compiler points
+    to a big warning: procedural macros are not first class in Rust. They are
+    just like Babel plugins in JavaScript land, with the extra shortcoming that
+    there is no specification for the Rust syntax, unlike JavaScript.
+
+    As flawed as this may be, it seems to be generally acceptable and adopted,
+    which works against building a solid ecosystem for Rust.
+
+    The alternative that rust-ffi implements relies on internals of the Rust
+    compiler, which isn't actually worst, just less common and less accepted.
+
+I've started a fork of cbindgen: [x-bindgen][x-bindgen-repo]. Right now it is
+just a copy of cbindgen verbatim, and I plan to remove all C and C++ emitting
+code from it, and add a IR emitting code instead.
+
+When starting working on x-bindgen, I realized I didn't know what to look for in
+a header file, as I haven't written any C code in many years. So as I was
+writing [libedn][libedn-repo], I didn't know how to build a good C API to
+expose. So I tried porting the code to C, and right now I'm working on building
+a *good* C API for a JSON parser using parser combinators:
+[ParsecC][parsecc-repo].
+
+After "finishing" ParsecC I'll have a good notion of what a good C API is, and
+I'll have a better direction towards how to expose code from libedn to other
+languages, and work on x-bindgen then.
+
+What both libedn and ParsecC are missing right now are proper error reporting,
+and property-based testing for libedn.
+
+[cbindgen-crate]: https://github.com/eqrion/cbindgen
+[syn-crate]: https://github.com/dtolnay/syn
+[rust-ffi]: https://blog.eqrion.net/future-directions-for-cbindgen/
+[x-bindgen-repo]: https://git.euandreh.xyz/x-bindgen/
+[libedn-repo]: https://git.euandreh.xyz/libedn/
+[parsecc-repo]: https://git.euandreh.xyz/parsecc/
+
+## Conclusion
+
+I've learned a lot already, and I feel the journey I'm on is worth going
+through.
+
+If any of those topics interest you, message me to discuss more or contribute!
+Patches welcome!
diff --git a/locale/eo/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po b/locale/eo/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
new file mode 100644
index 0000000..5cb9491
--- /dev/null
+++ b/locale/eo/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
@@ -0,0 +1,342 @@
+#
+msgid ""
+msgstr ""
+
+msgid ""
+"title: Durable persistent trees and parser combinators - building a database"
+msgstr ""
+
+msgid "date: 2020-11-12"
+msgstr ""
+
+msgid "layout: post"
+msgstr ""
+
+msgid "lang: en"
+msgstr ""
+
+msgid ""
+"ref: durable-persistent-trees-and-parser-combinators-building-a-database"
+msgstr ""
+
+msgid "category: mediator"
+msgstr ""
+
+msgid ""
+"I've received with certain frequency messages from people wanting to know if"
+" I've made any progress on the database project [I've written about]({% link"
+" _articles/2020-08-31-the-database-i-wish-i-had.md %})."
+msgstr ""
+
+msgid ""
+"There are a few areas where I've made progress, and here's a public post on "
+"it."
+msgstr ""
+
+msgid "Proof-of-concept: DAG log"
+msgstr ""
+
+msgid ""
+"The main thing I wanted to validate with a concrete implementation was the "
+"concept of modeling a DAG on a sequence of datoms."
+msgstr ""
+
+msgid ""
+"The notion of a *datom* is a rip-off from Datomic, which models data with "
+"time aware *facts*, which come from RDF. RDF's fact is a triple of subject-"
+"predicate-object, and Datomic's datoms add a time component to it: subject-"
+"predicate-object-time, A.K.A. entity-attribute-value-transaction:"
+msgstr ""
+
+msgid ""
+"[[person :likes \"pizza\" 0 true]\n"
+" [person :likes \"bread\" 1 true]\n"
+" [person :likes \"pizza\" 1 false]]\n"
+msgstr ""
+
+msgid "The above datoms say:"
+msgstr ""
+
+msgid "at time 0, `person` like pizza;"
+msgstr ""
+
+msgid "at time 1, `person` stopped liking pizza, and started to like bread."
+msgstr ""
+
+msgid ""
+"Datomic ensures total consistency of this ever growing log by having a "
+"single writer, the transactor, that will enforce it when writing."
+msgstr ""
+
+msgid ""
+"In order to support disconnected clients, I needed a way to allow multiple "
+"writers, and I chose to do it by making the log not a list, but a directed "
+"acyclic graph (DAG):"
+msgstr ""
+
+msgid ""
+"[[person :likes  \"pizza\" 0 true]\n"
+" [0      :parent null    0 true]\n"
+" [person :likes  \"bread\" 1 true]\n"
+" [person :likes  \"pizza\" 1 false]\n"
+" [1      :parent 0       1 true]]\n"
+msgstr ""
+
+msgid ""
+"The extra datoms above add more information to build the directionality to "
+"the log, and instead of a single consistent log, the DAG could have multiple"
+" leaves that coexist, much like how different Git branches can have "
+"different \"latest\" commits."
+msgstr ""
+
+msgid ""
+"In order to validate this idea, I started with a Clojure implementation. The"
+" goal was not to write the actual final code, but to make a proof-of-concept"
+" that would allow me to test and stretch the idea itself."
+msgstr ""
+
+msgid ""
+"This code [already "
+"exists](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1),"
+" but is yet fairly incomplete:"
+msgstr ""
+
+msgid ""
+"the building of the index isn't done yet (with some [commented "
+"code](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295)"
+" on the next step to be implemented)"
+msgstr ""
+
+msgid ""
+"the indexing is extremely inefficient, with "
+"[more](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130)"
+" "
+"[than](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146)"
+" "
+"[one](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253)"
+" occurrence of `O²` functions;"
+msgstr ""
+
+msgid "no query support yet."
+msgstr ""
+
+msgid "Top-down *and* bottom-up"
+msgstr ""
+
+msgid ""
+"However, as time passed and I started looking at what the final "
+"implementation would look like, I started to consider keeping the PoC "
+"around."
+msgstr ""
+
+msgid ""
+"The top-down approach (Clojure PoC) was in fact helping guide me with the "
+"bottom-up, and I now have \"promoted\" the Clojure PoC into a \"reference "
+"implementation\". It should now be a finished implementation that says what "
+"the expected behaviour is, and the actual code should match the behaviour."
+msgstr ""
+
+msgid ""
+"The good thing about a reference implementation is that it has no "
+"performance of resources boundary, so if it ends up being 1000x slower and "
+"using 500× more memory, it should be find. The code can be also 10x or 100x "
+"simpler, too."
+msgstr ""
+
+msgid "Top-down: durable persistent trees"
+msgstr ""
+
+msgid ""
+"In promoting the PoC into a reference implementation, this top-down approach"
+" now needs to go beyond doing everything in memory, and the index data "
+"structure now needs to be disk-based."
+msgstr ""
+
+msgid ""
+"Roughly speaking, most storage engines out there are based either on B-Trees"
+" or LSM Trees, or some variations of those."
+msgstr ""
+
+msgid ""
+"But when building an immutable database, update-in-place B-Trees aren't an "
+"option, as it doesn't accommodate keeping historical views of the tree. LSM "
+"Trees may seem a better alternative, but duplication on the files with "
+"compaction are also ways to delete old data which is indeed useful for a "
+"historical view."
+msgstr ""
+
+msgid ""
+"I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would "
+"keep historical versions with the write IO cost amortization of memtables of"
+" LSM Trees. I don't know of any B-Tree variant out there that resembles "
+"this, so I'll call it \"Flushing Copy-on-Write B-Tree\"."
+msgstr ""
+
+msgid ""
+"I haven't written any code for this yet, so all I have is a high-level view "
+"of what it will look like:"
+msgstr ""
+
+msgid ""
+"like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and"
+" building a new path from root to the leaf. The upside is that writes a lock"
+" free, and no coordination is needed between readers and writers, ever;"
+msgstr ""
+
+msgid ""
+"the downside is that a single leaf update means at least `H` new nodes that "
+"will have to be flushed to disk, where `H` is the height of the tree. To "
+"avoid that, the writer creates these nodes exclusively on the in-memory "
+"memtable, to avoid flushing to disk on every leaf update;"
+msgstr ""
+
+msgid ""
+"a background job will consolidate the memtable data every time it hits X MB,"
+" and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;"
+msgstr ""
+
+msgid ""
+"readers than will have the extra job of getting the latest relevant disk-"
+"resident value and merge it with the memtable data."
+msgstr ""
+
+msgid ""
+"The key difference to existing Copy-on-Write B-Trees is that the new trees "
+"are only periodically written to disk, and the intermediate values are kept "
+"in memory. Since no node is ever updated, the page utilization is maximum as"
+" it doesn't need to keep space for future inserts and updates."
+msgstr ""
+
+msgid ""
+"And the key difference to existing LSM Trees is that no compaction is run: "
+"intermediate values are still relevant as the database grows. So this leaves"
+" out tombstones and value duplication done for write performance."
+msgstr ""
+
+msgid ""
+"One can delete intermediate index values to reclaim space, but no data is "
+"lost on the process, only old B-Tree values. And if the database ever comes "
+"back to that point (like when doing a historical query), the B-Tree will "
+"have to be rebuilt from a previous value. After all, the database *is* a set"
+" of datoms, and everything else is just derived data."
+msgstr ""
+
+msgid ""
+"Right now I'm still reading about other data structures that storage engines"
+" use, and I'll start implementing the \"Flushing Copy-on-Write B-Tree\" as I"
+" learn more[^learn-more-db] and mature it more."
+msgstr ""
+
+msgid ""
+"[^learn-more-db]: If you are interested in learning more about this too, the"
+" very best two resources on this subject are Andy Pavlo's \"[Intro to "
+"Database "
+"Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi)\""
+" course and Alex Petrov's \"[Database "
+"Internals](https://www.databass.dev/)\" book."
+msgstr ""
+
+msgid "Bottom-up: parser combinators and FFI"
+msgstr ""
+
+msgid "I chose Rust as it has the best WebAssembly tooling support."
+msgstr ""
+
+msgid ""
+"My goal is not to build a Rust database, but a database that happens to be "
+"in Rust. In order to reach client platforms, the primary API is the FFI one."
+msgstr ""
+
+msgid ""
+"I'm not very happy with current tools for exposing Rust code via FFI to the "
+"external world: they either mix C with C++, which I don't want to do, or "
+"provide no access to the intermediate representation of the FFI, which would"
+" be useful for generating binding for any language that speaks FFI."
+msgstr ""
+
+msgid ""
+"I like better the path that the author of "
+"[cbindgen](https://github.com/eqrion/cbindgen) crate "
+"[proposes](https://blog.eqrion.net/future-directions-for-cbindgen/): "
+"emitting an data representation of the Rust C API (the author calls is a "
+"`ffi.json` file), and than building transformers from the data "
+"representation to the target language. This way you could generate a C API "
+"*and* the node-ffi bindings for JavaScript automatically from the Rust code."
+msgstr ""
+
+msgid ""
+"So the first thing to be done before moving on is an FFI exporter that "
+"doesn't mix C and C++, and generates said `ffi.json`, and than build a few "
+"transformers that take this `ffi.json` and generate the language bindings, "
+"be it C, C++, JavaScript, TypeScript, Kotlin, Swift, Dart, *etc*[^ffi-"
+"langs]."
+msgstr ""
+
+msgid ""
+"[^ffi-langs]: Those are, specifically, the languages I'm more interested on."
+" My goal is supporting client applications, and those languages are the most"
+" relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for"
+" Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart "
+"for Flutter."
+msgstr ""
+
+msgid ""
+"I think the best way to get there is by taking the existing code for "
+"cbindgen, which uses the [syn](https://github.com/dtolnay/syn) crate to "
+"parse the Rust code[^rust-syn], and adapt it to emit the metadata."
+msgstr ""
+
+msgid ""
+"[^rust-syn]: The fact that syn is an external crate to the Rust compiler "
+"points to a big warning: procedural macros are not first class in Rust. They"
+" are just like Babel plugins in JavaScript land, with the extra shortcoming "
+"that there is no specification for the Rust syntax, unlike JavaScript."
+msgstr ""
+
+msgid ""
+"As flawed as this may be, it seems to be generally acceptable and adopted,\n"
+"which works against building a solid ecosystem for Rust.\n"
+"\n"
+"The alternative that rust-ffi implements relies on internals of the Rust\n"
+"compiler, which isn't actually worst, just less common and less accepted.\n"
+msgstr ""
+
+msgid ""
+"I've started a fork of cbindgen: "
+"[x-bindgen](https://git.euandreh.xyz/x-bindgen/). Right now it is just a "
+"copy of cbindgen verbatim, and I plan to remove all C and C++ emitting code "
+"from it, and add a IR emitting code instead."
+msgstr ""
+
+msgid ""
+"When starting working on x-bindgen, I realized I didn't know what to look "
+"for in a header file, as I haven't written any C code in many years. So as I"
+" was writing [libedn](https://git.euandreh.xyz/libedn/), I didn't know how "
+"to build a good C API to expose. So I tried porting the code to C, and right"
+" now I'm working on building a *good* C API for a JSON parser using parser "
+"combinators: [ParsecC](https://git.euandreh.xyz/parsecc/)."
+msgstr ""
+
+msgid ""
+"After \"finishing\" ParsecC I'll have a good notion of what a good C API is,"
+" and I'll have a better direction towards how to expose code from libedn to "
+"other languages, and work on x-bindgen then."
+msgstr ""
+
+msgid ""
+"What both libedn and ParsecC are missing right now are proper error "
+"reporting, and property-based testing for libedn."
+msgstr ""
+
+msgid "Conclusion"
+msgstr ""
+
+msgid ""
+"I've learned a lot already, and I feel the journey I'm on is worth going "
+"through."
+msgstr ""
+
+msgid ""
+"If any of those topics interest you, message me to discuss more or "
+"contribute! Patches welcome!"
+msgstr ""
diff --git a/locale/fr/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po b/locale/fr/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
new file mode 100644
index 0000000..5cb9491
--- /dev/null
+++ b/locale/fr/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
@@ -0,0 +1,342 @@
+#
+msgid ""
+msgstr ""
+
+msgid ""
+"title: Durable persistent trees and parser combinators - building a database"
+msgstr ""
+
+msgid "date: 2020-11-12"
+msgstr ""
+
+msgid "layout: post"
+msgstr ""
+
+msgid "lang: en"
+msgstr ""
+
+msgid ""
+"ref: durable-persistent-trees-and-parser-combinators-building-a-database"
+msgstr ""
+
+msgid "category: mediator"
+msgstr ""
+
+msgid ""
+"I've received with certain frequency messages from people wanting to know if"
+" I've made any progress on the database project [I've written about]({% link"
+" _articles/2020-08-31-the-database-i-wish-i-had.md %})."
+msgstr ""
+
+msgid ""
+"There are a few areas where I've made progress, and here's a public post on "
+"it."
+msgstr ""
+
+msgid "Proof-of-concept: DAG log"
+msgstr ""
+
+msgid ""
+"The main thing I wanted to validate with a concrete implementation was the "
+"concept of modeling a DAG on a sequence of datoms."
+msgstr ""
+
+msgid ""
+"The notion of a *datom* is a rip-off from Datomic, which models data with "
+"time aware *facts*, which come from RDF. RDF's fact is a triple of subject-"
+"predicate-object, and Datomic's datoms add a time component to it: subject-"
+"predicate-object-time, A.K.A. entity-attribute-value-transaction:"
+msgstr ""
+
+msgid ""
+"[[person :likes \"pizza\" 0 true]\n"
+" [person :likes \"bread\" 1 true]\n"
+" [person :likes \"pizza\" 1 false]]\n"
+msgstr ""
+
+msgid "The above datoms say:"
+msgstr ""
+
+msgid "at time 0, `person` like pizza;"
+msgstr ""
+
+msgid "at time 1, `person` stopped liking pizza, and started to like bread."
+msgstr ""
+
+msgid ""
+"Datomic ensures total consistency of this ever growing log by having a "
+"single writer, the transactor, that will enforce it when writing."
+msgstr ""
+
+msgid ""
+"In order to support disconnected clients, I needed a way to allow multiple "
+"writers, and I chose to do it by making the log not a list, but a directed "
+"acyclic graph (DAG):"
+msgstr ""
+
+msgid ""
+"[[person :likes  \"pizza\" 0 true]\n"
+" [0      :parent null    0 true]\n"
+" [person :likes  \"bread\" 1 true]\n"
+" [person :likes  \"pizza\" 1 false]\n"
+" [1      :parent 0       1 true]]\n"
+msgstr ""
+
+msgid ""
+"The extra datoms above add more information to build the directionality to "
+"the log, and instead of a single consistent log, the DAG could have multiple"
+" leaves that coexist, much like how different Git branches can have "
+"different \"latest\" commits."
+msgstr ""
+
+msgid ""
+"In order to validate this idea, I started with a Clojure implementation. The"
+" goal was not to write the actual final code, but to make a proof-of-concept"
+" that would allow me to test and stretch the idea itself."
+msgstr ""
+
+msgid ""
+"This code [already "
+"exists](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1),"
+" but is yet fairly incomplete:"
+msgstr ""
+
+msgid ""
+"the building of the index isn't done yet (with some [commented "
+"code](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295)"
+" on the next step to be implemented)"
+msgstr ""
+
+msgid ""
+"the indexing is extremely inefficient, with "
+"[more](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130)"
+" "
+"[than](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146)"
+" "
+"[one](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253)"
+" occurrence of `O²` functions;"
+msgstr ""
+
+msgid "no query support yet."
+msgstr ""
+
+msgid "Top-down *and* bottom-up"
+msgstr ""
+
+msgid ""
+"However, as time passed and I started looking at what the final "
+"implementation would look like, I started to consider keeping the PoC "
+"around."
+msgstr ""
+
+msgid ""
+"The top-down approach (Clojure PoC) was in fact helping guide me with the "
+"bottom-up, and I now have \"promoted\" the Clojure PoC into a \"reference "
+"implementation\". It should now be a finished implementation that says what "
+"the expected behaviour is, and the actual code should match the behaviour."
+msgstr ""
+
+msgid ""
+"The good thing about a reference implementation is that it has no "
+"performance of resources boundary, so if it ends up being 1000x slower and "
+"using 500× more memory, it should be find. The code can be also 10x or 100x "
+"simpler, too."
+msgstr ""
+
+msgid "Top-down: durable persistent trees"
+msgstr ""
+
+msgid ""
+"In promoting the PoC into a reference implementation, this top-down approach"
+" now needs to go beyond doing everything in memory, and the index data "
+"structure now needs to be disk-based."
+msgstr ""
+
+msgid ""
+"Roughly speaking, most storage engines out there are based either on B-Trees"
+" or LSM Trees, or some variations of those."
+msgstr ""
+
+msgid ""
+"But when building an immutable database, update-in-place B-Trees aren't an "
+"option, as it doesn't accommodate keeping historical views of the tree. LSM "
+"Trees may seem a better alternative, but duplication on the files with "
+"compaction are also ways to delete old data which is indeed useful for a "
+"historical view."
+msgstr ""
+
+msgid ""
+"I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would "
+"keep historical versions with the write IO cost amortization of memtables of"
+" LSM Trees. I don't know of any B-Tree variant out there that resembles "
+"this, so I'll call it \"Flushing Copy-on-Write B-Tree\"."
+msgstr ""
+
+msgid ""
+"I haven't written any code for this yet, so all I have is a high-level view "
+"of what it will look like:"
+msgstr ""
+
+msgid ""
+"like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and"
+" building a new path from root to the leaf. The upside is that writes a lock"
+" free, and no coordination is needed between readers and writers, ever;"
+msgstr ""
+
+msgid ""
+"the downside is that a single leaf update means at least `H` new nodes that "
+"will have to be flushed to disk, where `H` is the height of the tree. To "
+"avoid that, the writer creates these nodes exclusively on the in-memory "
+"memtable, to avoid flushing to disk on every leaf update;"
+msgstr ""
+
+msgid ""
+"a background job will consolidate the memtable data every time it hits X MB,"
+" and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;"
+msgstr ""
+
+msgid ""
+"readers than will have the extra job of getting the latest relevant disk-"
+"resident value and merge it with the memtable data."
+msgstr ""
+
+msgid ""
+"The key difference to existing Copy-on-Write B-Trees is that the new trees "
+"are only periodically written to disk, and the intermediate values are kept "
+"in memory. Since no node is ever updated, the page utilization is maximum as"
+" it doesn't need to keep space for future inserts and updates."
+msgstr ""
+
+msgid ""
+"And the key difference to existing LSM Trees is that no compaction is run: "
+"intermediate values are still relevant as the database grows. So this leaves"
+" out tombstones and value duplication done for write performance."
+msgstr ""
+
+msgid ""
+"One can delete intermediate index values to reclaim space, but no data is "
+"lost on the process, only old B-Tree values. And if the database ever comes "
+"back to that point (like when doing a historical query), the B-Tree will "
+"have to be rebuilt from a previous value. After all, the database *is* a set"
+" of datoms, and everything else is just derived data."
+msgstr ""
+
+msgid ""
+"Right now I'm still reading about other data structures that storage engines"
+" use, and I'll start implementing the \"Flushing Copy-on-Write B-Tree\" as I"
+" learn more[^learn-more-db] and mature it more."
+msgstr ""
+
+msgid ""
+"[^learn-more-db]: If you are interested in learning more about this too, the"
+" very best two resources on this subject are Andy Pavlo's \"[Intro to "
+"Database "
+"Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi)\""
+" course and Alex Petrov's \"[Database "
+"Internals](https://www.databass.dev/)\" book."
+msgstr ""
+
+msgid "Bottom-up: parser combinators and FFI"
+msgstr ""
+
+msgid "I chose Rust as it has the best WebAssembly tooling support."
+msgstr ""
+
+msgid ""
+"My goal is not to build a Rust database, but a database that happens to be "
+"in Rust. In order to reach client platforms, the primary API is the FFI one."
+msgstr ""
+
+msgid ""
+"I'm not very happy with current tools for exposing Rust code via FFI to the "
+"external world: they either mix C with C++, which I don't want to do, or "
+"provide no access to the intermediate representation of the FFI, which would"
+" be useful for generating binding for any language that speaks FFI."
+msgstr ""
+
+msgid ""
+"I like better the path that the author of "
+"[cbindgen](https://github.com/eqrion/cbindgen) crate "
+"[proposes](https://blog.eqrion.net/future-directions-for-cbindgen/): "
+"emitting an data representation of the Rust C API (the author calls is a "
+"`ffi.json` file), and than building transformers from the data "
+"representation to the target language. This way you could generate a C API "
+"*and* the node-ffi bindings for JavaScript automatically from the Rust code."
+msgstr ""
+
+msgid ""
+"So the first thing to be done before moving on is an FFI exporter that "
+"doesn't mix C and C++, and generates said `ffi.json`, and than build a few "
+"transformers that take this `ffi.json` and generate the language bindings, "
+"be it C, C++, JavaScript, TypeScript, Kotlin, Swift, Dart, *etc*[^ffi-"
+"langs]."
+msgstr ""
+
+msgid ""
+"[^ffi-langs]: Those are, specifically, the languages I'm more interested on."
+" My goal is supporting client applications, and those languages are the most"
+" relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for"
+" Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart "
+"for Flutter."
+msgstr ""
+
+msgid ""
+"I think the best way to get there is by taking the existing code for "
+"cbindgen, which uses the [syn](https://github.com/dtolnay/syn) crate to "
+"parse the Rust code[^rust-syn], and adapt it to emit the metadata."
+msgstr ""
+
+msgid ""
+"[^rust-syn]: The fact that syn is an external crate to the Rust compiler "
+"points to a big warning: procedural macros are not first class in Rust. They"
+" are just like Babel plugins in JavaScript land, with the extra shortcoming "
+"that there is no specification for the Rust syntax, unlike JavaScript."
+msgstr ""
+
+msgid ""
+"As flawed as this may be, it seems to be generally acceptable and adopted,\n"
+"which works against building a solid ecosystem for Rust.\n"
+"\n"
+"The alternative that rust-ffi implements relies on internals of the Rust\n"
+"compiler, which isn't actually worst, just less common and less accepted.\n"
+msgstr ""
+
+msgid ""
+"I've started a fork of cbindgen: "
+"[x-bindgen](https://git.euandreh.xyz/x-bindgen/). Right now it is just a "
+"copy of cbindgen verbatim, and I plan to remove all C and C++ emitting code "
+"from it, and add a IR emitting code instead."
+msgstr ""
+
+msgid ""
+"When starting working on x-bindgen, I realized I didn't know what to look "
+"for in a header file, as I haven't written any C code in many years. So as I"
+" was writing [libedn](https://git.euandreh.xyz/libedn/), I didn't know how "
+"to build a good C API to expose. So I tried porting the code to C, and right"
+" now I'm working on building a *good* C API for a JSON parser using parser "
+"combinators: [ParsecC](https://git.euandreh.xyz/parsecc/)."
+msgstr ""
+
+msgid ""
+"After \"finishing\" ParsecC I'll have a good notion of what a good C API is,"
+" and I'll have a better direction towards how to expose code from libedn to "
+"other languages, and work on x-bindgen then."
+msgstr ""
+
+msgid ""
+"What both libedn and ParsecC are missing right now are proper error "
+"reporting, and property-based testing for libedn."
+msgstr ""
+
+msgid "Conclusion"
+msgstr ""
+
+msgid ""
+"I've learned a lot already, and I feel the journey I'm on is worth going "
+"through."
+msgstr ""
+
+msgid ""
+"If any of those topics interest you, message me to discuss more or "
+"contribute! Patches welcome!"
+msgstr ""
diff --git a/locale/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po b/locale/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
new file mode 100644
index 0000000..5cb9491
--- /dev/null
+++ b/locale/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po
@@ -0,0 +1,342 @@
+#
+msgid ""
+msgstr ""
+
+msgid ""
+"title: Durable persistent trees and parser combinators - building a database"
+msgstr ""
+
+msgid "date: 2020-11-12"
+msgstr ""
+
+msgid "layout: post"
+msgstr ""
+
+msgid "lang: en"
+msgstr ""
+
+msgid ""
+"ref: durable-persistent-trees-and-parser-combinators-building-a-database"
+msgstr ""
+
+msgid "category: mediator"
+msgstr ""
+
+msgid ""
+"I've received with certain frequency messages from people wanting to know if"
+" I've made any progress on the database project [I've written about]({% link"
+" _articles/2020-08-31-the-database-i-wish-i-had.md %})."
+msgstr ""
+
+msgid ""
+"There are a few areas where I've made progress, and here's a public post on "
+"it."
+msgstr ""
+
+msgid "Proof-of-concept: DAG log"
+msgstr ""
+
+msgid ""
+"The main thing I wanted to validate with a concrete implementation was the "
+"concept of modeling a DAG on a sequence of datoms."
+msgstr ""
+
+msgid ""
+"The notion of a *datom* is a rip-off from Datomic, which models data with "
+"time aware *facts*, which come from RDF. RDF's fact is a triple of subject-"
+"predicate-object, and Datomic's datoms add a time component to it: subject-"
+"predicate-object-time, A.K.A. entity-attribute-value-transaction:"
+msgstr ""
+
+msgid ""
+"[[person :likes \"pizza\" 0 true]\n"
+" [person :likes \"bread\" 1 true]\n"
+" [person :likes \"pizza\" 1 false]]\n"
+msgstr ""
+
+msgid "The above datoms say:"
+msgstr ""
+
+msgid "at time 0, `person` like pizza;"
+msgstr ""
+
+msgid "at time 1, `person` stopped liking pizza, and started to like bread."
+msgstr ""
+
+msgid ""
+"Datomic ensures total consistency of this ever growing log by having a "
+"single writer, the transactor, that will enforce it when writing."
+msgstr ""
+
+msgid ""
+"In order to support disconnected clients, I needed a way to allow multiple "
+"writers, and I chose to do it by making the log not a list, but a directed "
+"acyclic graph (DAG):"
+msgstr ""
+
+msgid ""
+"[[person :likes  \"pizza\" 0 true]\n"
+" [0      :parent null    0 true]\n"
+" [person :likes  \"bread\" 1 true]\n"
+" [person :likes  \"pizza\" 1 false]\n"
+" [1      :parent 0       1 true]]\n"
+msgstr ""
+
+msgid ""
+"The extra datoms above add more information to build the directionality to "
+"the log, and instead of a single consistent log, the DAG could have multiple"
+" leaves that coexist, much like how different Git branches can have "
+"different \"latest\" commits."
+msgstr ""
+
+msgid ""
+"In order to validate this idea, I started with a Clojure implementation. The"
+" goal was not to write the actual final code, but to make a proof-of-concept"
+" that would allow me to test and stretch the idea itself."
+msgstr ""
+
+msgid ""
+"This code [already "
+"exists](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1),"
+" but is yet fairly incomplete:"
+msgstr ""
+
+msgid ""
+"the building of the index isn't done yet (with some [commented "
+"code](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295)"
+" on the next step to be implemented)"
+msgstr ""
+
+msgid ""
+"the indexing is extremely inefficient, with "
+"[more](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130)"
+" "
+"[than](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146)"
+" "
+"[one](https://git.euandreh.xyz/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253)"
+" occurrence of `O²` functions;"
+msgstr ""
+
+msgid "no query support yet."
+msgstr ""
+
+msgid "Top-down *and* bottom-up"
+msgstr ""
+
+msgid ""
+"However, as time passed and I started looking at what the final "
+"implementation would look like, I started to consider keeping the PoC "
+"around."
+msgstr ""
+
+msgid ""
+"The top-down approach (Clojure PoC) was in fact helping guide me with the "
+"bottom-up, and I now have \"promoted\" the Clojure PoC into a \"reference "
+"implementation\". It should now be a finished implementation that says what "
+"the expected behaviour is, and the actual code should match the behaviour."
+msgstr ""
+
+msgid ""
+"The good thing about a reference implementation is that it has no "
+"performance of resources boundary, so if it ends up being 1000x slower and "
+"using 500× more memory, it should be find. The code can be also 10x or 100x "
+"simpler, too."
+msgstr ""
+
+msgid "Top-down: durable persistent trees"
+msgstr ""
+
+msgid ""
+"In promoting the PoC into a reference implementation, this top-down approach"
+" now needs to go beyond doing everything in memory, and the index data "
+"structure now needs to be disk-based."
+msgstr ""
+
+msgid ""
+"Roughly speaking, most storage engines out there are based either on B-Trees"
+" or LSM Trees, or some variations of those."
+msgstr ""
+
+msgid ""
+"But when building an immutable database, update-in-place B-Trees aren't an "
+"option, as it doesn't accommodate keeping historical views of the tree. LSM "
+"Trees may seem a better alternative, but duplication on the files with "
+"compaction are also ways to delete old data which is indeed useful for a "
+"historical view."
+msgstr ""
+
+msgid ""
+"I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would "
+"keep historical versions with the write IO cost amortization of memtables of"
+" LSM Trees. I don't know of any B-Tree variant out there that resembles "
+"this, so I'll call it \"Flushing Copy-on-Write B-Tree\"."
+msgstr ""
+
+msgid ""
+"I haven't written any code for this yet, so all I have is a high-level view "
+"of what it will look like:"
+msgstr ""
+
+msgid ""
+"like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and"
+" building a new path from root to the leaf. The upside is that writes a lock"
+" free, and no coordination is needed between readers and writers, ever;"
+msgstr ""
+
+msgid ""
+"the downside is that a single leaf update means at least `H` new nodes that "
+"will have to be flushed to disk, where `H` is the height of the tree. To "
+"avoid that, the writer creates these nodes exclusively on the in-memory "
+"memtable, to avoid flushing to disk on every leaf update;"
+msgstr ""
+
+msgid ""
+"a background job will consolidate the memtable data every time it hits X MB,"
+" and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;"
+msgstr ""
+
+msgid ""
+"readers than will have the extra job of getting the latest relevant disk-"
+"resident value and merge it with the memtable data."
+msgstr ""
+
+msgid ""
+"The key difference to existing Copy-on-Write B-Trees is that the new trees "
+"are only periodically written to disk, and the intermediate values are kept "
+"in memory. Since no node is ever updated, the page utilization is maximum as"
+" it doesn't need to keep space for future inserts and updates."
+msgstr ""
+
+msgid ""
+"And the key difference to existing LSM Trees is that no compaction is run: "
+"intermediate values are still relevant as the database grows. So this leaves"
+" out tombstones and value duplication done for write performance."
+msgstr ""
+
+msgid ""
+"One can delete intermediate index values to reclaim space, but no data is "
+"lost on the process, only old B-Tree values. And if the database ever comes "
+"back to that point (like when doing a historical query), the B-Tree will "
+"have to be rebuilt from a previous value. After all, the database *is* a set"
+" of datoms, and everything else is just derived data."
+msgstr ""
+
+msgid ""
+"Right now I'm still reading about other data structures that storage engines"
+" use, and I'll start implementing the \"Flushing Copy-on-Write B-Tree\" as I"
+" learn more[^learn-more-db] and mature it more."
+msgstr ""
+
+msgid ""
+"[^learn-more-db]: If you are interested in learning more about this too, the"
+" very best two resources on this subject are Andy Pavlo's \"[Intro to "
+"Database "
+"Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi)\""
+" course and Alex Petrov's \"[Database "
+"Internals](https://www.databass.dev/)\" book."
+msgstr ""
+
+msgid "Bottom-up: parser combinators and FFI"
+msgstr ""
+
+msgid "I chose Rust as it has the best WebAssembly tooling support."
+msgstr ""
+
+msgid ""
+"My goal is not to build a Rust database, but a database that happens to be "
+"in Rust. In order to reach client platforms, the primary API is the FFI one."
+msgstr ""
+
+msgid ""
+"I'm not very happy with current tools for exposing Rust code via FFI to the "
+"external world: they either mix C with C++, which I don't want to do, or "
+"provide no access to the intermediate representation of the FFI, which would"
+" be useful for generating binding for any language that speaks FFI."
+msgstr ""
+
+msgid ""
+"I like better the path that the author of "
+"[cbindgen](https://github.com/eqrion/cbindgen) crate "
+"[proposes](https://blog.eqrion.net/future-directions-for-cbindgen/): "
+"emitting an data representation of the Rust C API (the author calls is a "
+"`ffi.json` file), and than building transformers from the data "
+"representation to the target language. This way you could generate a C API "
+"*and* the node-ffi bindings for JavaScript automatically from the Rust code."
+msgstr ""
+
+msgid ""
+"So the first thing to be done before moving on is an FFI exporter that "
+"doesn't mix C and C++, and generates said `ffi.json`, and than build a few "
+"transformers that take this `ffi.json` and generate the language bindings, "
+"be it C, C++, JavaScript, TypeScript, Kotlin, Swift, Dart, *etc*[^ffi-"
+"langs]."
+msgstr ""
+
+msgid ""
+"[^ffi-langs]: Those are, specifically, the languages I'm more interested on."
+" My goal is supporting client applications, and those languages are the most"
+" relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for"
+" Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart "
+"for Flutter."
+msgstr ""
+
+msgid ""
+"I think the best way to get there is by taking the existing code for "
+"cbindgen, which uses the [syn](https://github.com/dtolnay/syn) crate to "
+"parse the Rust code[^rust-syn], and adapt it to emit the metadata."
+msgstr ""
+
+msgid ""
+"[^rust-syn]: The fact that syn is an external crate to the Rust compiler "
+"points to a big warning: procedural macros are not first class in Rust. They"
+" are just like Babel plugins in JavaScript land, with the extra shortcoming "
+"that there is no specification for the Rust syntax, unlike JavaScript."
+msgstr ""
+
+msgid ""
+"As flawed as this may be, it seems to be generally acceptable and adopted,\n"
+"which works against building a solid ecosystem for Rust.\n"
+"\n"
+"The alternative that rust-ffi implements relies on internals of the Rust\n"
+"compiler, which isn't actually worst, just less common and less accepted.\n"
+msgstr ""
+
+msgid ""
+"I've started a fork of cbindgen: "
+"[x-bindgen](https://git.euandreh.xyz/x-bindgen/). Right now it is just a "
+"copy of cbindgen verbatim, and I plan to remove all C and C++ emitting code "
+"from it, and add a IR emitting code instead."
+msgstr ""
+
+msgid ""
+"When starting working on x-bindgen, I realized I didn't know what to look "
+"for in a header file, as I haven't written any C code in many years. So as I"
+" was writing [libedn](https://git.euandreh.xyz/libedn/), I didn't know how "
+"to build a good C API to expose. So I tried porting the code to C, and right"
+" now I'm working on building a *good* C API for a JSON parser using parser "
+"combinators: [ParsecC](https://git.euandreh.xyz/parsecc/)."
+msgstr ""
+
+msgid ""
+"After \"finishing\" ParsecC I'll have a good notion of what a good C API is,"
+" and I'll have a better direction towards how to expose code from libedn to "
+"other languages, and work on x-bindgen then."
+msgstr ""
+
+msgid ""
+"What both libedn and ParsecC are missing right now are proper error "
+"reporting, and property-based testing for libedn."
+msgstr ""
+
+msgid "Conclusion"
+msgstr ""
+
+msgid ""
+"I've learned a lot already, and I feel the journey I'm on is worth going "
+"through."
+msgstr ""
+
+msgid ""
+"If any of those topics interest you, message me to discuss more or "
+"contribute! Patches welcome!"
+msgstr ""
diff --git a/scripts/spelling/en.txt b/scripts/spelling/en.txt
index f05298c..de42ebe 100644
--- a/scripts/spelling/en.txt
+++ b/scripts/spelling/en.txt
@@ -1,6 +1,7 @@
 Patches
 Rollout
 Slides
+acyclic
 analytics
 aren
 autocommit
@@ -9,6 +10,7 @@ balancer
 barcode
 behaviour
 chargeback
+combinators
 couldn
 cronjobs
 culting
@@ -21,6 +23,7 @@ demoer
 didactical
 didn
 differentiator
+directionality
 doesn
 duplications
 dynamicity
diff --git a/scripts/spelling/international.txt b/scripts/spelling/international.txt
index 7af1b73..5a929e5 100644
--- a/scripts/spelling/international.txt
+++ b/scripts/spelling/international.txt
@@ -58,6 +58,7 @@ JSON
 Joyent
 Kevlin
 L1
+LSM
 LTS
 LaTeX
 Lerna
@@ -73,9 +74,14 @@ NixOS
 OOP
 OTP
 POSIX
+ParsecC
 Pastebin
+Pavlo
+Petrov
 Pittet
+PoC
 PouchDB
+RDF
 README
 RPN
 RSS
@@ -105,12 +111,14 @@ Yandex
 YouTube
 Zig
 apk
+bindgen
 boneco
 br
 brainer
 buildGoModule
 cargo2nix
 carte
+cbindgen
 cgit
 ci
 clojure
@@ -126,6 +134,7 @@ eo
 euandre
 euandreh
 eval
+ffi
 fr
 frontend
 gcrypt
@@ -149,10 +158,13 @@ js
 k8s
 kramdown
 kubernetes
+libedn
 libre
 lockfile
 lockfiles
 lt
+memtable
+memtables
 merkle
 myrepos
 nixos