1 files changed, 226 insertions, 0 deletions
diff --git a/src/content/en/blog/2020/11/12/database-parsers-trees.adoc b/src/content/en/blog/2020/11/12/database-parsers-trees.adoc
new file mode 100644
index 0000000..47595e8
--- /dev/null
+++ b/src/content/en/blog/2020/11/12/database-parsers-trees.adoc
@@ -0,0 +1,226 @@
+= Durable persistent trees and parser combinators - building a database
+:categories: mediator
+:updatedat: 2021-02-09
+
+:empty:
+:db-article: link:../../08/31/database-i-wish-i-had.html
+
+I've received with certain frequency messages from people wanting to know if
+I've made any progress on the database project {db-article}[I've written about].
+
+There are a few areas where I've made progress, and here's a public post on it.
+
+== Proof-of-concept: DAG log
+
+:mediator-permalink: https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1
+
+The main thing I wanted to validate with a concrete implementation was the
+concept of modeling a DAG on a sequence of datoms.
+
+The notion of a _datom_ is a rip-off from Datomic, which models data with time
+aware _facts_, which come from RDF.  RDF's fact is a triple of
+subject-predicate-object, and Datomic's datoms add a time component to it:
+subject-predicate-object-time, A.K.A. entity-attribute-value-transaction:
+
+[source,clojure]
+----
+[[person :likes "pizza" 0 true]
+ [person :likes "bread" 1 true]
+ [person :likes "pizza" 1 false]]
+----
+
+The above datoms say: - at time 0, `person` like pizza; - at time 1, `person`
+stopped liking pizza, and started to like bread.
+
+Datomic ensures total consistency of this ever growing log by having a single
+writer, the transactor, that will enforce it when writing.
+
+In order to support disconnected clients, I needed a way to allow multiple
+writers, and I chose to do it by making the log not a list, but a directed
+acyclic graph (DAG):
+
+[source,clojure]
+----
+[[person :likes  "pizza"  0 true]
+ [0      :parent :db/root 0 true]
+ [person :likes  "bread"  1 true]
+ [person :likes  "pizza"  1 false]
+ [1      :parent 0        1 true]]
+----
+
+The extra datoms above add more information to build the directionality to the
+log, and instead of a single consistent log, the DAG could have multiple leaves
+that coexist, much like how different Git branches can have different "latest"
+commits.
+
+In order to validate this idea, I started with a Clojure implementation.  The
+goal was not to write the actual final code, but to make a proof-of-concept that
+would allow me to test and stretch the idea itself.
+
+This code {mediator-permalink}[already exists], but is yet fairly incomplete:
+
+:commented-code: https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295
+:more: https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130
+:than: https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146
+:one: https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253
+
+* the building of the index isn't done yet (with some {commented-code}[commented
+  code] on the next step to be implemented)
+* the indexing is extremely inefficient, with {more}[more] {than}[than]
+  {one}[one] occurrence of `O²` functions;
+* no query support yet.
+
+== Top-down _and_ bottom-up
+
+However, as time passed and I started looking at what the final implementation
+would look like, I started to consider keeping the PoC around.
+
+The top-down approach (Clojure PoC) was in fact helping guide me with the
+bottom-up, and I now have "promoted" the Clojure PoC into a "reference
+implementation".  It should now be a finished implementation that says what the
+expected behaviour is, and the actual code should match the behaviour.
+
+The good thing about a reference implementation is that it has no performance of
+resources boundary, so if it ends up being 1000× slower and using 500× more
+memory, it should be find.  The code can be also 10× or 100× simpler, too.
+
+== Top-down: durable persistent trees
+
+:pavlo-videos: https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi
+:db-book: https://www.databass.dev/
+
+In promoting the PoC into a reference implementation, this top-down approach now
+needs to go beyond doing everything in memory, and the index data structure now
+needs to be disk-based.
+
+Roughly speaking, most storage engines out there are based either on B-Trees or
+LSM Trees, or some variations of those.
+
+But when building an immutable database, update-in-place B-Trees aren't an
+option, as it doesn't accommodate keeping historical views of the tree.  LSM
+Trees may seem a better alternative, but duplication on the files with
+compaction are also ways to delete old data which is indeed useful for a
+historical view.
+
+I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would keep
+historical versions with the write IO cost amortization of memtables of LSM
+Trees.  I don't know of any B-Tree variant out there that resembles this, so
+I'll call it "Flushing Copy-on-Write B-Tree".
+
+I haven't written any code for this yet, so all I have is a high-level view of
+what it will look like:
+
+. like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and
+  building a new path from root to the leaf.  The upside is that writes a lock
+  free, and no coordination is needed between readers and writers, ever;
+. the downside is that a single leaf update means at least `H` new nodes that
+  will have to be flushed to disk, where `H` is the height of the tree.  To
+  avoid that, the writer creates these nodes exclusively on the in-memory
+  memtable, to avoid flushing to disk on every leaf update;
+. a background job will consolidate the memtable data every time it hits X MB,
+  and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;
+. readers than will have the extra job of getting the latest relevant
+  disk-resident value and merge it with the memtable data.
+
+The key difference to existing Copy-on-Write B-Trees is that the new trees are
+only periodically written to disk, and the intermediate values are kept in
+memory.  Since no node is ever updated, the page utilization is maximum as it
+doesn't need to keep space for future inserts and updates.
+
+And the key difference to existing LSM Trees is that no compaction is run:
+intermediate values are still relevant as the database grows.  So this leaves
+out tombstones and value duplication done for write performance.
+
+One can delete intermediate index values to reclaim space, but no data is lost
+on the process, only old B-Tree values.  And if the database ever comes back to
+that point (like when doing a historical query), the B-Tree will have to be
+rebuilt from a previous value.  After all, the database _is_ a set of datoms,
+and everything else is just derived data.
+
+Right now I'm still reading about other data structures that storage engines
+use, and I'll start implementing the "Flushing Copy-on-Write B-Tree" as I learn
+more{empty}footnote:learn-more-db[
+  If you are interested in learning more about this too, the very best two
+  resources on this subject are Andy Pavlo's "{pavlo-videos}[Intro to Database
+  Systems]" course and Alex Petrov's "{db-book}[Database Internals]" book.
+] and mature it more.
+
+== Bottom-up: parser combinators and FFI
+
+:cbindgen: https://github.com/eqrion/cbindgen
+:cbindgen-next: https://blog.eqrion.net/future-directions-for-cbindgen/
+:syn-crate: https://github.com/dtolnay/syn
+:libedn: https://euandre.org/git/libedn/
+
+I chose Rust as it has the best WebAssembly tooling support.
+
+My goal is not to build a Rust database, but a database that happens to be in
+Rust.  In order to reach client platforms, the primary API is the FFI one.
+
+I'm not very happy with current tools for exposing Rust code via FFI to the
+external world: they either mix C with C++, which I don't want to do, or
+provide no access to the intermediate representation of the FFI, which would be
+useful for generating binding for any language that speaks FFI.
+
+I like better the path that the author of {cbindgen}[cbindgen] crate
+{cbindgen-next}[proposes]: emitting an data representation of the Rust C API
+(the author calls is a `ffi.json` file), and than building transformers from the
+data representation to the target language.  This way you could generate a C API
+_and_ the node-ffi bindings for JavaScript automatically from the Rust code.
+
+So the first thing to be done before moving on is an FFI exporter that doesn't
+mix C and C++, and generates said `ffi.json`, and than build a few transformers
+that take this `ffi.json` and generate the language bindings, be it C, C++,
+JavaScript, TypeScript, Kotlin, Swift, Dart,
+_etc_footnote:ffi-langs[
+  Those are, specifically, the languages I'm more interested on.  My goal is
+  supporting client applications, and those languages are the most relevant for
+  doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for Node.js and
+  browser, Kotlin for Android and Swing, Swift for iOS, and Dart for Flutter.
+].
+
+I think the best way to get there is by taking the existing code for cbindgen,
+which uses the {syn-crate}[syn] crate to parse the Rust
+code{empty}footnote:rust-syn[
+  The fact that syn is an external crate to the Rust compiler points to a big
+  warning: procedural macros are not first class in Rust.  They are just like
+  Babel plugins in JavaScript land, with the extra shortcoming that there is no
+  specification for the Rust syntax, unlike JavaScript.
+pass:[</p><p>]
+  As flawed as this may be, it seems to be generally acceptable and adopted,
+  which works against building a solid ecosystem for Rust.
+pass:[</p><p>]
+  The alternative that rust-ffi implements relies on internals of the Rust
+  compiler, which isn't actually worst, just less common and less accepted.
+], and adapt it to emit the metadata.
+
+I've started a fork of cbindgen:
+[line-through]#x-bindgen#{empty}footnote:x-bindgen[
+  _EDIT_: now archived, the experimentation was fun.  I've started to move more
+  towards C, so this effort became deprecated.
+].  Right now it is just a copy of cbindgen verbatim, and I plan to remove all C
+and C++ emitting code from it, and add a IR emitting code instead.
+
+When starting working on x-bindgen, I realized I didn't know what to look for in
+a header file, as I haven't written any C code in many years.  So as I was
+writing {libedn}[libedn], I didn't know how to build a good C API to expose.  So
+I tried porting the code to C, and right now I'm working on building a _good_ C
+API for a JSON parser using parser combinators:
+[line-through]#ParsecC#{empty}footnote:parsecc[
+  _EDIT_: now also archived.
+].
+
+After "finishing" ParsecC I'll have a good notion of what a good C API is, and
+I'll have a better direction towards how to expose code from libedn to other
+languages, and work on x-bindgen then.
+
+What both libedn and ParsecC are missing right now are proper error reporting,
+and property-based testing for libedn.
+
+== Conclusion
+
+I've learned a lot already, and I feel the journey I'm on is worth going
+through.
+
+If any of those topics interest you, message me to discuss more or contribute!
+Patches welcome!