diff options
Diffstat (limited to 'po/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po')
-rw-r--r-- | po/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po | 416 |
1 files changed, 0 insertions, 416 deletions
diff --git a/po/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po b/po/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po deleted file mode 100644 index 1fe1aae..0000000 --- a/po/pt/LC_MESSAGES/_articles/2020-11-12-durable-persistent-trees-and-parser-combinators-building-a-database.po +++ /dev/null @@ -1,416 +0,0 @@ -# -msgid "" -msgstr "" - -msgid "" -"title: Durable persistent trees and parser combinators - building a database" -msgstr "" - -msgid "date: 2020-11-12" -msgstr "" - -msgid "layout: post" -msgstr "" - -msgid "lang: en" -msgstr "" - -msgid "" -"ref: durable-persistent-trees-and-parser-combinators-building-a-database" -msgstr "" - -msgid "" -"I've received with certain frequency messages from people wanting to know if" -" I've made any progress on the database project [I've written about]({% link" -" _articles/2020-08-31-the-database-i-wish-i-had.md %})." -msgstr "" - -msgid "" -"There are a few areas where I've made progress, and here's a public post on " -"it." -msgstr "" - -msgid "Proof-of-concept: DAG log" -msgstr "" - -msgid "" -"The main thing I wanted to validate with a concrete implementation was the " -"concept of modeling a DAG on a sequence of datoms." -msgstr "" - -msgid "" -"The notion of a *datom* is a rip-off from Datomic, which models data with " -"time aware *facts*, which come from RDF. RDF's fact is a triple of subject-" -"predicate-object, and Datomic's datoms add a time component to it: subject-" -"predicate-object-time, A.K.A. entity-attribute-value-transaction:" -msgstr "" - -msgid "" -"[[person :likes \"pizza\" 0 true]\n" -" [person :likes \"bread\" 1 true]\n" -" [person :likes \"pizza\" 1 false]]\n" -msgstr "" - -msgid "The above datoms say:" -msgstr "" - -msgid "at time 0, `person` like pizza;" -msgstr "" - -msgid "at time 1, `person` stopped liking pizza, and started to like bread." -msgstr "" - -msgid "" -"Datomic ensures total consistency of this ever growing log by having a " -"single writer, the transactor, that will enforce it when writing." -msgstr "" - -msgid "" -"In order to support disconnected clients, I needed a way to allow multiple " -"writers, and I chose to do it by making the log not a list, but a directed " -"acyclic graph (DAG):" -msgstr "" - -msgid "" -"The extra datoms above add more information to build the directionality to " -"the log, and instead of a single consistent log, the DAG could have multiple" -" leaves that coexist, much like how different Git branches can have " -"different \"latest\" commits." -msgstr "" - -msgid "" -"In order to validate this idea, I started with a Clojure implementation. The" -" goal was not to write the actual final code, but to make a proof-of-concept" -" that would allow me to test and stretch the idea itself." -msgstr "" - -msgid "" -"This code [already " -"exists](https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1)," -" but is yet fairly incomplete:" -msgstr "" - -msgid "" -"the building of the index isn't done yet (with some [commented " -"code](https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295)" -" on the next step to be implemented)" -msgstr "" - -msgid "" -"the indexing is extremely inefficient, with " -"[more](https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130)" -" " -"[than](https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146)" -" " -"[one](https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253)" -" occurrence of `O²` functions;" -msgstr "" - -msgid "no query support yet." -msgstr "" - -msgid "Top-down *and* bottom-up" -msgstr "" - -msgid "" -"However, as time passed and I started looking at what the final " -"implementation would look like, I started to consider keeping the PoC " -"around." -msgstr "" - -msgid "" -"The top-down approach (Clojure PoC) was in fact helping guide me with the " -"bottom-up, and I now have \"promoted\" the Clojure PoC into a \"reference " -"implementation\". It should now be a finished implementation that says what " -"the expected behaviour is, and the actual code should match the behaviour." -msgstr "" - -msgid "" -"The good thing about a reference implementation is that it has no " -"performance of resources boundary, so if it ends up being 1000x slower and " -"using 500× more memory, it should be find. The code can be also 10x or 100x " -"simpler, too." -msgstr "" - -msgid "Top-down: durable persistent trees" -msgstr "" - -msgid "" -"In promoting the PoC into a reference implementation, this top-down approach" -" now needs to go beyond doing everything in memory, and the index data " -"structure now needs to be disk-based." -msgstr "" - -msgid "" -"Roughly speaking, most storage engines out there are based either on B-Trees" -" or LSM Trees, or some variations of those." -msgstr "" - -msgid "" -"But when building an immutable database, update-in-place B-Trees aren't an " -"option, as it doesn't accommodate keeping historical views of the tree. LSM " -"Trees may seem a better alternative, but duplication on the files with " -"compaction are also ways to delete old data which is indeed useful for a " -"historical view." -msgstr "" - -msgid "" -"I think the thing I'm after is a mix of a Copy-on-Write B-Tree, which would " -"keep historical versions with the write IO cost amortization of memtables of" -" LSM Trees. I don't know of any B-Tree variant out there that resembles " -"this, so I'll call it \"Flushing Copy-on-Write B-Tree\"." -msgstr "" - -msgid "" -"I haven't written any code for this yet, so all I have is a high-level view " -"of what it will look like:" -msgstr "" - -msgid "" -"like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and" -" building a new path from root to the leaf. The upside is that writes a lock" -" free, and no coordination is needed between readers and writers, ever;" -msgstr "" - -msgid "" -"the downside is that a single leaf update means at least `H` new nodes that " -"will have to be flushed to disk, where `H` is the height of the tree. To " -"avoid that, the writer creates these nodes exclusively on the in-memory " -"memtable, to avoid flushing to disk on every leaf update;" -msgstr "" - -msgid "" -"a background job will consolidate the memtable data every time it hits X MB," -" and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;" -msgstr "" - -msgid "" -"readers than will have the extra job of getting the latest relevant disk-" -"resident value and merge it with the memtable data." -msgstr "" - -msgid "" -"The key difference to existing Copy-on-Write B-Trees is that the new trees " -"are only periodically written to disk, and the intermediate values are kept " -"in memory. Since no node is ever updated, the page utilization is maximum as" -" it doesn't need to keep space for future inserts and updates." -msgstr "" - -msgid "" -"And the key difference to existing LSM Trees is that no compaction is run: " -"intermediate values are still relevant as the database grows. So this leaves" -" out tombstones and value duplication done for write performance." -msgstr "" - -msgid "" -"One can delete intermediate index values to reclaim space, but no data is " -"lost on the process, only old B-Tree values. And if the database ever comes " -"back to that point (like when doing a historical query), the B-Tree will " -"have to be rebuilt from a previous value. After all, the database *is* a set" -" of datoms, and everything else is just derived data." -msgstr "" - -msgid "" -"Right now I'm still reading about other data structures that storage engines" -" use, and I'll start implementing the \"Flushing Copy-on-Write B-Tree\" as I" -" learn more[^learn-more-db] and mature it more." -msgstr "" - -msgid "" -"[^learn-more-db]: If you are interested in learning more about this too, the" -" very best two resources on this subject are Andy Pavlo's \"[Intro to " -"Database " -"Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi)\"" -" course and Alex Petrov's \"[Database " -"Internals](https://www.databass.dev/)\" book." -msgstr "" - -msgid "Bottom-up: parser combinators and FFI" -msgstr "" - -msgid "I chose Rust as it has the best WebAssembly tooling support." -msgstr "" - -msgid "" -"My goal is not to build a Rust database, but a database that happens to be " -"in Rust. In order to reach client platforms, the primary API is the FFI one." -msgstr "" - -msgid "" -"I'm not very happy with current tools for exposing Rust code via FFI to the " -"external world: they either mix C with C++, which I don't want to do, or " -"provide no access to the intermediate representation of the FFI, which would" -" be useful for generating binding for any language that speaks FFI." -msgstr "" - -msgid "" -"I like better the path that the author of " -"[cbindgen](https://github.com/eqrion/cbindgen) crate " -"[proposes](https://blog.eqrion.net/future-directions-for-cbindgen/): " -"emitting an data representation of the Rust C API (the author calls is a " -"`ffi.json` file), and than building transformers from the data " -"representation to the target language. This way you could generate a C API " -"*and* the node-ffi bindings for JavaScript automatically from the Rust code." -msgstr "" - -msgid "" -"So the first thing to be done before moving on is an FFI exporter that " -"doesn't mix C and C++, and generates said `ffi.json`, and than build a few " -"transformers that take this `ffi.json` and generate the language bindings, " -"be it C, C++, JavaScript, TypeScript, Kotlin, Swift, Dart, *etc*[^ffi-" -"langs]." -msgstr "" - -msgid "" -"[^ffi-langs]: Those are, specifically, the languages I'm more interested on." -" My goal is supporting client applications, and those languages are the most" -" relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for" -" Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart " -"for Flutter." -msgstr "" - -msgid "" -"I think the best way to get there is by taking the existing code for " -"cbindgen, which uses the [syn](https://github.com/dtolnay/syn) crate to " -"parse the Rust code[^rust-syn], and adapt it to emit the metadata." -msgstr "" - -msgid "" -"[^rust-syn]: The fact that syn is an external crate to the Rust compiler " -"points to a big warning: procedural macros are not first class in Rust. They" -" are just like Babel plugins in JavaScript land, with the extra shortcoming " -"that there is no specification for the Rust syntax, unlike JavaScript." -msgstr "" - -msgid "" -"As flawed as this may be, it seems to be generally acceptable and adopted,\n" -"which works against building a solid ecosystem for Rust.\n" -"\n" -"The alternative that rust-ffi implements relies on internals of the Rust\n" -"compiler, which isn't actually worst, just less common and less accepted.\n" -msgstr "" - -msgid "" -"After \"finishing\" ParsecC I'll have a good notion of what a good C API is," -" and I'll have a better direction towards how to expose code from libedn to " -"other languages, and work on x-bindgen then." -msgstr "" - -msgid "" -"What both libedn and ParsecC are missing right now are proper error " -"reporting, and property-based testing for libedn." -msgstr "" - -msgid "Conclusion" -msgstr "" - -msgid "" -"I've learned a lot already, and I feel the journey I'm on is worth going " -"through." -msgstr "" - -msgid "" -"If any of those topics interest you, message me to discuss more or " -"contribute! Patches welcome!" -msgstr "" - -msgid "eu_categories: mediator" -msgstr "" - -msgid "" -"[[person :likes \"pizza\" 0 true]\n" -" [0 :parent :db/root 0 true]\n" -" [person :likes \"bread\" 1 true]\n" -" [person :likes \"pizza\" 1 false]\n" -" [1 :parent 0 1 true]]\n" -msgstr "" - -msgid "updated_at: 2021-02-09" -msgstr "" - -msgid "" -"I've started a fork of cbindgen: ~~x-bindgen~~[^x-bindgen]. Right now it is " -"just a copy of cbindgen verbatim, and I plan to remove all C and C++ " -"emitting code from it, and add a IR emitting code instead." -msgstr "" - -msgid "" -"[^x-bindgen]: *EDIT*: now archived, the experimentation was fun. I've " -"started to move more towards C, so this effort became deprecated." -msgstr "" - -msgid "" -"When starting working on x-bindgen, I realized I didn't know what to look " -"for in a header file, as I haven't written any C code in many years. So as I" -" was writing [libedn](https://euandre.org/git/libedn/), I didn't know how " -"to build a good C API to expose. So I tried porting the code to C, and right" -" now I'm working on building a *good* C API for a JSON parser using parser " -"combinators: ~~ParsecC~~ [^parsecc]." -msgstr "" - -msgid "[^parsecc]: *EDIT*: now also archived." -msgstr "" - -#~ msgid "" -#~ "I've started a fork of cbindgen: " -#~ "[x-bindgen](https://euandre.org/git/x-bindgen/). Right now it is just a " -#~ "copy of cbindgen verbatim, and I plan to remove all C and C++ emitting code " -#~ "from it, and add a IR emitting code instead." -#~ msgstr "" - -#~ msgid "" -#~ "When starting working on x-bindgen, I realized I didn't know what to look " -#~ "for in a header file, as I haven't written any C code in many years. So as I" -#~ " was writing [libedn](https://euandre.org/git/libedn/), I didn't know how " -#~ "to build a good C API to expose. So I tried porting the code to C, and right" -#~ " now I'm working on building a *good* C API for a JSON parser using parser " -#~ "combinators: ~~ParsecC~~ *EDIT*: now archived, the experimentation was fun." -#~ msgstr "" - -#~ msgid "updated_at: 2020-11-14" -#~ msgstr "" - -#~ msgid "" -#~ "When starting working on x-bindgen, I realized I didn't know what to look " -#~ "for in a header file, as I haven't written any C code in many years. So as I" -#~ " was writing [libedn](https://euandre.org/git/libedn/), I didn't know how " -#~ "to build a good C API to expose. So I tried porting the code to C, and right" -#~ " now I'm working on building a *good* C API for a JSON parser using parser " -#~ "combinators: [ParsecC](https://euandre.org/git/parsecc/)." -#~ msgstr "" - -#~ msgid "" -#~ "I've started a fork of cbindgen: " -#~ "[x-bindgen](https://euandre.org/git/x-bindgen/). Right now it is just a " -#~ "copy of cbindgen verbatim, and I plan to remove all C and C++ emitting code " -#~ "from it, and add a IR emitting code instead." -#~ msgstr "" - -#~ msgid "" -#~ "When starting working on x-bindgen, I realized I didn't know what to look " -#~ "for in a header file, as I haven't written any C code in many years. So as I" -#~ " was writing [libedn](https://euandre.org/git/libedn/), I didn't know how " -#~ "to build a good C API to expose. So I tried porting the code to C, and right" -#~ " now I'm working on building a *good* C API for a JSON parser using parser " -#~ "combinators: [ParsecC](https://euandre.org/git/parsecc/)." -#~ msgstr "" - -#~ msgid "" -#~ "When starting working on x-bindgen, I realized I didn't know what to look " -#~ "for in a header file, as I haven't written any C code in many years. So as I" -#~ " was writing [libedn](https://euandre.org/git/libedn/), I didn't know how " -#~ "to build a good C API to expose. So I tried porting the code to C, and right" -#~ " now I'm working on building a *good* C API for a JSON parser using parser " -#~ "combinators: [ParsecC](https://euandre.org/git/parsecc/)." -#~ msgstr "" - -#~ msgid "" -#~ "[[person :likes \"pizza\" 0 true]\n" -#~ " [0 :parent null 0 true]\n" -#~ " [person :likes \"bread\" 1 true]\n" -#~ " [person :likes \"pizza\" 1 false]\n" -#~ " [1 :parent 0 1 true]]\n" -#~ msgstr "" - -#~ msgid "category: mediator" -#~ msgstr "" |