From 570ec471d1605318aeefb030cd78682ae442235b Mon Sep 17 00:00:00 2001 From: EuAndreh Date: Mon, 31 Mar 2025 21:51:40 -0300 Subject: src/content/: Update all files left to asciidoc --- src/content/blog/2021/01/26/remembering-ann.adoc | 185 ++++++++++-------- src/content/blog/2021/02/17/fallible.adoc | 216 ++++++++++++--------- src/content/blog/2021/04/29/relational-review.adoc | 126 ++++++------ 3 files changed, 302 insertions(+), 225 deletions(-) (limited to 'src/content/blog/2021') diff --git a/src/content/blog/2021/01/26/remembering-ann.adoc b/src/content/blog/2021/01/26/remembering-ann.adoc index 0d02384..5b7d2b0 100644 --- a/src/content/blog/2021/01/26/remembering-ann.adoc +++ b/src/content/blog/2021/01/26/remembering-ann.adoc @@ -1,55 +1,60 @@ ---- += ANN: remembering - Add memory to dmenu, fzf and similar tools -title: "ANN: remembering - Add memory to dmenu, fzf and similar tools" +:remembering: https://euandreh.xyz/remembering/ +:dmenu: https://tools.suckless.org/dmenu/ +:fzf: https://github.com/junegunn/fzf -date: 2021-01-26 +Today I pushed v0.1.0 of {remembering}[remembering], a tool to enhance the +interactive usability of menu-like tools, such as {dmenu}[dmenu] and {fzf}[fzf]. -layout: post +== Previous solution -lang: en +:yeganesh: https://dmwit.com/yeganesh/ -ref: ann-remembering-add-memory-to-dmenu-fzf-and-similar-tools +I previously used {yeganesh}[yeganesh] to fill this gap, but as I started to +rely less on Emacs, I added fzf as my go-to tool for doing fuzzy searching on +the terminal. But I didn't like that fzf always showed the same order of +things, when I would only need 3 or 4 commonly used files. ---- +For those who don't know: yeganesh is a wrapper around dmenu that will remember +your most used programs and put them on the beginning of the list of +executables. This is very convenient for interactive prolonged use, as with +time the things you usually want are right at the very beginning. 
-Today I pushed v0.1.0 of [remembering], a tool to enhance the interactive usability of menu-like tools, such as [dmenu] and [fzf].
+But now I had this thing, yeganesh, that solved this problem for dmenu, but
+didn't for fzf.

-## Previous solution

+I initially considered patching yeganesh to support it, but I found it more
+coupled to dmenu than I would desire. I'd rather have something that knows
+nothing about dmenu, fzf or anything, but enhances tools like those in a useful
+way.

-I previously used [yeganesh] to fill this gap, but as I started to rely less on Emacs, I added fzf as my go-to tool for doing fuzzy searching on the terminal.
-But I didn't like that fzf always showed the same order of things, when I would only need 3 or 4 commonly used files.
+== Implementation

-For those who don't know: yeganesh is a wrapper around dmenu that will remember your most used programs and put them on the beginning of the list of executables.
-This is very convenient for interactive prolonged use, as with time the things you usually want are right at the very beginning.
+:v-010: https://euandre.org/git/remembering/tree/remembering?id=v0.1.0
+:getopts: https://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
+:sort: https://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html
+:awk: https://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html
+:spencer-quote: https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3

-But now I had this thing, yeganesh, that solved this problem for dmenu, but didn't for fzf.
+Other than being decoupled from dmenu, another improvement I thought could be
+made on top of yeganesh is the programming language choice. Instead of
+Haskell, I went with POSIX sh. Sticking to POSIX sh makes it require fewer
+build-time dependencies. There aren't any, actually. Packaging is made much
+easier due to that.

-I initially considered patching yeganesh to support it, but I found it more coupled to dmenu than I would desire.
-I'd rather have something that knows nothing about dmenu, fzf or anything, but enhances tools like those in a useful way. +The good thing is that the program itself is small enough ({v-010}[119 lines] on +v0.1.0) that POSIX sh does the job just fine, combined with other POSIX +utilities such as {getopts}[getopts], {sort}[sort] and {awk}[awk]. -[remembering]: https://euandreh.xyz/remembering/ -[dmenu]: https://tools.suckless.org/dmenu/ -[fzf]: https://github.com/junegunn/fzf -[yeganesh]: http://dmwit.com/yeganesh/ - -## Implementation - -Other than being decoupled from dmenu, another improvement I though that could be made on top of yeganesh is the programming language choice. -Instead of Haskell, I went with POSIX sh. -Sticking to POSIX sh makes it require less build-time dependencies. There aren't any, actually. Packaging is made much easier due to that. - -The good thing is that the program itself is small enough ([119 lines] on v0.1.0) that POSIX sh does the job just fine, combined with other POSIX utilities such as [getopts], [sort] and [awk]. - -[119 lines]: https://euandre.org/git/remembering/tree/remembering?id=v0.1.0 -[getopts]: http://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html -[sort]: http://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html -[awk]: http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html - -The behaviour is: given a program that will read from STDIN and write a single entry to STDOUT, `remembering` wraps that program, and rearranges STDIN so that previous choices appear at the beginning. +The behaviour is: given a program that will read from STDIN and write a single +entry to STDOUT, `remembering` wraps that program, and rearranges STDIN so that +previous choices appear at the beginning. 
Where you would do: -```shell +[source,shell] +---- $ seq 5 | fzf 5 @@ -59,11 +64,12 @@ $ seq 5 | fzf > 1 5/5 > -``` +---- And every time get the same order of numbers, now you can write: -```shell +[source,shell] +---- $ seq 5 | remembering -p seq-fzf -c fzf 5 @@ -73,11 +79,13 @@ $ seq 5 | remembering -p seq-fzf -c fzf > 1 5/5 > -``` +---- -On the first run, everything is the same. If you picked 4 on the previous example, the following run would be different: +On the first run, everything is the same. If you picked 4 on the previous +example, the following run would be different: -```shell +[source,shell] +---- $ seq 5 | remembering -p seq-fzf -c fzf 5 @@ -87,31 +95,36 @@ $ seq 5 | remembering -p seq-fzf -c fzf > 4 5/5 > -``` +---- As time passes, the list would adjust based on the frequency of your choices. -I aimed for reusability, so that I could wrap diverse commands with `remembering` and it would be able to work. To accomplish that, a "profile" (the `-p something` part) stores data about different runs separately. - -I took the idea of building something small with few dependencies to other places too: -- the manpages are written in troff directly; -- the tests are just more POSIX sh files; -- and a POSIX Makefile to `check` and `install`. +I aimed for reusability, so that I could wrap diverse commands with +`remembering` and it would be able to work. To accomplish that, a "profile" +(the `-p something` part) stores data about different runs separately. -I was aware of the value of sticking to coding to standards, but I had past experience mostly with programming language standards, such as ECMAScript, Common Lisp, Scheme, or with IndexedDB or DOM APIs. 
-It felt good to rediscover these nice POSIX tools, which makes me remember of a quote by [Henry Spencer][poor-unix]:
+I took the idea of building something small with few dependencies to other
+places too:
+
+- the manpages are written in troff directly;
+- the tests are just more POSIX sh files;
+- and a POSIX Makefile to `check` and `install`.

-> Those who do not understand Unix are condemned to reinvent it, poorly.
+I was aware of the value of coding to standards, but I had past experience
+mostly with programming language standards, such as ECMAScript, Common Lisp,
+Scheme, or with IndexedDB or DOM APIs. It felt good to rediscover these nice
+POSIX tools, which reminds me of a quote by {spencer-quote}[Henry Spencer]:

-[poor-unix]: https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3
+____
+Those who do not understand Unix are condemned to reinvent it, poorly.
+____

-## Usage examples
+== Usage examples

Here are some functions I wrote myself that you may find useful:

-### Run a command with fzf on `$PWD`
+=== Run a command with fzf on `$PWD`

-```shellcheck
+[source,shell]
+----
f() {
    profile="$f-shell-function(pwd | sed -e 's_/_-_g')"
    file="$(git ls-files | \
@@ -124,14 +137,18 @@ f() {
        "$1" "$file"
    fi
}
-```
+----

-This way I can run `f vi` or `f vi config` at the root of a repository, and the list of files will always appear on the most used order.
-Adding `pwd` to the profile allows it to not mix data for different repositories.
+This way I can run `f vi` or `f vi config` at the root of a repository, and the
+list of files will always appear in the most-used order. Adding `pwd` to the
+profile allows it to not mix data for different repositories.
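The reordering idea itself is simple enough to sketch in a few lines of POSIX sh and awk. The snippet below is a hypothetical illustration, not the actual `remembering` implementation: the profile path and its one-entry-per-line format are made up for the demo.

```shell
# Hypothetical sketch of the idea behind `remembering` (not its actual
# implementation): a "profile" file accumulates past choices, and STDIN
# is reordered so the most frequently chosen entries come first.
profile=/tmp/seq-demo-profile
printf '4\n4\n2\n' > "$profile"    # pretend "4" was picked twice, "2" once

seq 5 | awk -v prof="$profile" '
    # count how many times each entry appears in the profile
    BEGIN { while ((getline line < prof) > 0) count[line]++ }
    # tag each STDIN line with its frequency and original position
    { print count[$0] + 0 "\t" NR "\t" $0 }
' | sort -k1,1nr -k2,2n | cut -f3-
# prints 4, 2, 1, 3, 5 -- most chosen first, ties keep their input order
```

Pipe the reordered list into fzf or dmenu and append each selection back to the profile, and you have the essence of the tool.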
-### Copy password to clipboard +=== Copy password to clipboard -```shell +:pass: https://www.passwordstore.org/ + +[source,shell] +---- choice="$(find "$HOME/.password-store" -type f | \ grep -Ev '(.git|.gpg-id)' | \ sed -e "s|$HOME/.password-store/||" -e 's/\.gpg$//' | \ @@ -142,49 +159,57 @@ choice="$(find "$HOME/.password-store" -type f | \ if [ -n "$choice" ]; then pass show "$choice" -c fi -``` - -Adding the above to a file and binding it to a keyboard shortcut, I can access the contents of my [password store][password-store], with the entries ordered by usage. +---- -[password-store]: https://www.passwordstore.org/ +Adding the above to a file and binding it to a keyboard shortcut, I can access +the contents of my {pass}[password store], with the entries ordered by usage. -### Replacing yeganesh +=== Replacing yeganesh Where I previously had: -```shell +[source,shell] +---- exe=$(yeganesh -x) && exec $exe -``` +---- Now I have: -```shell +[source,shell] +---- exe=$(dmenu_path | remembering -p dmenu-exec -c dmenu) && exec $exe -``` +---- This way, the executables appear on order of usage. -If you don't have `dmenu_path`, you can get just the underlying `stest` tool that looks at the executables available in your `$PATH`. Here's a juicy one-liner to do it: +If you don't have `dmenu_path`, you can get just the underlying `stest` tool +that looks at the executables available in your `$PATH`. 
Here's a juicy +one-liner to do it: -```shell +[source,shell] +---- $ wget -O- https://dl.suckless.org/tools/dmenu-5.0.tar.gz | \ tar Ozxf - dmenu-5.0/arg.h dmenu-5.0/stest.c | \ sed 's|^#include "arg.h"$|// #include "arg.h"|' | \ cc -xc - -o stest -``` +---- + +With the `stest` utility you'll be able to list executables in your `$PATH` and +pipe them to dmenu or something else yourself: -With the `stest` utility you'll be able to list executables in your `$PATH` and pipe them to dmenu or something else yourself: -```shell +[source,shell] +---- $ (IFS=:; ./stest -flx $PATH;) | sort -u | remembering -p another-dmenu-exec -c dmenu | sh -``` +---- In fact, the code for `dmenu_path` is almost just like that. -## Conclusion +== Conclusion -For my personal use, I've [packaged] `remembering` for GNU Guix and Nix. Packaging it to any other distribution should be trivial, or just downloading the tarball and running `[sudo] make install`. +:packaged: https://euandre.org/git/package-repository/ -Patches welcome! +For my personal use, I've {packaged}[packaged] `remembering` for GNU Guix and +Nix. Packaging it to any other distribution should be trivial, or just +downloading the tarball and running `[sudo] make install`. -[packaged]: https://euandre.org/git/package-repository/ -[nix-file]: https://euandre.org/git/dotfiles/tree/nixos/not-on-nixpkgs/remembering.nix?id=0831444f745cf908e940407c3e00a61f6152961f +Patches welcome! diff --git a/src/content/blog/2021/02/17/fallible.adoc b/src/content/blog/2021/02/17/fallible.adoc index 8a097f8..533e107 100644 --- a/src/content/blog/2021/02/17/fallible.adoc +++ b/src/content/blog/2021/02/17/fallible.adoc @@ -1,49 +1,51 @@ = ANN: fallible - Fault injection library for stress-testing failure scenarios -date: 2021-02-17 +:fallible: https://euandreh.xyz/fallible/ -updated_at: 2022-03-06 +Yesterday I pushed v0.1.0 of {fallible}[fallible], a miniscule library for +fault-injection and stress-testing C programs. 
-layout: post +== _EDIT_ -lang: en +:changelog: https://euandreh.xyz/fallible/CHANGELOG.html +:tarball: https://euandre.org/static/attachments/fallible.tar.gz -ref: ann-fallible-fault-injection-library-for-stress-testing-failure-scenarios +2021-06-12: As of {changelog}[0.3.0] (and beyond), the macro interface improved +and is a bit different from what is presented in this article. If you're +interested, I encourage you to take a look at it. ---- +2022-03-06: I've {tarball}[archived] the project for now. It still needs some +maturing before being usable. -Yesterday I pushed v0.1.0 of [fallible], a miniscule library for fault-injection -and stress-testing C programs. +== Existing solutions -[fallible]: https://euandreh.xyz/fallible/ +:gnu-std: https://www.gnu.org/prep/standards/standards.html#Semantics +:valgrind: https://www.valgrind.org/ +:so-alloc: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc -## *EDIT* +Writing robust code can be challenging, and tools like static analyzers, fuzzers +and friends can help you get there with more certainty. As I would try to +improve some of my C code and make it more robust, in order to handle system +crashes, filled disks, out-of-memory and similar scenarios, I didn't find +existing tooling to help me get there as I expected to find. I couldn't find +existing tools to help me explicitly stress-test those failure scenarios. -2021-06-12: As of [0.3.0] (and beyond), the macro interface improved and is a bit different from what is presented in this article. If you're interested, I encourage you to take a look at it. +Take the "{gnu-std}[Writing Robust Programs]" section of the GNU Coding +Standards: -2022-03-06: I've [archived] the project for now. It still needs some maturing before being usable. +____ +Check every system call for an error return, unless you know you wish to ignore +errors. (...) Check every call to malloc or realloc to see if it returned NULL. 
+____

-[0.3.0]: https://euandreh.xyz/fallible/CHANGELOG.html
-[archived]: https://euandre.org/static/attachments/fallible.tar.gz
-
-## Existing solutions
-
-Writing robust code can be challenging, and tools like static analyzers, fuzzers and friends can help you get there with more certainty.
-As I would try to improve some of my C code and make it more robust, in order to handle system crashes, filled disks, out-of-memory and similar scenarios, I didn't find existing tooling to help me get there as I expected to find.
-I couldn't find existing tools to help me explicitly stress-test those failure scenarios.
-
-Take the "[Writing Robust Programs][gnu-std]" section of the GNU Coding Standards:
-
-[gnu-std]: https://www.gnu.org/prep/standards/standards.html#Semantics
-
-> Check every system call for an error return, unless you know you wish to ignore errors.
-> (...) Check every call to malloc or realloc to see if it returned NULL.
-
-From a robustness standpoint, this is a reasonable stance: if you want to have a robust program that knows how to fail when you're out of memory and `malloc` returns `NULL`, than you ought to check every call to `malloc`.
+From a robustness standpoint, this is a reasonable stance: if you want to have a
+robust program that knows how to fail when you're out of memory and `malloc`
+returns `NULL`, then you ought to check every call to `malloc`.

Take a sample code snippet for clarity:

-```c
+[source,c]
+----
void a_function() {
    char *s1 = malloc(A_NUMBER);
    strcpy(s1, "some string");
@@ -51,13 +53,15 @@ void a_function() {
    char *s2 = malloc(A_NUMBER);
    strcpy(s2, "another string");
}
-```
+----

-At a first glance, this code is unsafe: if any of the calls to `malloc` returns `NULL`, `strcpy` will be given a `NULL` pointer.
+At first glance, this code is unsafe: if any of the calls to `malloc` returns
+`NULL`, `strcpy` will be given a `NULL` pointer.
My first instinct was to change this code to something like this: -```diff +[source,diff] +---- @@ -1,7 +1,15 @@ void a_function() { char *s1 = malloc(A_NUMBER); @@ -74,22 +78,26 @@ My first instinct was to change this code to something like this: + } strcpy(s2, "another string"); } -``` +---- As I later found out, there are at least 2 problems with this approach: -1. **it doesn't compose**: this could arguably work if `a_function` was `main`. - But if `a_function` lives inside a library, an `exit(1);` is a inelegant way of handling failures, and will catch the top-level `main` consuming the library by surprise; -2. **it gives up instead of handling failures**: the actual handling goes a bit beyond stopping. - What about open file handles, in-memory caches, unflushed bytes, etc.? - -If you could force only the second call to `malloc` to fail, [Valgrind] would correctly complain that the program exitted with unfreed memory. +. *it doesn't compose*: this could arguably work if `a_function` was `main`. + But if `a_function` lives inside a library, an `exit(1);` is an inelegant way + of handling failures, and will catch the top-level `main` consuming the + library by surprise; +. *it gives up instead of handling failures*: the actual handling goes a bit + beyond stopping. What about open file handles, in-memory caches, unflushed + bytes, etc.? -[Valgrind]: https://www.valgrind.org/ +If you could force only the second call to `malloc` to fail, +{valgrind}[Valgrind] would correctly complain that the program exitted with +unfreed memory. So the last change to make the best version of the above code is: -```diff +[source,diff] +---- @@ -1,15 +1,14 @@ -void a_function() { +bool a_function() { @@ -110,40 +118,61 @@ So the last change to make the best version of the above code is: } strcpy(s2, "another string"); } -``` +---- -Instead of returning `void`, `a_function` now returns `bool` to indicate whether an error ocurred during its execution. 
-If `a_function` returned a pointer to something, the return value could be `NULL`, or an `int` that represents an error code. +Instead of returning `void`, `a_function` now returns `bool` to indicate whether +an error ocurred during its execution. If `a_function` returned a pointer to +something, the return value could be `NULL`, or an `int` that represents an +error code. -The code is now a) safe and b) failing gracefully, returning the control to the caller to properly handle the error case. +The code is now a) safe and b) failing gracefully, returning the control to the +caller to properly handle the error case. -After seeing similar patterns on well designed APIs, I adopted this practice for my own code, but was still left with manually verifying the correctness and robustness of it. +After seeing similar patterns on well designed APIs, I adopted this practice for +my own code, but was still left with manually verifying the correctness and +robustness of it. -How could I add assertions around my code that would help me make sure the `free(s1);` exists, before getting an error report? -How do other people and projects solve this? +How could I add assertions around my code that would help me make sure the +`free(s1);` exists, before getting an error report? How do other people and +projects solve this? -From what I could see, either people a) hope for the best, b) write safe code but don't strees-test it or c) write ad-hoc code to stress it. +From what I could see, either people a) hope for the best, b) write safe code +but don't strees-test it or c) write ad-hoc code to stress it. -The most proeminent case of c) is SQLite: it has a few wrappers around the familiar `malloc` to do fault injection, check for memory limits, add warnings, create shim layers for other environments, etc. -All of that, however, is tightly couple with SQLite itself, and couldn't be easily pulled off for using somewhere else. 
+The most prominent case of c) is SQLite: it has a few wrappers around the
+familiar `malloc` to do fault injection, check for memory limits, add warnings,
+create shim layers for other environments, etc. All of that, however, is
+tightly coupled with SQLite itself, and couldn't be easily pulled off for using
+somewhere else.

-When searching for it online, an [interesting thread] caught my atention: fail the call to `malloc` for each time it is called, and when the same stacktrace appears again, allow it to proceed.
+When searching for it online, an {so-alloc}[interesting thread] caught my
+attention: fail the call to `malloc` each time it is called, and when the
+same stacktrace appears again, allow it to proceed.

-[interesting thread]: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc
+== Implementation

-## Implementation
+:mallocfail: https://github.com/ralight/mallocfail
+:should-fail-fn: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16

-A working implementation of that already exists: [mallocfail].
-It uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the stacktrace and fails once for each SHA.
+A working implementation of that already exists: {mallocfail}[mallocfail]. It
+uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the
+stacktrace and fails once for each SHA.

-I initially envisioned and started implementing something very similar to mallocfail.
-However I wanted it to go beyond out-of-memory scenarios, and using `LD_PRELOAD` for every possible corner that could fail wasn't a good idea on the long run.
+I initially envisioned and started implementing something very similar to
+mallocfail. However I wanted it to go beyond out-of-memory scenarios, and using
+`LD_PRELOAD` for every possible corner that could fail wasn't a good idea in the
+long run.
-Also, mallocfail won't work together with tools such as Valgrind, who want to do their own override of `malloc` with `LD_PRELOAD`. +Also, mallocfail won't work together with tools such as Valgrind, who want to do +their own override of `malloc` with `LD_PRELOAD`. -I instead went with less automatic things: starting with a `fallible_should_fail(char *filename, int lineno)` function that fails once for each `filename`+`lineno` combination, I created macro wrappers around common functions such as `malloc`: +I instead went with less automatic things: starting with a +`fallible_should_fail(char *filename, int lineno)` function that fails once for +each `filename`+`lineno` combination, I created macro wrappers around common +functions such as `malloc`: -```c +[source,c] +---- void *fallible_malloc(size_t size, const char *const filename, int lineno) { #ifdef FALLIBLE if (fallible_should_fail(filename, lineno)) { @@ -157,11 +186,13 @@ void *fallible_malloc(size_t size, const char *const filename, int lineno) { } #define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__) -``` +---- -With this definition, I could replace the calls to `malloc` with `MALLOC` (or any other name that you want to `#define`): +With this definition, I could replace the calls to `malloc` with `MALLOC` (or +any other name that you want to `#define`): -```diff +[source,diff] +---- --- 3.c 2021-02-17 00:15:38.019706074 -0300 +++ 4.c 2021-02-17 00:44:32.306885590 -0300 @@ -1,11 +1,11 @@ @@ -178,27 +209,35 @@ With this definition, I could replace the calls to `malloc` with `MALLOC` (or an if (!s2) { free(s1); return false; -``` +---- -With this change, if the program gets compiled with the `-DFALLIBLE` flag the fault-injection mechanism will run, and `MALLOC` will fail once for each `filename`+`lineno` combination. -When the flag is missing, `MALLOC` is a very thin wrapper around `malloc`, which compilers could remove entirely, and the `-lfallible` flags can be omitted. 
+With this change, if the program gets compiled with the `-DFALLIBLE` flag, the
+fault-injection mechanism will run, and `MALLOC` will fail once for each
+`filename`+`lineno` combination. When the flag is missing, `MALLOC` is a very
+thin wrapper around `malloc`, which compilers could remove entirely, and the
+`-lfallible` flags can be omitted.

-This applies not only to `malloc` or other `stdlib.h` functions.
-If `a_function` is important or relevant, I could add a wrapper around it too, that checks if `fallible_should_fail` to exercise if its callers are also doing the proper clean-up.
+This applies not only to `malloc` or other `stdlib.h` functions. If
+`a_function` is important or relevant, I could add a wrapper around it too, that
+calls `fallible_should_fail` to check whether its callers are also doing the
+proper clean-up.

-The actual code is just this single function, [`fallible_should_fail`], which ended-up taking only ~40 lines.
-In fact, there are more lines of either Makefile (111), README.md (82) or troff (306) on this first version.
+The actual code is just this single function,
+{should-fail-fn}[`fallible_should_fail`], which ended up taking only ~40 lines.
+In fact, there are more lines of either Makefile (111), README.md (82) or troff
+(306) on this first version.

-The price for such fine-grained control is that this approach requires more manual work.
+The price for such fine-grained control is that this approach requires more
+manual work.
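To make that bookkeeping concrete, here is a speculative POSIX sh analogue of the rule that `fallible_should_fail` enforces. The real implementation is C; this only mimics the "fail once per `filename`+`lineno`" behaviour, and the record-file path is made up.

```shell
# Speculative sh analogue of fallible_should_fail(): inject a failure the
# first time each filename:lineno pair is seen, let later calls proceed.
seen="${TMPDIR:-/tmp}/fallible-seen.txt"
: > "$seen"    # start with an empty record of call sites

should_fail() {
    key="$1:$2"
    if grep -qxF "$key" "$seen"; then
        return 1    # this call site already failed once: proceed normally
    fi
    printf '%s\n' "$key" >> "$seen"
    return 0        # first time here: inject a failure
}

should_fail main.c 10 && echo 'main.c:10 -> injected failure'
should_fail main.c 10 || echo 'main.c:10 -> proceeds'
should_fail main.c 25 && echo 'main.c:25 -> injected failure'
```

Each distinct call site fails exactly once across a run, which is what lets a test harness exercise every clean-up path without looping forever.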
-[mallocfail]: https://github.com/ralight/mallocfail -[`fallible_should_fail`]: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16 +== Usage examples -## Usage examples +=== `MALLOC` from the `README.md` -### `MALLOC` from the `README.md` +:fallible-check: https://euandreh.xyz/fallible/fallible-check.1.html -```c +[source,c] +---- // leaky.c #include #include @@ -221,24 +260,25 @@ int main() { free(aaa); return 0; } -``` +---- -Compile with `-DFALLIBLE` and run [`fallible-check.1`][fallible-check]: -```shell +Compile with `-DFALLIBLE` and run {fallible-check}[`fallible-check.1`]: + +[source,shell] +---- $ c99 -DFALLIBLE -o leaky leaky.c -lfallible $ fallible-check ./leaky Valgrind failed when we did not expect it to: (...suppressed output...) # exit status is 1 -``` +---- -[fallible-check]: https://euandreh.xyz/fallible/fallible-check.1.html +== Conclusion -## Conclusion +:package: https://euandre.org/git/package-repository/ -For my personal use, I'll [package] them for GNU Guix and Nix. -Packaging it to any other distribution should be trivial, or just downloading the tarball and running `[sudo] make install`. +For my personal use, I'll {package}[package] them for GNU Guix and Nix. +Packaging it to any other distribution should be trivial, or just downloading +the tarball and running `[sudo] make install`. Patches welcome! 
-
-[package]: https://euandre.org/git/package-repository/
diff --git a/src/content/blog/2021/04/29/relational-review.adoc b/src/content/blog/2021/04/29/relational-review.adoc
index e15b478..cb552c3 100644
--- a/src/content/blog/2021/04/29/relational-review.adoc
+++ b/src/content/blog/2021/04/29/relational-review.adoc
@@ -1,62 +1,73 @@
----
+= A Relational Model of Data for Large Shared Data Banks - article-review

-title: A Relational Model of Data for Large Shared Data Banks - article-review
+:empty:
+:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

-date: 2021-04-29
+This is a review of the article "{reviewed-article}[A Relational Model of Data
+for Large Shared Data Banks]", by E. F. Codd.

-layout: post
+== Data Independence

-lang: en
+Codd brings the idea of _data independence_ as a better approach to use on
+databases. This is in contrast with the existing approaches, namely
+hierarchical (tree-based) and network-based.

-ref: a-relational-model-of-data-for-large-shared-data-banks-article-review
+His main argument is that queries in applications shouldn't depend on and be
+coupled with how the data is represented internally by the database system.
+This key idea is very powerful, and something that we strive for in many other
+places: decoupling the interface from the implementation.

----
+If the database system has this separation, it can keep the querying interface
+stable, while having the freedom to change its internal representation at will,
+for better performance, less storage, etc.

-This is a review of the article "[A Relational Model of Data for Large Shared Data Banks][codd-article]", by E. F. Codd.
+This is true for most modern database systems. They can change from B-Trees
+with leaves containing pointers to data, to B-Trees with leaves containing the
+raw data, to hash tables. All that without changing the query interface, only
+its performance.
-[codd-article]: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
+Codd mentions that, from an information representation standpoint, any index is
+a duplication, but useful for performance.

-## Data Independence
+This data independence also impacts ordering (a _relation_ doesn't rely on the
+insertion order).

-Codd brings the idea of *data independence* as a better approach to use on databases.
-This is contrast with the existing approaches, namely hierarquical (tree-based) and network-based.
+== Duplicates

-His main argument is that queries in applications shouldn't depende and be coupled with how the data is represented internally by the database system.
-This key idea is very powerful, and something that we strive for in many other places: decoupling the interface from the implementation.
+His definition of relational data is a bit different from most modern database
+systems, namely *no duplicate rows*.

-If the database system has this separation, it can kep the querying interface stable, while having the freedom to change its internal representation at will, for better performance, less storage, etc.
+I couldn't find a reason behind this restriction, though. For practical
+purposes, I find it useful to have it.

-This is true for most modern database systems.
-They can change from B-Trees with leafs containing pointers to data, to B-Trees with leafs containing the raw data , to hash tables.
-All that without changing the query interface, only its performance.
+== Relational Data

-Codd mentions that, from an information representation standpoint, any index is a duplication, but useful for perfomance.
+:edn: https://github.com/edn-format/edn

-This data independence also impacts ordering (a *relation* doesn't rely on the insertion order).
+In the article, Codd doesn't try to define a language, and today's most popular
+one is SQL.

-## Duplicates
+However, there is no restriction that says that "SQL database" and "relational
+database" are synonyms.
One could have a relational database without using SQL
+at all, and it would still be a relational one.

-His definition of relational data is a bit differente from most modern database systems, namely **no duplicate rows**.
+The main one that I have in mind, and the reason that led me to reading this
+paper in the first place, is Datomic.

-I couldn't find a reason behind this restriction, though.
-For practical purposes, I find it useful to have it.
+It uses an {edn}[edn]-based representation for datalog
+queries{empty}footnote:edn-queries[
+  You can think of it as JSON, but with a Clojure taste.
+], and a particular schema used to represent data.

-## Relational Data
+Even though it looks very weird when coming from SQL, I'd argue that it ticks
+all the boxes (except for "no duplicates") that define a relational database,
+since building relations and applying operations on them is possible.

-In the article, Codd doesn't try to define a language, and today's most popular one is SQL.
-Is uses an [edn]-based representation for datalog queries[^edn-queries], and a particular schema used to represent data.
-Even though it looks very weird when coming from SQL, I'd argue that it ticks all the boxes (except for "no duplicates") that defines a relational database, since building relations and applying operations on them is possible.
- -Compare and contrast a contrived example of possible representations of SQL and datalog of the same data: - -```sql +[source,sql] +---- -- create schema CREATE TABLE people ( id UUID PRIMARY KEY, @@ -76,12 +87,11 @@ SELECT employees.name AS 'employee-name', managers.name AS 'manager-name' FROM people employees INNER JOIN people managers ON employees.manager_id = managers.id; -``` +---- -{% raw %} -``` +---- ;; create schema -#{ {:db/ident :person/id +#{{:db/ident :person/id :db/valueType :db.type/uuid :db/cardinality :db.cardinality/one :db/unique :db.unique/value} @@ -93,7 +103,7 @@ INNER JOIN people managers ON employees.manager_id = managers.id; :db/cardinality :db.cardinality/one}} ;; insert data -#{ {:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441" +#{{:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441" :person/name "Foo" :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]} {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941" @@ -104,27 +114,29 @@ INNER JOIN people managers ON employees.manager_id = managers.id; :where [[?person :person/name ?employee-name] [?person :person/manager ?manager] [?manager :person/name ?manager-name]]} -``` -{% endraw %} +---- -(forgive any errors on the above SQL and datalog code, I didn't run them to check. Patches welcome!) +(forgive any errors on the above SQL and datalog code, I didn't run them to +check. Patches welcome!) -This employee example comes from the paper, and both SQL and datalog representations match the paper definition of "relational". +This employee example comes from the paper, and both SQL and datalog +representations match the paper definition of "relational". -Both "Foo" and "Bar" are employees, and the data is normalized. -SQL represents data as tables, and Datomic as datoms, but relations could be derived from both, which we could view as: +Both "Foo" and "Bar" are employees, and the data is normalized. 
SQL represents
+data as tables, and Datomic as datoms, but relations could be derived from both,
+which we could view as:

-```
+....
employee_name | manager_name
----------------------------
"Foo"         | "Bar"
-```
-
-[^edn-queries]: You can think of it as JSON, but with a Clojure taste.
-[edn]: https://github.com/edn-format/edn
+....

-## Conclusion
+== Conclusion

-The article also talks about operators, consistency and normalization, which are now so widespread and well-known that it feels a bit weird seeing someone advocating for it.
+The article also talks about operators, consistency and normalization, which are
+now so widespread and well-known that it feels a bit weird seeing someone
+advocating for them.

-I also stablish that `relational != SQL`, and other databases such as Datomic are also relational, following Codd's original definition.
+I also establish that `relational != SQL`, and other databases such as Datomic
+are also relational, following Codd's original definition.