Diffstat (limited to 'src/content/blog/2021')
-rw-r--r-- | src/content/blog/2021/01/26/remembering-ann.adoc | 190
-rw-r--r-- | src/content/blog/2021/02/17/fallible.adoc | 244
-rw-r--r-- | src/content/blog/2021/02/17/fallible.tar.gz | bin 0 -> 3174400 bytes
-rw-r--r-- | src/content/blog/2021/04/29/relational-review.adoc | 130
4 files changed, 564 insertions, 0 deletions
diff --git a/src/content/blog/2021/01/26/remembering-ann.adoc b/src/content/blog/2021/01/26/remembering-ann.adoc
new file mode 100644
index 0000000..0d02384
--- /dev/null
+++ b/src/content/blog/2021/01/26/remembering-ann.adoc
@@ -0,0 +1,190 @@
---

title: "ANN: remembering - Add memory to dmenu, fzf and similar tools"

date: 2021-01-26

layout: post

lang: en

ref: ann-remembering-add-memory-to-dmenu-fzf-and-similar-tools

---

Today I pushed v0.1.0 of [remembering], a tool to enhance the interactive usability of menu-like tools, such as [dmenu] and [fzf].

## Previous solution

I previously used [yeganesh] to fill this gap, but as I started to rely less on Emacs, I adopted fzf as my go-to tool for fuzzy searching on the terminal.
But I didn't like that fzf always showed things in the same order, when I usually only need 3 or 4 commonly used files.

For those who don't know: yeganesh is a wrapper around dmenu that remembers your most used programs and puts them at the beginning of the list of executables.
This is very convenient for prolonged interactive use, as with time the things you usually want are right at the very beginning.

But now I had this thing, yeganesh, that solved this problem for dmenu, but not for fzf.

I initially considered patching yeganesh to support it, but I found it more coupled to dmenu than I would like.
I'd rather have something that knows nothing about dmenu, fzf or anything else, but enhances tools like those in a useful way.

[remembering]: https://euandreh.xyz/remembering/
[dmenu]: https://tools.suckless.org/dmenu/
[fzf]: https://github.com/junegunn/fzf
[yeganesh]: http://dmwit.com/yeganesh/

## Implementation

Other than being decoupled from dmenu, another improvement I thought could be made on top of yeganesh is the choice of programming language.
Instead of Haskell, I went with POSIX sh.
Sticking to POSIX sh means there are no build-time dependencies at all, which makes packaging much easier.

The good thing is that the program itself is small enough ([119 lines] on v0.1.0) that POSIX sh does the job just fine, combined with other POSIX utilities such as [getopts], [sort] and [awk].

[119 lines]: https://euandre.org/git/remembering/tree/remembering?id=v0.1.0
[getopts]: http://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
[sort]: http://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html
[awk]: http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html

The behaviour is: given a program that reads from STDIN and writes a single entry to STDOUT, `remembering` wraps that program and rearranges STDIN so that previous choices appear at the beginning.

Where you would do:

```shell
$ seq 5 | fzf

  5
  4
  3
  2
> 1
  5/5
>
```

And every time get the same order of numbers, now you can write:

```shell
$ seq 5 | remembering -p seq-fzf -c fzf

  5
  4
  3
  2
> 1
  5/5
>
```

On the first run, everything is the same. If you picked 4 in the previous example, the following run would be different:

```shell
$ seq 5 | remembering -p seq-fzf -c fzf

  5
  3
  2
  1
> 4
  5/5
>
```

As time passes, the list adjusts to the frequency of your choices.

I aimed for reusability, so that `remembering` could wrap diverse commands and still work. To accomplish that, a "profile" (the `-p something` part) stores data about different runs separately.
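As a quick sketch of how profiles keep state separate (the profile names below are made up for illustration, using only the `-p` and `-c` flags shown above):

```shell
# Two wrappers around similar commands, with independent memories:
# choices recorded under seq-fzf never reorder seq-dmenu, and vice versa.
seq 5 | remembering -p seq-fzf   -c fzf
seq 5 | remembering -p seq-dmenu -c 'dmenu -l 5'
```

Each profile accumulates its own frequency data, so one wrapper's history never leaks into another's ordering.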
I took the idea of building something small with few dependencies to other places too:
- the manpages are written directly in troff;
- the tests are just more POSIX sh files;
- and a POSIX Makefile drives `check` and `install`.

I was aware of the value of coding to standards, but my past experience was mostly with programming language standards, such as ECMAScript, Common Lisp and Scheme, or with the IndexedDB and DOM APIs.
It felt good to rediscover these nice POSIX tools, which reminds me of a quote by [Henry Spencer][poor-unix]:

> Those who do not understand Unix are condemned to reinvent it, poorly.

[poor-unix]: https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3

## Usage examples

Here are some functions I wrote myself that you may find useful:

### Run a command with fzf on `$PWD`

```shell
f() {
    profile="f-shell-function-$(pwd | sed -e 's_/_-_g')"
    file="$(git ls-files | \
        remembering -p "$profile" \
            -c "fzf --select-1 --exit-0 --query \"$2\" --preview 'cat {}'")"
    if [ -n "$file" ]; then
        # shellcheck disable=2068
        history -s f $@
        history -s "$1" "$file"
        "$1" "$file"
    fi
}
```

This way I can run `f vi` or `f vi config` at the root of a repository, and the list of files will always appear in most-used order.
Including `pwd` in the profile name keeps data from different repositories separate.

### Copy password to clipboard

```shell
choice="$(find "$HOME/.password-store" -type f | \
    grep -Ev '(\.git|\.gpg-id)' | \
    sed -e "s|$HOME/.password-store/||" -e 's/\.gpg$//' | \
    remembering -p password-store \
        -c 'dmenu -l 20 -i')"

if [ -n "$choice" ]; then
    pass show "$choice" -c
fi
```

Adding the above to a file and binding it to a keyboard shortcut, I can access the contents of my [password store][password-store], with the entries ordered by usage.

[password-store]: https://www.passwordstore.org/

### Replacing yeganesh

Where I previously had:

```shell
exe=$(yeganesh -x) && exec $exe
```

Now I have:

```shell
exe=$(dmenu_path | remembering -p dmenu-exec -c dmenu) && exec $exe
```

This way, the executables appear in order of usage.

If you don't have `dmenu_path`, you can build just the underlying `stest` tool that looks at the executables available in your `$PATH`. Here's a juicy one-liner to do it:

```shell
$ wget -O- https://dl.suckless.org/tools/dmenu-5.0.tar.gz | \
    tar Ozxf - dmenu-5.0/arg.h dmenu-5.0/stest.c | \
    sed 's|^#include "arg.h"$|// #include "arg.h"|' | \
    cc -xc - -o stest
```

With the `stest` utility you'll be able to list the executables in your `$PATH` and pipe them to dmenu or something else yourself:

```shell
$ (IFS=:; ./stest -flx $PATH;) | sort -u | remembering -p another-dmenu-exec -c dmenu | sh
```

In fact, the code for `dmenu_path` is almost exactly that.

## Conclusion

For my personal use, I've [packaged] `remembering` for GNU Guix and Nix. Packaging it for any other distribution should be trivial; alternatively, just download the tarball and run `[sudo] make install`.

Patches welcome!
[packaged]: https://euandre.org/git/package-repository/
[nix-file]: https://euandre.org/git/dotfiles/tree/nixos/not-on-nixpkgs/remembering.nix?id=0831444f745cf908e940407c3e00a61f6152961f

diff --git a/src/content/blog/2021/02/17/fallible.adoc b/src/content/blog/2021/02/17/fallible.adoc
new file mode 100644
index 0000000..8a097f8
--- /dev/null
+++ b/src/content/blog/2021/02/17/fallible.adoc
@@ -0,0 +1,244 @@
---

title: "ANN: fallible - Fault injection library for stress-testing failure scenarios"

date: 2021-02-17

updated_at: 2022-03-06

layout: post

lang: en

ref: ann-fallible-fault-injection-library-for-stress-testing-failure-scenarios

---

Yesterday I pushed v0.1.0 of [fallible], a minuscule library for fault injection and stress-testing of C programs.

[fallible]: https://euandreh.xyz/fallible/

## *EDIT*

2021-06-12: As of [0.3.0] (and beyond), the macro interface improved and is a bit different from what is presented in this article. If you're interested, I encourage you to take a look at it.

2022-03-06: I've [archived] the project for now. It still needs some maturing before being usable.

[0.3.0]: https://euandreh.xyz/fallible/CHANGELOG.html
[archived]: https://euandre.org/static/attachments/fallible.tar.gz

## Existing solutions

Writing robust code can be challenging, and tools like static analyzers, fuzzers and friends can help you get there with more certainty.
As I tried to make some of my C code more robust, so that it could handle system crashes, full disks, out-of-memory conditions and similar scenarios, I couldn't find existing tooling to explicitly stress-test those failure paths the way I expected to.

Take the "[Writing Robust Programs][gnu-std]" section of the GNU Coding Standards:

[gnu-std]: https://www.gnu.org/prep/standards/standards.html#Semantics

> Check every system call for an error return, unless you know you wish to ignore errors.
> (...) Check every call to malloc or realloc to see if it returned NULL.

From a robustness standpoint, this is a reasonable stance: if you want a robust program that knows how to fail when you're out of memory and `malloc` returns `NULL`, then you ought to check every call to `malloc`.

Take a sample code snippet for clarity:

```c
void a_function() {
    char *s1 = malloc(A_NUMBER);
    strcpy(s1, "some string");

    char *s2 = malloc(A_NUMBER);
    strcpy(s2, "another string");
}
```

At a first glance, this code is unsafe: if any of the calls to `malloc` returns `NULL`, `strcpy` will be given a `NULL` pointer.

My first instinct was to change this code to something like this:

```diff
@@ -1,7 +1,15 @@
 void a_function() {
     char *s1 = malloc(A_NUMBER);
+    if (!s1) {
+        fprintf(stderr, "out of memory, exiting\n");
+        exit(1);
+    }
     strcpy(s1, "some string");

     char *s2 = malloc(A_NUMBER);
+    if (!s2) {
+        fprintf(stderr, "out of memory, exiting\n");
+        exit(1);
+    }
     strcpy(s2, "another string");
 }
```

As I later found out, there are at least two problems with this approach:

1. **it doesn't compose**: this could arguably work if `a_function` was `main`.
   But if `a_function` lives inside a library, an `exit(1);` is an inelegant way of handling failures, and will catch the top-level `main` consuming the library by surprise;
2. **it gives up instead of handling failures**: the actual handling goes a bit beyond stopping.
   What about open file handles, in-memory caches, unflushed bytes, etc.? (see the sketch below)
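To make the second problem concrete, here's a hypothetical sketch (all names are made up, and `a_function` reproduces the `exit(1)` pattern from the diff above): the caller holds in-memory state that must be persisted before the process ends, and a library that exits internally never gives it the chance.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define A_NUMBER (1024 * 1024)

static void a_function(void) {
    char *s1 = malloc(A_NUMBER);
    if (!s1) {
        fprintf(stderr, "out of memory, exiting\n");
        exit(1); /* gives up: the caller's cleanup below never runs */
    }
    strcpy(s1, "some string");
    free(s1);
}

static char cache[256]; /* in-memory state the caller must persist */

static void persist_cache(void) {
    FILE *f = fopen("cache.db", "w");
    if (f) {
        fputs(cache, f);
        fclose(f);
    }
}

int main(void) {
    snprintf(cache, sizeof(cache), "precious state\n");
    a_function();    /* on the failure path, exits before we persist */
    persist_cache(); /* unreachable whenever a_function() gave up */
    return 0;
}
```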
If you could force only the second call to `malloc` to fail, [Valgrind] would correctly complain that the program exited with unfreed memory.

[Valgrind]: https://www.valgrind.org/

So the last change to make the best version of the above code is:

```diff
@@ -1,15 +1,15 @@
-void a_function() {
+bool a_function() {
     char *s1 = malloc(A_NUMBER);
     if (!s1) {
-        fprintf(stderr, "out of memory, exiting\n");
-        exit(1);
+        return false;
     }
     strcpy(s1, "some string");

     char *s2 = malloc(A_NUMBER);
     if (!s2) {
-        fprintf(stderr, "out of memory, exiting\n");
-        exit(1);
+        free(s1);
+        return false;
     }
     strcpy(s2, "another string");
+    return true;
 }
```

Instead of returning `void`, `a_function` now returns `bool` to indicate whether an error occurred during its execution.
If `a_function` returned a pointer to something, the return value could be `NULL`, or an `int` that represents an error code.

The code is now a) safe and b) failing gracefully, returning control to the caller to properly handle the error case.

After seeing similar patterns in well designed APIs, I adopted this practice for my own code, but was still left with manually verifying its correctness and robustness.

How could I add assertions around my code that would help me make sure the `free(s1);` exists, before getting an error report?
How do other people and projects solve this?

From what I could see, people either a) hope for the best, b) write safe code but don't stress-test it, or c) write ad-hoc code to stress it.

The most prominent case of c) is SQLite: it has a few wrappers around the familiar `malloc` to do fault injection, check for memory limits, add warnings, create shim layers for other environments, etc.
All of that, however, is tightly coupled to SQLite itself, and can't easily be pulled out for use somewhere else.

When searching online, an [interesting thread] caught my attention: fail each call to `malloc` once, and when the same stacktrace appears again, allow it to proceed.

[interesting thread]: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc

## Implementation

A working implementation of that already exists: [mallocfail].
It uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the stacktrace and fails once for each SHA.

I initially envisioned and started implementing something very similar to mallocfail.
However, I wanted it to go beyond out-of-memory scenarios, and using `LD_PRELOAD` for every possible corner that could fail wasn't a good idea in the long run.

Also, mallocfail won't work together with tools such as Valgrind, which want to do their own `LD_PRELOAD` override of `malloc`.
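For context, the `LD_PRELOAD` technique that mallocfail builds on looks roughly like the sketch below. This is a minimal illustration, not mallocfail's actual code: a shared object shadows `malloc` and forwards to the real implementation via `dlsym(RTLD_NEXT, ...)`.

```c
/* faulty_malloc.c - minimal sketch of an LD_PRELOAD malloc shadow.
 * Build: cc -shared -fPIC -o libfaulty.so faulty_malloc.c -ldl
 * Run:   LD_PRELOAD=./libfaulty.so ./your-program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

void *malloc(size_t size) {
    /* look up the next malloc in the link chain (usually libc's) */
    void *(*real_malloc)(size_t) =
        (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    /* a fault injector would decide here whether to return NULL
     * instead, e.g. based on a hash of the current stacktrace;
     * real implementations must also guard against recursion,
     * since dlsym itself may allocate */
    return real_malloc(size);
}
```

Since tools like Valgrind want to interpose `malloc` themselves, two shims fighting over the same symbol is what makes combining them fragile.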
I instead went with a less automatic approach: starting with a `fallible_should_fail(char *filename, int lineno)` function that fails once for each `filename`+`lineno` combination, I created macro wrappers around common functions such as `malloc`:

```c
void *fallible_malloc(size_t size, const char *const filename, int lineno) {
#ifdef FALLIBLE
    if (fallible_should_fail(filename, lineno)) {
        return NULL;
    }
#else
    (void)filename;
    (void)lineno;
#endif
    return malloc(size);
}

#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)
```

With this definition, I could replace the calls to `malloc` with `MALLOC` (or any other name that you want to `#define`):

```diff
--- 3.c	2021-02-17 00:15:38.019706074 -0300
+++ 4.c	2021-02-17 00:44:32.306885590 -0300
@@ -1,11 +1,11 @@
 bool a_function() {
-    char *s1 = malloc(A_NUMBER);
+    char *s1 = MALLOC(A_NUMBER);
     if (!s1) {
         return false;
     }
     strcpy(s1, "some string");

-    char *s2 = malloc(A_NUMBER);
+    char *s2 = MALLOC(A_NUMBER);
     if (!s2) {
         free(s1);
         return false;
```

With this change, if the program gets compiled with the `-DFALLIBLE` flag, the fault-injection mechanism will run, and `MALLOC` will fail once for each `filename`+`lineno` combination.
When the flag is missing, `MALLOC` is a very thin wrapper around `malloc`, which compilers could remove entirely, and the `-lfallible` flag can be omitted.

This applies not only to `malloc` or other `stdlib.h` functions.
If `a_function` is important or relevant, I could add a wrapper around it too that calls `fallible_should_fail`, to exercise whether its callers are also doing the proper clean-up.

The actual code is just this single function, [`fallible_should_fail`], which ended up taking only ~40 lines.
In fact, this first version has more lines of Makefile (111), README.md (82) or troff (306) than of C.

The price for such fine-grained control is that this approach requires more manual work.

[mallocfail]: https://github.com/ralight/mallocfail
[`fallible_should_fail`]: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16

## Usage examples

### `MALLOC` from the `README.md`

```c
// leaky.c
#include <string.h>
#include <fallible_alloc.h>

int main() {
    char *aaa = MALLOC(100);
    if (!aaa) {
        return 1;
    }
    strcpy(aaa, "a safe use of strcpy");

    char *bbb = MALLOC(100);
    if (!bbb) {
        // free(aaa);
        return 1;
    }
    strcpy(bbb, "not unsafe, but aaa is leaking");

    free(bbb);
    free(aaa);
    return 0;
}
```

Compile with `-DFALLIBLE` and run [`fallible-check.1`][fallible-check]:

```shell
$ c99 -DFALLIBLE -o leaky leaky.c -lfallible
$ fallible-check ./leaky
Valgrind failed when we did not expect it to:
(...suppressed output...)
# exit status is 1
```

[fallible-check]: https://euandreh.xyz/fallible/fallible-check.1.html

## Conclusion

For my personal use, I'll [package] it for GNU Guix and Nix.
Packaging it for any other distribution should be trivial; alternatively, just download the tarball and run `[sudo] make install`.

Patches welcome!
[package]: https://euandre.org/git/package-repository/

diff --git a/src/content/blog/2021/02/17/fallible.tar.gz b/src/content/blog/2021/02/17/fallible.tar.gz
new file mode 100644
index 0000000..7bf2a58
Binary files /dev/null and b/src/content/blog/2021/02/17/fallible.tar.gz differ
diff --git a/src/content/blog/2021/04/29/relational-review.adoc b/src/content/blog/2021/04/29/relational-review.adoc
new file mode 100644
index 0000000..e15b478
--- /dev/null
+++ b/src/content/blog/2021/04/29/relational-review.adoc
@@ -0,0 +1,130 @@
---

title: A Relational Model of Data for Large Shared Data Banks - article review

date: 2021-04-29

layout: post

lang: en

ref: a-relational-model-of-data-for-large-shared-data-banks-article-review

---

This is a review of the article "[A Relational Model of Data for Large Shared Data Banks][codd-article]", by E. F. Codd.

[codd-article]: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

## Data Independence

Codd introduces the idea of *data independence* as a better approach for databases.
This is in contrast with the existing approaches, namely hierarchical (tree-based) and network-based.

His main argument is that queries in applications shouldn't depend on, or be coupled to, how the data is represented internally by the database system.
This key idea is very powerful, and something that we strive for in many other places: decoupling the interface from the implementation.

If the database system has this separation, it can keep the querying interface stable while having the freedom to change its internal representation at will, for better performance, less storage, etc.

This is true for most modern database systems.
They can change from B-Trees with leaves containing pointers to data, to B-Trees with leaves containing the raw data, to hash tables.
All that without changing the query interface, only its performance.

Codd mentions that, from an information representation standpoint, any index is a duplication, but useful for performance.

This data independence also impacts ordering (a *relation* doesn't rely on the insertion order).

## Duplicates

His definition of relational data is a bit different from that of most modern database systems, namely **no duplicate rows**.

I couldn't find a reason behind this restriction, though.
For practical purposes, I find it useful to have it.

## Relational Data

In the article, Codd doesn't try to define a language, and today's most popular one is SQL.

However, nothing says that "SQL database" and "relational database" are synonyms.
One could have a relational database without using SQL at all, and it would still be a relational one.

The main one that I have in mind, and the reason that led me to reading this paper in the first place, is Datomic.

It uses an [edn]-based representation for datalog queries[^edn-queries], and a particular schema to represent data.

Even though it looks very weird when coming from SQL, I'd argue that it ticks all the boxes (except for "no duplicates") that define a relational database, since building relations and applying operations on them is possible.
Compare and contrast a contrived example of possible SQL and datalog representations of the same data:

```sql
-- create schema
CREATE TABLE people (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id UUID,
    FOREIGN KEY (manager_id) REFERENCES people (id)
);

-- insert data
INSERT INTO people (id, name, manager_id) VALUES
    ('d3f29960-ccf0-44e4-be66-1a1544677441', 'Foo', '076356f4-1a0e-451c-b9c6-a6f56feec941'),
    ('076356f4-1a0e-451c-b9c6-a6f56feec941', 'Bar', NULL);

-- query data, make a relation

SELECT employees.name AS employee_name,
       managers.name  AS manager_name
FROM people employees
INNER JOIN people managers ON employees.manager_id = managers.id;
```

{% raw %}
```
;; create schema
#{ {:db/ident       :person/id
    :db/valueType   :db.type/uuid
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/value}
   {:db/ident       :person/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :person/manager
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}}

;; insert data
#{ {:person/id      #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
    :person/name    "Foo"
    :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
   {:person/id      #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
    :person/name    "Bar"}}

;; query data, make a relation
{:find [?employee-name ?manager-name]
 :where [[?person  :person/name    ?employee-name]
         [?person  :person/manager ?manager]
         [?manager :person/name    ?manager-name]]}
```
{% endraw %}

(forgive any errors in the above SQL and datalog code, I didn't run them to check. Patches welcome!)

This employee example comes from the paper, and both the SQL and datalog representations match the paper's definition of "relational".

Both "Foo" and "Bar" are employees, and the data is normalized.
SQL represents data as tables, and Datomic as datoms, but relations can be derived from both, which we could view as:

```
employee_name | manager_name
----------------------------
"Foo"         | "Bar"
```

[^edn-queries]: You can think of it as JSON, but with a Clojure taste.
[edn]: https://github.com/edn-format/edn

## Conclusion

The article also talks about operators, consistency and normalization, which are now so widespread and well-known that it feels a bit weird seeing someone advocate for them.

I also establish that `relational != SQL`, and that other databases such as Datomic are also relational, following Codd's original definition.