= ANN: remembering - Add memory to dmenu, fzf and similar tools
:categories: ann

:remembering: https://euandreh.xyz/remembering/
:dmenu: https://tools.suckless.org/dmenu/
:fzf: https://github.com/junegunn/fzf

Today I pushed v0.1.0 of {remembering}[remembering], a tool to enhance the
interactive usability of menu-like tools, such as {dmenu}[dmenu] and {fzf}[fzf].

== Previous solution

:yeganesh: https://dmwit.com/yeganesh/

I previously used {yeganesh}[yeganesh] to fill this gap, but as I started to
rely less on Emacs, I adopted fzf as my go-to tool for doing fuzzy searching on
the terminal. But I didn't like that fzf always showed things in the same
order, when I would only need 3 or 4 commonly used files.

For those who don't know: yeganesh is a wrapper around dmenu that remembers
your most used programs and puts them at the beginning of the list of
executables. This is very convenient for prolonged interactive use, as with
time the things you usually want are right at the very beginning.

But now I had this thing, yeganesh, that solved this problem for dmenu, but
didn't for fzf.

I initially considered patching yeganesh to support it, but I found it more
coupled to dmenu than I would desire. I'd rather have something that knows
nothing about dmenu, fzf or anything else, but enhances tools like those in a
useful way.

== Implementation

:v-010: https://euandre.org/git/remembering/tree/remembering?id=v0.1.0
:getopts: https://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
:sort: https://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html
:awk: https://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html
:spencer-quote: https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3

Other than being decoupled from dmenu, another improvement I thought could be
made on top of yeganesh is the choice of programming language. Instead of
Haskell, I went with POSIX sh. Sticking to POSIX sh makes it require fewer
build-time dependencies; there aren't any, actually. Packaging is made much
easier due to that.

The good thing is that the program itself is small enough ({v-010}[119 lines]
in v0.1.0) that POSIX sh does the job just fine, combined with other POSIX
utilities such as {getopts}[getopts], {sort}[sort] and {awk}[awk].

The behaviour is: given a program that will read from STDIN and write a single
entry to STDOUT, `remembering` wraps that program, and rearranges STDIN so that
previous choices appear at the beginning.

Where you would do:

[source,sh]
----
$ seq 5 | fzf

  5
  4
  3
  2
> 1
  5/5
>
----

And every time get the same order of numbers, now you can write:

[source,sh]
----
$ seq 5 | remembering -p seq-fzf -c fzf

  5
  4
  3
  2
> 1
  5/5
>
----

On the first run, everything is the same. If you picked 4 in the previous
example, the following run would be different:

[source,sh]
----
$ seq 5 | remembering -p seq-fzf -c fzf

  5
  3
  2
  1
> 4
  5/5
>
----

As time passes, the list adjusts based on the frequency of your choices.

I aimed for reusability, so that I could wrap diverse commands with
`remembering` and it would be able to work. To accomplish that, a "profile"
(the `-p something` part) stores data about different runs separately.
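To make the behaviour more concrete, here is a minimal sketch of the
rearranging step (not `remembering`'s actual code), assuming a profile file of
tab-separated `count entry` lines that already exists:

[source,sh]
----
# Sketch only: order STDIN entries by their recorded count, most used first.
# Assumes $PROFILE holds lines of the form "count<TAB>entry".
PROFILE="$HOME/.local/share/remembering-sketch/profile"

awk -F '\t' '
    NR == FNR { count[$2] = $1; next }             # 1st input: load counts
    { print (count[$0] ? count[$0] : 0) "\t" $0 }  # 2nd input: tag each entry
' "$PROFILE" - |
    sort -rn |
    cut -f 2-
----

After the wrapped program prints the user's pick, all that is left is
incrementing that entry's count in the profile.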
I took the idea of building something small with few dependencies to other
places too:

- the manpages are written in troff directly;
- the tests are just more POSIX sh files;
- and a POSIX Makefile handles `check` and `install`.

I was aware of the value of coding to standards, but my past experience was
mostly with programming language standards, such as ECMAScript, Common Lisp
and Scheme, or with the IndexedDB and DOM APIs. It felt good to rediscover
these nice POSIX tools, which reminds me of a quote by
{spencer-quote}[Henry Spencer]:

____
Those who do not understand Unix are condemned to reinvent it, poorly.
____

== Usage examples

Here are some functions I wrote myself that you may find useful:

=== Run a command with fzf on `$PWD`

[source,sh]
----
f() {
    profile="f-shell-function-$(pwd | sed -e 's_/_-_g')"
    file="$(git ls-files | \
        remembering -p "$profile" \
            -c "fzf --select-1 --exit-0 --query \"$2\" --preview 'cat {}'")"
    if [ -n "$file" ]; then
        # shellcheck disable=2068
        history -s f $@
        history -s "$1" "$file"
        "$1" "$file"
    fi
}
----

This way I can run `f vi` or `f vi config` at the root of a repository, and the
list of files will always appear in most-used order. Adding `pwd` to the
profile keeps data from different repositories separate.
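=== Check out a git branch

Along the same lines, here's a quick sketch of a branch picker. It's untested,
but it relies only on the same `-p` and `-c` interface shown above:

[source,sh]
----
b() {
    branch="$(git branch --format='%(refname:short)' | \
        remembering -p "git-branch-$(pwd | sed -e 's_/_-_g')" \
            -c 'fzf --select-1 --exit-0')"
    if [ -n "$branch" ]; then
        git checkout "$branch"
    fi
}
----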
=== Copy password to clipboard

:pass: https://www.passwordstore.org/

[source,sh]
----
choice="$(find "$HOME/.password-store" -type f | \
    grep -Ev '(.git|.gpg-id)' | \
    sed -e "s|$HOME/.password-store/||" -e 's/\.gpg$//' | \
    remembering -p password-store \
        -c 'dmenu -l 20 -i')"

if [ -n "$choice" ]; then
    pass show "$choice" -c
fi
----

Adding the above to a file and binding it to a keyboard shortcut, I can access
the contents of my {pass}[password store], with the entries ordered by usage.

=== Replacing yeganesh

Where I previously had:

[source,sh]
----
exe=$(yeganesh -x) && exec $exe
----

Now I have:

[source,sh]
----
exe=$(dmenu_path | remembering -p dmenu-exec -c dmenu) && exec $exe
----

This way, the executables appear in order of usage.

If you don't have `dmenu_path`, you can get just the underlying `stest` tool
that looks at the executables available in your `$PATH`. Here's a juicy
one-liner to do it:

[source,sh]
----
$ wget -O- https://dl.suckless.org/tools/dmenu-5.0.tar.gz | \
    tar Ozxf - dmenu-5.0/arg.h dmenu-5.0/stest.c | \
    sed 's|^#include "arg.h"$|// #include "arg.h"|' | \
    cc -xc - -o stest
----

With the `stest` utility you'll be able to list executables in your `$PATH` and
pipe them to dmenu or something else yourself:

[source,sh]
----
$ (IFS=:; ./stest -flx $PATH;) | sort -u | remembering -p another-dmenu-exec -c dmenu | sh
----

In fact, the code for `dmenu_path` is almost just like that.

== Conclusion

:packaged: https://euandre.org/git/package-repository/

For my personal use, I've {packaged}[packaged] `remembering` for GNU Guix and
Nix. Packaging it for any other distribution should be trivial: just download
the tarball and run `[sudo] make install`.

Patches welcome!

= ANN: fallible - Fault injection library for stress-testing failure scenarios
:updatedat: 2022-03-06

:fallible: https://euandreh.xyz/fallible/

Yesterday I pushed v0.1.0 of {fallible}[fallible], a minuscule library for
fault injection and stress-testing C programs.

== _EDIT_

:changelog: https://euandreh.xyz/fallible/CHANGELOG.html
:tarball: https://euandre.org/static/attachments/fallible.tar.gz

2021-06-12: As of {changelog}[0.3.0] (and beyond), the macro interface improved
and is a bit different from what is presented in this article. If you're
interested, I encourage you to take a look at it.

2022-03-06: I've {tarball}[archived] the project for now. It still needs some
maturing before being usable.

== Existing solutions

:gnu-std: https://www.gnu.org/prep/standards/standards.html#Semantics
:valgrind: https://www.valgrind.org/
:so-alloc: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc

Writing robust code can be challenging, and tools like static analyzers,
fuzzers and friends can help you get there with more certainty. As I tried to
make some of my C code more robust, so it could handle system crashes, full
disks, out-of-memory and similar scenarios, I couldn't find existing tools to
help me explicitly stress-test those failure scenarios.

Take the "{gnu-std}[Writing Robust Programs]" section of the GNU Coding
Standards:

____
Check every system call for an error return, unless you know you wish to ignore
errors. (...) Check every call to malloc or realloc to see if it returned NULL.
____

From a robustness standpoint, this is a reasonable stance: if you want to have
a robust program that knows how to fail when you're out of memory and `malloc`
returns `NULL`, then you ought to check every call to `malloc`.

Take a sample code snippet for clarity:

[source,c]
----
void a_function() {
    char *s1 = malloc(A_NUMBER);
    strcpy(s1, "some string");

    char *s2 = malloc(A_NUMBER);
    strcpy(s2, "another string");
}
----

At a first glance, this code is unsafe: if any of the calls to `malloc` returns
`NULL`, `strcpy` will be given a `NULL` pointer.

My first instinct was to change this code to something like this:

[source,diff]
----
@@ -1,7 +1,15 @@
 void a_function() {
     char *s1 = malloc(A_NUMBER);
+    if (!s1) {
+        fprintf(stderr, "out of memory, exiting\n");
+        exit(1);
+    }
     strcpy(s1, "some string");

     char *s2 = malloc(A_NUMBER);
+    if (!s2) {
+        fprintf(stderr, "out of memory, exiting\n");
+        exit(1);
+    }
     strcpy(s2, "another string");
 }
----

As I later found out, there are at least two problems with this approach:

. *it doesn't compose*: this could arguably work if `a_function` was `main`.
  But if `a_function` lives inside a library, an `exit(1);` is an inelegant way
  of handling failures, and will catch the top-level `main` consuming the
  library by surprise;
. *it gives up instead of handling failures*: the actual handling goes a bit
  beyond stopping. What about open file handles, in-memory caches, unflushed
  bytes, etc.?
If you could force only the second call to `malloc` to fail,
{valgrind}[Valgrind] would correctly complain that the program exited with
unfreed memory.

So the last change to make the best version of the above code is:

[source,diff]
----
@@ -1,15 +1,15 @@
-void a_function() {
+bool a_function() {
     char *s1 = malloc(A_NUMBER);
     if (!s1) {
-        fprintf(stderr, "out of memory, exiting\n");
-        exit(1);
+        return false;
     }
     strcpy(s1, "some string");

     char *s2 = malloc(A_NUMBER);
     if (!s2) {
-        fprintf(stderr, "out of memory, exiting\n");
-        exit(1);
+        free(s1);
+        return false;
     }
     strcpy(s2, "another string");
+    return true;
 }
----

Instead of returning `void`, `a_function` now returns `bool` to indicate
whether an error occurred during its execution. If `a_function` returned a
pointer to something, the return value could be `NULL`, or an `int` that
represents an error code.

The code is now a) safe and b) failing gracefully, returning control to the
caller to properly handle the error case.

After seeing similar patterns on well designed APIs, I adopted this practice
for my own code, but was still left with manually verifying its correctness
and robustness.

How could I add assertions around my code that would help me make sure the
`free(s1);` exists, before getting an error report? How do other people and
projects solve this?

From what I could see, people either a) hope for the best, b) write safe code
but don't stress-test it, or c) write ad-hoc code to stress it.

The most prominent case of c) is SQLite: it has a few wrappers around the
familiar `malloc` to do fault injection, check for memory limits, add warnings,
create shim layers for other environments, etc. All of that, however, is
tightly coupled with SQLite itself, and couldn't easily be pulled out for use
somewhere else.

When searching online, an {so-alloc}[interesting thread] caught my attention:
fail each call to `malloc` the first time a given stacktrace reaches it, and
when the same stacktrace appears again, allow it to proceed.

== Implementation

:mallocfail: https://github.com/ralight/mallocfail
:should-fail-fn: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16

A working implementation of that already exists: {mallocfail}[mallocfail]. It
uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the
stacktrace and fails once for each SHA.

I initially envisioned and started implementing something very similar to
mallocfail. However, I wanted it to go beyond out-of-memory scenarios, and
using `LD_PRELOAD` for every possible corner that could fail wasn't a good idea
in the long run.

Also, mallocfail won't work together with tools such as Valgrind, which want to
do their own override of `malloc` with `LD_PRELOAD`.
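For reference, the core of that `LD_PRELOAD` trick is small. Here is a
bare-bones sketch of an interposer (my illustration, not mallocfail's actual
code):

[source,c]
----
// shim.c - a bare-bones LD_PRELOAD interposer, for illustration only.
// mallocfail additionally hashes the stacktrace before deciding to fail.
//
// Build: cc -shared -fPIC -o shim.so shim.c -ldl
// Use:   LD_PRELOAD=./shim.so ./some-program
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

void *malloc(size_t size) {
    static void *(*real_malloc)(size_t) = NULL;
    if (!real_malloc) {
        // Find the next "malloc" in the link chain: the real one.
        // (A robust interposer also has to cope with dlsym itself
        // allocating memory.)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    }
    // A fault injector would decide here whether to return NULL instead;
    // this sketch just forwards every call.
    return real_malloc(size);
}
----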
I instead went with less automatic things: starting with a
`fallible_should_fail(char *filename, int lineno)` function that fails once for
each `filename`+`lineno` combination, I created macro wrappers around common
functions such as `malloc`:

[source,c]
----
void *fallible_malloc(size_t size, const char *const filename, int lineno) {
#ifdef FALLIBLE
    if (fallible_should_fail(filename, lineno)) {
        return NULL;
    }
#else
    (void)filename;
    (void)lineno;
#endif
    return malloc(size);
}

#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)
----

With this definition, I could replace the calls to `malloc` with `MALLOC` (or
any other name that you want to `#define`):

[source,diff]
----
--- 3.c	2021-02-17 00:15:38.019706074 -0300
+++ 4.c	2021-02-17 00:44:32.306885590 -0300
@@ -1,11 +1,11 @@
 bool a_function() {
-    char *s1 = malloc(A_NUMBER);
+    char *s1 = MALLOC(A_NUMBER);
     if (!s1) {
         return false;
     }
     strcpy(s1, "some string");

-    char *s2 = malloc(A_NUMBER);
+    char *s2 = MALLOC(A_NUMBER);
     if (!s2) {
         free(s1);
         return false;
----

With this change, if the program gets compiled with the `-DFALLIBLE` flag the
fault-injection mechanism will run, and `MALLOC` will fail once for each
`filename`+`lineno` combination. When the flag is missing, `MALLOC` is a very
thin wrapper around `malloc`, which compilers could remove entirely, and the
`-lfallible` flag can be omitted.

This applies not only to `malloc` or other `stdlib.h` functions. If
`a_function` is important or relevant, I could add a wrapper around it too,
that checks `fallible_should_fail` to exercise whether its callers are also
doing the proper clean-up.

The actual code is just this single function,
{should-fail-fn}[`fallible_should_fail`], which ended up taking only ~40 lines.
In fact, there are more lines of either Makefile (111), README.md (82) or troff
(306) in this first version.

The price for such fine-grained control is that this approach requires more
manual work.
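To give a feel for the mechanism, here is a toy, in-memory version of the
once-per-call-site rule. The real `fallible_should_fail` linked above is
different; this only illustrates the idea:

[source,c]
----
// Toy sketch, for illustration only: fail the first time each
// filename+lineno pair is seen, succeed on every later call from it.
#include <stdbool.h>
#include <string.h>

#define MAX_SITES 1024

static struct {
    const char *filename;
    int lineno;
} seen[MAX_SITES];
static int nseen = 0;

bool fallible_should_fail(const char *filename, int lineno) {
    for (int i = 0; i < nseen; i++) {
        if (seen[i].lineno == lineno &&
            strcmp(seen[i].filename, filename) == 0) {
            return false; /* this call site already failed once */
        }
    }
    if (nseen < MAX_SITES) {
        seen[nseen].filename = filename;
        seen[nseen].lineno = lineno;
        nseen++;
    }
    return true; /* first visit: inject a failure */
}
----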
== Usage examples

=== `MALLOC` from the `README.md`

:fallible-check: https://euandreh.xyz/fallible/fallible-check.1.html

[source,c]
----
// leaky.c
#include <string.h>
#include <fallible_alloc.h>

int main() {
    char *aaa = MALLOC(100);
    if (!aaa) {
        return 1;
    }
    strcpy(aaa, "a safe use of strcpy");

    char *bbb = MALLOC(100);
    if (!bbb) {
        // free(aaa);
        return 1;
    }
    strcpy(bbb, "not unsafe, but aaa is leaking");

    free(bbb);
    free(aaa);
    return 0;
}
----

Compile with `-DFALLIBLE` and run {fallible-check}[`fallible-check.1`]:

[source,sh]
----
$ c99 -DFALLIBLE -o leaky leaky.c -lfallible
$ fallible-check ./leaky
Valgrind failed when we did not expect it to:
(...suppressed output...)
# exit status is 1
----

== Conclusion

:package: https://euandre.org/git/package-repository/

For my personal use, I'll {package}[package] it for GNU Guix and Nix.
Packaging it for any other distribution should be trivial: just download the
tarball and run `[sudo] make install`.

Patches welcome!

= A Relational Model of Data for Large Shared Data Banks - article review

:empty:
:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

This is a review of the article "{reviewed-article}[A Relational Model of Data
for Large Shared Data Banks]", by E. F. Codd.

== Data Independence

Codd brings the idea of _data independence_ as a better approach to use on
databases. This is in contrast with the existing approaches, namely
hierarchical (tree-based) and network-based ones.

His main argument is that queries in applications shouldn't depend on and be
coupled with how the data is represented internally by the database system.
This key idea is very powerful, and something that we strive for in many other
places: decoupling the interface from the implementation.

If the database system has this separation, it can keep the querying interface
stable, while having the freedom to change its internal representation at will,
for better performance, less storage, etc.

This is true for most modern database systems. They can change from B-Trees
with leaves containing pointers to data, to B-Trees with leaves containing the
raw data, to hash tables. All that without changing the query interface, only
its performance.

Codd mentions that, from an information representation standpoint, any index is
a duplication, but useful for performance.

This data independence also impacts ordering (a _relation_ doesn't rely on the
insertion order).
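As a contrived SQL illustration of that decoupling, with a hypothetical
`people` table:

[source,sql]
----
-- The query knows nothing about how `people` is physically stored.
SELECT name FROM people WHERE name = 'Foo';

-- Changing the physical representation, e.g. by adding an index,
-- affects the plan and the performance, but not the query text:
CREATE INDEX people_name_idx ON people (name);
SELECT name FROM people WHERE name = 'Foo';
----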
== Duplicates

His definition of relational data is a bit different from most modern database
systems', namely: *no duplicate rows*.

I couldn't find a reason behind this restriction, though. For practical
purposes, I find it useful to have it.

== Relational Data

:edn: https://github.com/edn-format/edn

In the article, Codd doesn't try to define a language, and today's most popular
one is SQL.

However, there is no restriction that says that "SQL database" and "relational
database" are synonyms. One could have a relational database without using SQL
at all, and it would still be a relational one.

The main one that I have in mind, and the reason that led me to reading this
paper in the first place, is Datomic.

It uses an {edn}[edn]-based representation for datalog
queries{empty}footnote:edn-queries[
  You can think of it as JSON, but with a Clojure taste.
], and a particular schema to represent data.

Even though it looks very weird when coming from SQL, I'd argue that it ticks
all the boxes (except for "no duplicates") that define a relational database,
since building relations and applying operations on them is possible.

Compare and contrast a contrived example of possible SQL and datalog
representations of the same data:

[source,sql]
----
-- create schema
CREATE TABLE people (
    id         UUID PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id UUID,
    FOREIGN KEY (manager_id) REFERENCES people (id)
);

-- insert data
INSERT INTO people (id, name, manager_id) VALUES
    ('d3f29960-ccf0-44e4-be66-1a1544677441', 'Foo', '076356f4-1a0e-451c-b9c6-a6f56feec941'),
    ('076356f4-1a0e-451c-b9c6-a6f56feec941', 'Bar', NULL);

-- query data, make a relation
SELECT employees.name AS "employee-name",
       managers.name  AS "manager-name"
FROM people employees
INNER JOIN people managers ON employees.manager_id = managers.id;
----

[source,clojure]
----
;; create schema
#{{:db/ident       :person/id
   :db/valueType   :db.type/uuid
   :db/cardinality :db.cardinality/one
   :db/unique      :db.unique/value}
  {:db/ident       :person/name
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one}
  {:db/ident       :person/manager
   :db/valueType   :db.type/ref
   :db/cardinality :db.cardinality/one}}

;; insert data
#{{:person/id      #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
   :person/name    "Foo"
   :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
  {:person/id   #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
   :person/name "Bar"}}

;; query data, make a relation
{:find  [?employee-name ?manager-name]
 :where [[?person  :person/name    ?employee-name]
         [?person  :person/manager ?manager]
         [?manager :person/name    ?manager-name]]}
----

(Forgive any errors in the above SQL and datalog code, I didn't run them to
check. Patches welcome!)

This employee example comes from the paper, and both the SQL and the datalog
representations match the paper's definition of "relational".

Both "Foo" and "Bar" are employees, and the data is normalized. SQL represents
data as tables, and Datomic as datoms, but relations can be derived from both,
which we could view as:

[source,sql]
----
employee-name | manager-name
--------------+-------------
"Foo"         | "Bar"
----

== Conclusion

The article also talks about operators, consistency and normalization, which
are now so widespread and well-known that it feels a bit weird seeing someone
advocating for them.

I also establish that `relational != SQL`: other databases such as Datomic are
also relational, following Codd's original definition.