author EuAndreh <eu@euandre.org> 2025-04-18 02:17:12 -0300
committer EuAndreh <eu@euandre.org> 2025-04-18 02:48:42 -0300
commit 020c1e77489b772f854bb3288b9c8d2818a6bf9d
tree 142aec725a52162a446ea7d947cb4347c9d573c9 /src/content/en/blog/2021
parent Makefile: Remove security.txt.gz
git mv src/content/* src/content/en/
Diffstat (limited to 'src/content/en/blog/2021')
-rw-r--r-- src/content/en/blog/2021/01/26/remembering-ann.adoc | 216
-rw-r--r-- src/content/en/blog/2021/02/17/fallible.adoc | 285
-rw-r--r-- src/content/en/blog/2021/02/17/fallible.tar.gz | bin 0 -> 1915439 bytes
-rw-r--r-- src/content/en/blog/2021/04/29/relational-review.adoc | 144
4 files changed, 645 insertions, 0 deletions
diff --git a/src/content/en/blog/2021/01/26/remembering-ann.adoc b/src/content/en/blog/2021/01/26/remembering-ann.adoc
new file mode 100644
index 0000000..6786b3c
--- /dev/null
+++ b/src/content/en/blog/2021/01/26/remembering-ann.adoc
@@ -0,0 +1,216 @@
+= ANN: remembering - Add memory to dmenu, fzf and similar tools
+:categories: ann
+
+:remembering: https://euandreh.xyz/remembering/
+:dmenu: https://tools.suckless.org/dmenu/
+:fzf: https://github.com/junegunn/fzf
+
+Today I pushed v0.1.0 of {remembering}[remembering], a tool to enhance the
+interactive usability of menu-like tools, such as {dmenu}[dmenu] and {fzf}[fzf].
+
+== Previous solution
+
+:yeganesh: https://dmwit.com/yeganesh/
+
+I previously used {yeganesh}[yeganesh] to fill this gap, but as I started to
+rely less on Emacs, I added fzf as my go-to tool for doing fuzzy searching on
+the terminal. But I didn't like that fzf always showed the same order of
+things, when I would only need 3 or 4 commonly used files.
+
+For those who don't know: yeganesh is a wrapper around dmenu that will remember
+your most used programs and put them at the beginning of the list of
+executables. This is very convenient for prolonged interactive use, as with
+time the things you usually want are right at the very beginning.
+
+But now I had this thing, yeganesh, that solved this problem for dmenu, but
+didn't for fzf.
+
+I initially considered patching yeganesh to support it, but I found it more
+coupled to dmenu than I would desire. I'd rather have something that knows
+nothing about dmenu, fzf or anything, but enhances tools like those in a useful
+way.
+
+== Implementation
+
+:v-010: https://euandre.org/git/remembering/tree/remembering?id=v0.1.0
+:getopts: https://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
+:sort: https://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html
+:awk: https://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html
+:spencer-quote: https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3
+
+Other than being decoupled from dmenu, another improvement I thought could be
+made on top of yeganesh is the programming language choice. Instead of
+Haskell, I went with POSIX sh. Sticking to POSIX sh makes it require fewer
+build-time dependencies. There aren't any, actually. Packaging is made much
+easier due to that.
+
+The good thing is that the program itself is small enough ({v-010}[119 lines] on
+v0.1.0) that POSIX sh does the job just fine, combined with other POSIX
+utilities such as {getopts}[getopts], {sort}[sort] and {awk}[awk].
+
+The behaviour is: given a program that will read from STDIN and write a single
+entry to STDOUT, `remembering` wraps that program, and rearranges STDIN so that
+previous choices appear at the beginning.
+
+Where you would do:
+
+[source,sh]
+----
+$ seq 5 | fzf
+
+ 5
+ 4
+ 3
+ 2
+> 1
+ 5/5
+>
+----
+
+And get the same order of numbers every time, now you can write:
+
+[source,sh]
+----
+$ seq 5 | remembering -p seq-fzf -c fzf
+
+ 5
+ 4
+ 3
+ 2
+> 1
+ 5/5
+>
+----
+
+On the first run, everything is the same. If you picked 4 in the previous
+example, the following run would be different:
+
+[source,sh]
+----
+$ seq 5 | remembering -p seq-fzf -c fzf
+
+ 5
+ 3
+ 2
+ 1
+> 4
+ 5/5
+>
+----
+
+As time passes, the list would adjust based on the frequency of your choices.
+
+I aimed for reusability, so that I could wrap diverse commands with
+`remembering` and it would be able to work. To accomplish that, a "profile"
+(the `-p something` part) stores data about different runs separately.
+
+I took the idea of building something small with few dependencies to other
+places too: the manpages are written in troff directly, the tests are just
+more POSIX sh files, and a POSIX Makefile handles `check` and `install`.
+
+I was aware of the value of coding to standards, but I had past
+experience mostly with programming language standards, such as ECMAScript,
+Common Lisp, Scheme, or with IndexedDB or DOM APIs. It felt good to rediscover
+these nice POSIX tools, which reminds me of a quote by
+{spencer-quote}[Henry Spencer]:
+
+____
+Those who do not understand Unix are condemned to reinvent it, poorly.
+____
+
+== Usage examples
+
+Here are some functions I wrote myself that you may find useful:
+
+=== Run a command with fzf on `$PWD`
+
+[source,sh]
+----
+f() {
+ profile="f-shell-function$(pwd | sed -e 's_/_-_g')"
+ file="$(git ls-files | \
+ remembering -p "$profile" \
+ -c "fzf --select-1 --exit-0 --query \"$2\" --preview 'cat {}'")"
+ if [ -n "$file" ]; then
+ # shellcheck disable=2068
+ history -s f $@
+ history -s "$1" "$file"
+ "$1" "$file"
+ fi
+}
+----
+
+This way I can run `f vi` or `f vi config` at the root of a repository, and the
+list of files will always appear in most-used order. Adding `pwd` to the
+profile keeps the data for different repositories from mixing.
+
+=== Copy password to clipboard
+
+:pass: https://www.passwordstore.org/
+
+[source,sh]
+----
+choice="$(find "$HOME/.password-store" -type f | \
+ grep -Ev '(\.git|\.gpg-id)' | \
+ sed -e "s|$HOME/.password-store/||" -e 's/\.gpg$//' | \
+ remembering -p password-store \
+ -c 'dmenu -l 20 -i')"
+
+
+if [ -n "$choice" ]; then
+ pass show "$choice" -c
+fi
+----
+
+Adding the above to a file and binding it to a keyboard shortcut, I can access
+the contents of my {pass}[password store], with the entries ordered by usage.
+
+=== Replacing yeganesh
+
+Where I previously had:
+
+[source,sh]
+----
+exe=$(yeganesh -x) && exec $exe
+----
+
+Now I have:
+
+[source,sh]
+----
+exe=$(dmenu_path | remembering -p dmenu-exec -c dmenu) && exec $exe
+----
+
+This way, the executables appear in order of usage.
+
+If you don't have `dmenu_path`, you can get just the underlying `stest` tool
+that looks at the executables available in your `$PATH`. Here's a juicy
+one-liner to do it:
+
+[source,sh]
+----
+$ wget -O- https://dl.suckless.org/tools/dmenu-5.0.tar.gz | \
+ tar Ozxf - dmenu-5.0/arg.h dmenu-5.0/stest.c | \
+ sed 's|^#include "arg.h"$|// #include "arg.h"|' | \
+ cc -xc - -o stest
+----
+
+With the `stest` utility you'll be able to list executables in your `$PATH` and
+pipe them to dmenu or something else yourself:
+
+[source,sh]
+----
+$ (IFS=:; ./stest -flx $PATH;) | sort -u | remembering -p another-dmenu-exec -c dmenu | sh
+----
+
+In fact, the code for `dmenu_path` is almost just like that.
+
+== Conclusion
+
+:packaged: https://euandre.org/git/package-repository/
+
+For my personal use, I've {packaged}[packaged] `remembering` for GNU Guix and
+Nix. Packaging it for any other distribution should be trivial, or you can just
+download the tarball and run `[sudo] make install`.
+
+Patches welcome!
diff --git a/src/content/en/blog/2021/02/17/fallible.adoc b/src/content/en/blog/2021/02/17/fallible.adoc
new file mode 100644
index 0000000..1f2f641
--- /dev/null
+++ b/src/content/en/blog/2021/02/17/fallible.adoc
@@ -0,0 +1,285 @@
+= ANN: fallible - Fault injection library for stress-testing failure scenarios
+:updatedat: 2022-03-06
+
+:fallible: https://euandreh.xyz/fallible/
+
+Yesterday I pushed v0.1.0 of {fallible}[fallible], a minuscule library for
+fault-injection and stress-testing C programs.
+
+== _EDIT_
+
+:changelog: https://euandreh.xyz/fallible/CHANGELOG.html
+:tarball: https://euandre.org/static/attachments/fallible.tar.gz
+
+2021-06-12: As of {changelog}[0.3.0] (and beyond), the macro interface improved
+and is a bit different from what is presented in this article. If you're
+interested, I encourage you to take a look at it.
+
+2022-03-06: I've {tarball}[archived] the project for now. It still needs some
+maturing before being usable.
+
+== Existing solutions
+
+:gnu-std: https://www.gnu.org/prep/standards/standards.html#Semantics
+:valgrind: https://www.valgrind.org/
+:so-alloc: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc
+
+Writing robust code can be challenging, and tools like static analyzers, fuzzers
+and friends can help you get there with more certainty. As I tried to
+improve some of my C code and make it more robust, in order to handle system
+crashes, filled disks, out-of-memory and similar scenarios, I expected to find
+existing tooling that would let me explicitly stress-test those failure
+scenarios, but I couldn't find any.
+
+Take the "{gnu-std}[Writing Robust Programs]" section of the GNU Coding
+Standards:
+
+____
+Check every system call for an error return, unless you know you wish to ignore
+errors. (...) Check every call to malloc or realloc to see if it returned NULL.
+____
+
+From a robustness standpoint, this is a reasonable stance: if you want to have a
+robust program that knows how to fail when you're out of memory and `malloc`
+returns `NULL`, then you ought to check every call to `malloc`.
+
+Take a sample code snippet for clarity:
+
+[source,c]
+----
+void a_function() {
+ char *s1 = malloc(A_NUMBER);
+ strcpy(s1, "some string");
+
+ char *s2 = malloc(A_NUMBER);
+ strcpy(s2, "another string");
+}
+----
+
+At first glance, this code is unsafe: if any of the calls to `malloc` returns
+`NULL`, `strcpy` will be given a `NULL` pointer.
+
+My first instinct was to change this code to something like this:
+
+[source,diff]
+----
+@@ -1,7 +1,15 @@
+ void a_function() {
+ char *s1 = malloc(A_NUMBER);
++ if (!s1) {
++ fprintf(stderr, "out of memory, exiting\n");
++ exit(1);
++ }
+ strcpy(s1, "some string");
+
+ char *s2 = malloc(A_NUMBER);
++ if (!s2) {
++ fprintf(stderr, "out of memory, exiting\n");
++ exit(1);
++ }
+ strcpy(s2, "another string");
+ }
+----
+
+As I later found out, there are at least 2 problems with this approach:
+
+. *it doesn't compose*: this could arguably work if `a_function` was `main`.
+ But if `a_function` lives inside a library, an `exit(1);` is an inelegant way
+ of handling failures, and will catch the top-level `main` consuming the
+ library by surprise;
+. *it gives up instead of handling failures*: the actual handling goes a bit
+ beyond stopping. What about open file handles, in-memory caches, unflushed
+ bytes, etc.?
+
+If you could force only the second call to `malloc` to fail,
+{valgrind}[Valgrind] would correctly complain that the program exited with
+unfreed memory.
+
+So the last change to make the best version of the above code is:
+
+[source,diff]
+----
+@@ -1,15 +1,14 @@
+-void a_function() {
++bool a_function() {
+ char *s1 = malloc(A_NUMBER);
+ if (!s1) {
+- fprintf(stderr, "out of memory, exiting\n");
+- exit(1);
++ return false;
+ }
+ strcpy(s1, "some string");
+
+ char *s2 = malloc(A_NUMBER);
+ if (!s2) {
+- fprintf(stderr, "out of memory, exiting\n");
+- exit(1);
++ free(s1);
++ return false;
+ }
+ strcpy(s2, "another string");
+ }
+----
+
+Instead of returning `void`, `a_function` now returns `bool` to indicate whether
+an error occurred during its execution (the final `return true;` for the success
+path is not shown above). If `a_function` returned a pointer to something, the
+return value could be `NULL`, or an `int` that represents an error code.
+
+The code is now a) safe and b) failing gracefully, returning the control to the
+caller to properly handle the error case.
+
+After seeing similar patterns on well-designed APIs, I adopted this practice for
+my own code, but I was still left manually verifying its correctness and
+robustness.
+
+How could I add assertions around my code that would help me make sure the
+`free(s1);` exists, before getting an error report? How do other people and
+projects solve this?
+
+From what I could see, either people a) hope for the best, b) write safe code
+but don't stress-test it or c) write ad-hoc code to stress it.
+
+The most prominent case of c) is SQLite: it has a few wrappers around the
+familiar `malloc` to do fault injection, check for memory limits, add warnings,
+create shim layers for other environments, etc. All of that, however, is
+tightly coupled with SQLite itself, and couldn't easily be pulled out for use
+somewhere else.
+
+When searching for it online, an {so-alloc}[interesting thread] caught my
+attention: fail each call to `malloc` the first time its stacktrace is seen, and
+when the same stacktrace appears again, allow it to proceed.
+
+== Implementation
+
+:mallocfail: https://github.com/ralight/mallocfail
+:should-fail-fn: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16
+
+A working implementation of that already exists: {mallocfail}[mallocfail]. It
+uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the
+stacktrace and fails once for each SHA.
+
+I initially envisioned and started implementing something very similar to
+mallocfail. However, I wanted it to go beyond out-of-memory scenarios, and using
+`LD_PRELOAD` for every possible corner that could fail wasn't a good idea in the
+long run.
+
+Also, mallocfail won't work together with tools such as Valgrind, which want to
+do their own override of `malloc` with `LD_PRELOAD`.
+
+I instead went with less automatic things: starting with a
+`fallible_should_fail(char *filename, int lineno)` function that fails once for
+each `filename`+`lineno` combination, I created macro wrappers around common
+functions such as `malloc`:
+
+[source,c]
+----
+void *fallible_malloc(size_t size, const char *const filename, int lineno) {
+#ifdef FALLIBLE
+ if (fallible_should_fail(filename, lineno)) {
+ return NULL;
+ }
+#else
+ (void)filename;
+ (void)lineno;
+#endif
+ return malloc(size);
+}
+
+#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)
+----
+
+With this definition, I could replace the calls to `malloc` with `MALLOC` (or
+any other name that you want to `#define`):
+
+[source,diff]
+----
+--- 3.c 2021-02-17 00:15:38.019706074 -0300
++++ 4.c 2021-02-17 00:44:32.306885590 -0300
+@@ -1,11 +1,11 @@
+ bool a_function() {
+- char *s1 = malloc(A_NUMBER);
++ char *s1 = MALLOC(A_NUMBER);
+ if (!s1) {
+ return false;
+ }
+ strcpy(s1, "some string");
+
+- char *s2 = malloc(A_NUMBER);
++ char *s2 = MALLOC(A_NUMBER);
+ if (!s2) {
+ free(s1);
+ return false;
+----
+
+With this change, if the program gets compiled with the `-DFALLIBLE` flag, the
+fault-injection mechanism will run, and `MALLOC` will fail once for each
+`filename`+`lineno` combination. When the flag is missing, `MALLOC` is a very
+thin wrapper around `malloc`, which compilers could remove entirely, and the
+`-lfallible` flag can be omitted.
+
+This applies not only to `malloc` or other `stdlib.h` functions. If
+`a_function` is important or relevant, I could add a wrapper around it too, that
+checks `fallible_should_fail` to exercise whether its callers are also doing the
+proper clean-up.
+
+The actual code is just this single function,
+{should-fail-fn}[`fallible_should_fail`], which ended up taking only ~40 lines.
+In fact, there are more lines of either Makefile (111), README.md (82) or troff
+(306) in this first version.
+
+The price for such fine-grained control is that this approach requires more
+manual work.
+
+== Usage examples
+
+=== `MALLOC` from the `README.md`
+
+:fallible-check: https://euandreh.xyz/fallible/fallible-check.1.html
+
+[source,c]
+----
+// leaky.c
+#include <string.h>
+#include <fallible_alloc.h>
+
+int main() {
+ char *aaa = MALLOC(100);
+ if (!aaa) {
+ return 1;
+ }
+ strcpy(aaa, "a safe use of strcpy");
+
+ char *bbb = MALLOC(100);
+ if (!bbb) {
+ // free(aaa);
+ return 1;
+ }
+ strcpy(bbb, "not unsafe, but aaa is leaking");
+
+ free(bbb);
+ free(aaa);
+ return 0;
+}
+----
+
+Compile with `-DFALLIBLE` and run {fallible-check}[`fallible-check.1`]:
+
+[source,sh]
+----
+$ c99 -DFALLIBLE -o leaky leaky.c -lfallible
+$ fallible-check ./leaky
+Valgrind failed when we did not expect it to:
+(...suppressed output...)
+# exit status is 1
+----
+
+== Conclusion
+
+:package: https://euandre.org/git/package-repository/
+
+For my personal use, I'll {package}[package] it for GNU Guix and Nix.
+Packaging it for any other distribution should be trivial, or you can just
+download the tarball and run `[sudo] make install`.
+
+Patches welcome!
diff --git a/src/content/en/blog/2021/02/17/fallible.tar.gz b/src/content/en/blog/2021/02/17/fallible.tar.gz
new file mode 100644
index 0000000..211cadd
--- /dev/null
+++ b/src/content/en/blog/2021/02/17/fallible.tar.gz
Binary files differ
diff --git a/src/content/en/blog/2021/04/29/relational-review.adoc b/src/content/en/blog/2021/04/29/relational-review.adoc
new file mode 100644
index 0000000..4b53737
--- /dev/null
+++ b/src/content/en/blog/2021/04/29/relational-review.adoc
@@ -0,0 +1,144 @@
+= A Relational Model of Data for Large Shared Data Banks - article-review
+
+:empty:
+:reviewed-article: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
+
+This is a review of the article "{reviewed-article}[A Relational Model of Data
+for Large Shared Data Banks]", by E. F. Codd.
+
+== Data Independence
+
+Codd brings the idea of _data independence_ as a better approach to use on
+databases. This is in contrast with the existing approaches, namely hierarchical
+(tree-based) and network-based.
+
+His main argument is that queries in applications shouldn't depend on or be
+coupled to how the data is represented internally by the database system.
+This key idea is very powerful, and something that we strive for in many other
+places: decoupling the interface from the implementation.
+
+If the database system has this separation, it can keep the querying interface
+stable, while having the freedom to change its internal representation at will,
+for better performance, less storage, etc.
+
+This is true for most modern database systems. They can change from B-Trees
+with leaves containing pointers to data, to B-Trees with leaves containing the
+raw data, to hash tables. All that without changing the query interface, only
+its performance.
+
+Codd mentions that, from an information representation standpoint, any index is
+a duplication, but useful for performance.
+
+This data independence also impacts ordering (a _relation_ doesn't rely on the
+insertion order).
+
+== Duplicates
+
+His definition of relational data is a bit different from most modern database
+systems, namely *no duplicate rows*.
+
+I couldn't find a reason behind this restriction, though. For practical
+purposes, I find it useful to have it.
+
+== Relational Data
+
+:edn: https://github.com/edn-format/edn
+
+In the article, Codd doesn't try to define a language, and today's most popular
+one is SQL.
+
+However, there is no restriction that says that "SQL database" and "relational
+database" are synonyms. One could have a relational database without using SQL
+at all, and it would still be a relational one.
+
+The main one that I have in mind, and the reason that led me to reading this
+paper in the first place, is Datomic.
+
+It uses an {edn}[edn]-based representation for datalog
+queries{empty}footnote:edn-queries[
+ You can think of it as JSON, but with a Clojure taste.
+], and a particular schema used to represent data.
+
+Even though it looks very weird when coming from SQL, I'd argue that it ticks
+all the boxes (except for "no duplicates") that define a relational database,
+since building relations and applying operations on them is possible.
+
+Compare and contrast a contrived example of possible representations of the same
+data in SQL and datalog:
+
+[source,sql]
+----
+-- create schema
+CREATE TABLE people (
+ id UUID PRIMARY KEY,
+ name TEXT NOT NULL,
+ manager_id UUID,
+ FOREIGN KEY (manager_id) REFERENCES people (id)
+);
+
+-- insert data
+INSERT INTO people (id, name, manager_id) VALUES
+ ('d3f29960-ccf0-44e4-be66-1a1544677441', 'Foo', '076356f4-1a0e-451c-b9c6-a6f56feec941'),
+ ('076356f4-1a0e-451c-b9c6-a6f56feec941', 'Bar', NULL);
+
+-- query data, make a relation
+
+SELECT employees.name AS employee_name,
+ managers.name AS manager_name
+FROM people employees
+INNER JOIN people managers ON employees.manager_id = managers.id;
+----
+
+[source,clojure]
+----
+;; create schema
+#{{:db/ident :person/id
+ :db/valueType :db.type/uuid
+ :db/cardinality :db.cardinality/one
+ :db/unique :db.unique/value}
+ {:db/ident :person/name
+ :db/valueType :db.type/string
+ :db/cardinality :db.cardinality/one}
+ {:db/ident :person/manager
+ :db/valueType :db.type/ref
+ :db/cardinality :db.cardinality/one}}
+
+;; insert data
+#{{:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
+ :person/name "Foo"
+ :person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
+ {:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
+ :person/name "Bar"}}
+
+;; query data, make a relation
+{:find [?employee-name ?manager-name]
+ :where [[?person :person/name ?employee-name]
+ [?person :person/manager ?manager]
+ [?manager :person/name ?manager-name]]}
+----
+
+(forgive any errors in the above SQL and datalog code, I didn't run them to
+check. Patches welcome!)
+
+This employee example comes from the paper, and both SQL and datalog
+representations match the paper definition of "relational".
+
+Both "Foo" and "Bar" are employees, and the data is normalized. SQL represents
+data as tables, and Datomic as datoms, but relations could be derived from both,
+which we could view as:
+
+[source,sql]
+----
+employee_name | manager_name
+----------------------------
+"Foo" | "Bar"
+----
+
+== Conclusion
+
+The article also talks about operators, consistency and normalization, which are
+now so widespread and well-known that it feels a bit weird seeing someone
+advocating for them.
+
+I also establish that `relational != SQL`, and other databases such as Datomic
+are also relational, following Codd's original definition.