author | EuAndreh <eu@euandre.org> | 2025-04-18 02:17:12 -0300 |
---|---|---|
committer | EuAndreh <eu@euandre.org> | 2025-04-18 02:48:42 -0300 |
commit | 020c1e77489b772f854bb3288b9c8d2818a6bf9d (patch) | |
tree | 142aec725a52162a446ea7d947cb4347c9d573c9 /src/content/en/blog/2021/02/17/fallible.adoc | |
parent | Makefile: Remove security.txt.gz (diff) | |
download | euandre.org-020c1e77489b772f854bb3288b9c8d2818a6bf9d.tar.gz euandre.org-020c1e77489b772f854bb3288b9c8d2818a6bf9d.tar.xz |
git mv src/content/* src/content/en/
Diffstat (limited to 'src/content/en/blog/2021/02/17/fallible.adoc')
-rw-r--r-- | src/content/en/blog/2021/02/17/fallible.adoc | 285 |
1 files changed, 285 insertions, 0 deletions
diff --git a/src/content/en/blog/2021/02/17/fallible.adoc b/src/content/en/blog/2021/02/17/fallible.adoc
new file mode 100644
index 0000000..1f2f641
--- /dev/null
+++ b/src/content/en/blog/2021/02/17/fallible.adoc
@@ -0,0 +1,285 @@
+= ANN: fallible - Fault injection library for stress-testing failure scenarios
+:updatedat: 2022-03-06
+
+:fallible: https://euandreh.xyz/fallible/
+
+Yesterday I pushed v0.1.0 of {fallible}[fallible], a minuscule library for
+fault-injection and stress-testing C programs.
+
+== _EDIT_
+
+:changelog: https://euandreh.xyz/fallible/CHANGELOG.html
+:tarball: https://euandre.org/static/attachments/fallible.tar.gz
+
+2021-06-12: As of {changelog}[0.3.0] (and beyond), the macro interface improved
+and is a bit different from what is presented in this article. If you're
+interested, I encourage you to take a look at it.
+
+2022-03-06: I've {tarball}[archived] the project for now. It still needs some
+maturing before being usable.
+
+== Existing solutions
+
+:gnu-std: https://www.gnu.org/prep/standards/standards.html#Semantics
+:valgrind: https://www.valgrind.org/
+:so-alloc: https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc
+
+Writing robust code can be challenging, and tools like static analyzers, fuzzers
+and friends can help you get there with more certainty. As I tried to improve
+some of my C code and make it more robust, in order to handle system crashes,
+filled disks, out-of-memory and similar scenarios, I couldn't find existing
+tools to help me explicitly stress-test those failure scenarios, as I had
+expected to.
+
+Take the "{gnu-std}[Writing Robust Programs]" section of the GNU Coding
+Standards:
+
+____
+Check every system call for an error return, unless you know you wish to ignore
+errors. (...) Check every call to malloc or realloc to see if it returned NULL.
+____
+
+From a robustness standpoint, this is a reasonable stance: if you want to have a
+robust program that knows how to fail when you're out of memory and `malloc`
+returns `NULL`, then you ought to check every call to `malloc`.
+
+Take a sample code snippet for clarity:
+
+[source,c]
+----
+void a_function() {
+    char *s1 = malloc(A_NUMBER);
+    strcpy(s1, "some string");
+
+    char *s2 = malloc(A_NUMBER);
+    strcpy(s2, "another string");
+}
+----
+
+At first glance, this code is unsafe: if any of the calls to `malloc` returns
+`NULL`, `strcpy` will be given a `NULL` pointer.
+
+My first instinct was to change this code to something like this:
+
+[source,diff]
+----
+@@ -1,7 +1,15 @@
+ void a_function() {
+     char *s1 = malloc(A_NUMBER);
++    if (!s1) {
++        fprintf(stderr, "out of memory, exitting\n");
++        exit(1);
++    }
+     strcpy(s1, "some string");
+
+     char *s2 = malloc(A_NUMBER);
++    if (!s2) {
++        fprintf(stderr, "out of memory, exitting\n");
++        exit(1);
++    }
+     strcpy(s2, "another string");
+ }
+----
+
+As I later found out, there are at least 2 problems with this approach:
+
+. *it doesn't compose*: this could arguably work if `a_function` was `main`.
+  But if `a_function` lives inside a library, an `exit(1);` is an inelegant way
+  of handling failures, and will catch the top-level `main` consuming the
+  library by surprise;
+. *it gives up instead of handling failures*: the actual handling goes a bit
+  beyond stopping. What about open file handles, in-memory caches, unflushed
+  bytes, etc.?
+
+If you could force only the second call to `malloc` to fail,
+{valgrind}[Valgrind] would correctly complain that the program exited with
+unfreed memory.
+
+So the final change, to arrive at the best version of the above code, is:
+
+[source,diff]
+----
+@@ -1,15 +1,14 @@
+-void a_function() {
++bool a_function() {
+     char *s1 = malloc(A_NUMBER);
+     if (!s1) {
+-        fprintf(stderr, "out of memory, exitting\n");
+-        exit(1);
++        return false;
+     }
+     strcpy(s1, "some string");
+
+     char *s2 = malloc(A_NUMBER);
+     if (!s2) {
+-        fprintf(stderr, "out of memory, exitting\n");
+-        exit(1);
++        free(s1);
++        return false;
+     }
+     strcpy(s2, "another string");
+ }
+----
+
+Instead of returning `void`, `a_function` now returns `bool` to indicate whether
+an error occurred during its execution. If `a_function` needed to return a
+pointer to something, a `NULL` return value could signal the error;
+alternatively, it could return an `int` that represents an error code.
+
+The code is now a) safe and b) failing gracefully, returning control to the
+caller to properly handle the error case.
+
+After seeing similar patterns in well-designed APIs, I adopted this practice for
+my own code, but was still left with manually verifying its correctness and
+robustness.
+
+How could I add assertions around my code that would help me make sure the
+`free(s1);` exists, before getting an error report? How do other people and
+projects solve this?
+
+From what I could see, people either a) hope for the best, b) write safe code
+but don't stress-test it or c) write ad-hoc code to stress it.
+
+The most prominent case of c) is SQLite: it has a few wrappers around the
+familiar `malloc` to do fault injection, check for memory limits, add warnings,
+create shim layers for other environments, etc. All of that, however, is
+tightly coupled with SQLite itself, and couldn't be easily pulled out for use
+somewhere else.
+
+When searching for it online, an {so-alloc}[interesting thread] caught my
+attention: fail each call to `malloc` the first time it is reached, and when the
+same stacktrace appears again, allow it to proceed.
+
+== Implementation
+
+:mallocfail: https://github.com/ralight/mallocfail
+:should-fail-fn: https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16
+
+A working implementation of that already exists: {mallocfail}[mallocfail]. It
+uses `LD_PRELOAD` to replace `malloc` at run-time, computes the SHA of the
+stacktrace and fails once for each SHA.
+
+I initially envisioned and started implementing something very similar to
+mallocfail. However, I wanted it to go beyond out-of-memory scenarios, and using
+`LD_PRELOAD` for every possible corner that could fail wasn't a good idea in the
+long run.
+
+Also, mallocfail won't work together with tools such as Valgrind, which want to
+do their own override of `malloc` with `LD_PRELOAD`.
+
+I instead went with something less automatic: starting with a
+`fallible_should_fail(char *filename, int lineno)` function that fails once for
+each `filename`+`lineno` combination, I created macro wrappers around common
+functions such as `malloc`:
+
+[source,c]
+----
+void *fallible_malloc(size_t size, const char *const filename, int lineno) {
+#ifdef FALLIBLE
+    if (fallible_should_fail(filename, lineno)) {
+        return NULL;
+    }
+#else
+    (void)filename;
+    (void)lineno;
+#endif
+    return malloc(size);
+}
+
+#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)
+----
+
+With this definition, I could replace the calls to `malloc` with `MALLOC` (or
+any other name that you want to `#define`):
+
+[source,diff]
+----
+--- 3.c 2021-02-17 00:15:38.019706074 -0300
++++ 4.c 2021-02-17 00:44:32.306885590 -0300
+@@ -1,11 +1,11 @@
+ bool a_function() {
+-    char *s1 = malloc(A_NUMBER);
++    char *s1 = MALLOC(A_NUMBER);
+     if (!s1) {
+         return false;
+     }
+     strcpy(s1, "some string");
+
+-    char *s2 = malloc(A_NUMBER);
++    char *s2 = MALLOC(A_NUMBER);
+     if (!s2) {
+         free(s1);
+         return false;
+----
+
+With this change, if the program gets compiled with the `-DFALLIBLE` flag, the
+fault-injection mechanism will run, and `MALLOC` will fail once for each
+`filename`+`lineno` combination. When the flag is missing, `MALLOC` is a very
+thin wrapper around `malloc`, which compilers could remove entirely, and the
+`-lfallible` flag can be omitted.
+
+This applies not only to `malloc` or other `stdlib.h` functions. If
+`a_function` is important or relevant, I could add a wrapper around it too, one
+that checks `fallible_should_fail`, to exercise whether its callers are also
+doing the proper clean-up.
+
+The actual code is just this single function,
+{should-fail-fn}[`fallible_should_fail`], which ended up taking only ~40 lines.
+In fact, the Makefile (111 lines), README.md (82) and troff (306) are each
+longer than it in this first version.
+
+The price for such fine-grained control is that this approach requires more
+manual work.
+
+== Usage examples
+
+=== `MALLOC` from the `README.md`
+
+:fallible-check: https://euandreh.xyz/fallible/fallible-check.1.html
+
+[source,c]
+----
+// leaky.c
+#include <string.h>
+#include <fallible_alloc.h>
+
+int main() {
+    char *aaa = MALLOC(100);
+    if (!aaa) {
+        return 1;
+    }
+    strcpy(aaa, "a safe use of strcpy");
+
+    char *bbb = MALLOC(100);
+    if (!bbb) {
+        // free(aaa);
+        return 1;
+    }
+    strcpy(bbb, "not unsafe, but aaa is leaking");
+
+    free(bbb);
+    free(aaa);
+    return 0;
+}
+----
+
+Compile with `-DFALLIBLE` and run {fallible-check}[`fallible-check.1`]:
+
+[source,sh]
+----
+$ c99 -DFALLIBLE -o leaky leaky.c -lfallible
+$ fallible-check ./leaky
+Valgrind failed when we did not expect it to:
+(...suppressed output...)
+# exit status is 1
+----
+
+== Conclusion
+
+:package: https://euandre.org/git/package-repository/
+
+For my personal use, I'll {package}[package] it for GNU Guix and Nix.
+Packaging it for any other distribution should be trivial; alternatively, just
+download the tarball and run `[sudo] make install`.
+
+Patches welcome!
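
The post describes `fallible_should_fail` as failing once for each `filename`+`lineno` combination. The following is a minimal sketch of that idea only, not the code from fallible itself: the names `sketch_should_fail`, `sketch_site` and `SKETCH_MAX_SITES` are hypothetical, the fixed-size table is an arbitrary simplification, and the state lives in memory for a single process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not fallible's implementation. */
#define SKETCH_MAX_SITES 256

struct sketch_site {
    const char *filename; /* assumed to outlive the table, e.g. __FILE__ */
    int lineno;
};

static struct sketch_site sketch_seen[SKETCH_MAX_SITES];
static size_t sketch_seen_count = 0;

/* Returns true the first time a filename+lineno pair is seen, and false
 * on every later call for that same pair. */
bool sketch_should_fail(const char *filename, int lineno) {
    for (size_t i = 0; i < sketch_seen_count; i++) {
        if (sketch_seen[i].lineno == lineno &&
            strcmp(sketch_seen[i].filename, filename) == 0) {
            return false; /* this call site has already failed once */
        }
    }
    if (sketch_seen_count == SKETCH_MAX_SITES) {
        return false; /* table is full: stop injecting new faults */
    }
    /* Remember the call site, then report a failure for it. */
    sketch_seen[sketch_seen_count].filename = filename;
    sketch_seen[sketch_seen_count].lineno = lineno;
    sketch_seen_count++;
    return true;
}
```

A wrapper in the style of the post's `fallible_malloc` could call `sketch_should_fail(__FILE__, __LINE__)` and return `NULL` when it reports true. Comparing file names with `strcmp` rather than by pointer is a deliberate choice here, since identical `__FILE__` literals are not guaranteed to share an address.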