#
msgid ""
msgstr ""
msgid ""
"title: \"ANN: fallible - Fault injection library for stress-testing failure "
"scenarios\""
msgstr ""
msgid "date: 2021-02-17"
msgstr ""
msgid "layout: post"
msgstr ""
msgid "lang: en"
msgstr ""
msgid ""
"ref: ann-fallible-fault-injection-library-for-stress-testing-failure-"
"scenarios"
msgstr ""
msgid "Existing solutions"
msgstr ""
msgid ""
"Writing robust code can be challenging, and tools like static analyzers, "
"fuzzers and friends can help you get there with more certainty. When I tried "
"to make some of my C code more robust, so that it could handle system "
"crashes, full disks, out-of-memory conditions and similar scenarios, I "
"couldn't find the tooling I expected to exist: nothing helped me explicitly "
"stress-test those failure scenarios."
msgstr ""
msgid ""
"Take the \"[Writing Robust "
"Programs](https://www.gnu.org/prep/standards/standards.html#Semantics)\" "
"section of the GNU Coding Standards:"
msgstr ""
msgid ""
"Check every system call for an error return, unless you know you wish to "
"ignore errors. (...) Check every call to malloc or realloc to see if it "
"returned NULL."
msgstr ""
msgid ""
"From a robustness standpoint, this is a reasonable stance: if you want to "
"have a robust program that knows how to fail when you're out of memory and "
"`malloc` returns `NULL`, then you ought to check every call to `malloc`."
msgstr ""
msgid "Take a sample code snippet for clarity:"
msgstr ""
msgid ""
"void a_function() {\n"
"  char *s1 = malloc(A_NUMBER);\n"
"  strcpy(s1, \"some string\");\n"
"\n"
"  char *s2 = malloc(A_NUMBER);\n"
"  strcpy(s2, \"another string\");\n"
"}\n"
msgstr ""
msgid ""
"At first glance, this code is unsafe: if either call to `malloc` returns "
"`NULL`, `strcpy` will be given a `NULL` pointer."
msgstr ""
msgid "My first instinct was to change this code to something like this:"
msgstr ""
msgid ""
"@@ -1,7 +1,15 @@\n"
" void a_function() {\n"
"   char *s1 = malloc(A_NUMBER);\n"
"+  if (!s1) {\n"
"+    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"+    exit(1);\n"
"+  }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"   char *s2 = malloc(A_NUMBER);\n"
"+  if (!s2) {\n"
"+    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"+    exit(1);\n"
"+  }\n"
"   strcpy(s2, \"another string\");\n"
" }\n"
msgstr ""
msgid ""
"As I later found out, there are at least 2 problems with this approach:"
msgstr ""
msgid ""
"**it doesn't compose**: this could arguably work if `a_function` were `main`."
" But if `a_function` lives inside a library, an `exit(1);` is an inelegant "
"way of handling failures, and will catch the top-level `main` consuming the "
"library by surprise;"
msgstr ""
msgid ""
"**it gives up instead of handling failures**: the actual handling goes a bit"
" beyond stopping. What about open file handles, in-memory caches, unflushed "
"bytes, etc.?"
msgstr ""
msgid ""
"If you could force only the second call to `malloc` to fail, "
"[Valgrind](https://www.valgrind.org/) would correctly complain that the "
"program exited with unfreed memory."
msgstr ""
msgid "So the last change to make the best version of the above code is:"
msgstr ""
msgid ""
"@@ -1,15 +1,15 @@\n"
"-void a_function() {\n"
"+bool a_function() {\n"
"   char *s1 = malloc(A_NUMBER);\n"
"   if (!s1) {\n"
"-    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"-    exit(1);\n"
"+    return false;\n"
"   }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"   char *s2 = malloc(A_NUMBER);\n"
"   if (!s2) {\n"
"-    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"-    exit(1);\n"
"+    free(s1);\n"
"+    return false;\n"
"   }\n"
"   strcpy(s2, \"another string\");\n"
"+  return true;\n"
" }\n"
msgstr ""
msgid ""
"Instead of returning `void`, `a_function` now returns `bool` to indicate "
"whether an error occurred during its execution. If `a_function` returned a "
"pointer to something, the return value could be `NULL`, or an `int` that "
"represents an error code."
msgstr ""
msgid ""
"The code is now a) safe and b) failing gracefully, returning the control to "
"the caller to properly handle the error case."
msgstr ""
msgid ""
"After seeing similar patterns in well-designed APIs, I adopted this practice"
" for my own code, but was still left with manually verifying its correctness"
" and robustness."
msgstr ""
msgid ""
"How could I add assertions around my code that would help me make sure the "
"`free(s1);` exists, before getting an error report? How do other people and "
"projects solve this?"
msgstr ""
msgid ""
"From what I could see, people either a) hope for the best, b) write safe "
"code but don't stress-test it or c) write ad-hoc code to stress it."
msgstr ""
msgid ""
"The most prominent case of c) is SQLite: it has a few wrappers around the "
"familiar `malloc` to do fault injection, check for memory limits, add "
"warnings, create shim layers for other environments, etc. All of that, "
"however, is tightly coupled with SQLite itself, and couldn't easily be pulled"
" out for use somewhere else."
msgstr ""
msgid ""
"When searching for it online, an [interesting "
"thread](https://stackoverflow.com/questions/1711170/unit-testing-for-failed-"
"malloc) caught my attention: fail each call to `malloc` the first time its "
"stacktrace is seen, and when the same stacktrace appears again, allow it to "
"proceed."
msgstr ""
msgid "Implementation"
msgstr ""
msgid ""
"A working implementation of that already exists: "
"[mallocfail](https://github.com/ralight/mallocfail). It uses `LD_PRELOAD` to"
" replace `malloc` at run-time, computes the SHA of the stacktrace and fails "
"once for each SHA."
msgstr ""
msgid ""
"I initially envisioned and started implementing something very similar to "
"mallocfail. However, I wanted it to go beyond out-of-memory scenarios, and "
"using `LD_PRELOAD` for every possible corner that could fail wasn't a good "
"idea in the long run."
msgstr ""
msgid ""
"Also, mallocfail won't work together with tools such as Valgrind, which want "
"to do their own override of `malloc` with `LD_PRELOAD`."
msgstr ""
msgid ""
"I instead went with a less automatic approach: starting with a "
"`fallible_should_fail(char *filename, int lineno)` function that fails once "
"for each `filename`+`lineno` combination, I created macro wrappers around "
"common functions such as `malloc`:"
msgstr ""
msgid ""
"void *fallible_malloc(size_t size, const char *const filename, int lineno) {\n"
"#ifdef FALLIBLE\n"
"  if (fallible_should_fail(filename, lineno)) {\n"
"    return NULL;\n"
"  }\n"
"#else\n"
"  (void)filename;\n"
"  (void)lineno;\n"
"#endif\n"
"  return malloc(size);\n"
"}\n"
"\n"
"#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)\n"
msgstr ""
msgid ""
"With this definition, I could replace the calls to `malloc` with `MALLOC` "
"(or any other name that you want to `#define`):"
msgstr ""
msgid ""
"With this change, if the program gets compiled with the `-DFALLIBLE` flag, "
"the fault-injection mechanism will run, and `MALLOC` will fail once for each"
" `filename`+`lineno` combination. When the flag is missing, `MALLOC` is a "
"very thin wrapper around `malloc`, which compilers could remove entirely, "
"and the `-lfallible` flag can be omitted."
msgstr ""
msgid ""
"This applies not only to `malloc` or other `stdlib.h` functions. If "
"`a_function` is important or relevant, I could add a wrapper around it too, "
"one that checks `fallible_should_fail` to exercise whether its callers are "
"also doing the proper clean-up."
msgstr ""
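msgid ""
"As a rough sketch of such a wrapper, mirroring `fallible_malloc` above: the "
"names `fallible_a_function` and `A_FUNCTION` are made up for this example, "
"and the `bool` return type of `fallible_should_fail` is assumed here."
msgstr ""
msgid ""
"#include <stdbool.h>\n"
"\n"
"/* Assumed prototypes, for illustration only. */\n"
"bool fallible_should_fail(const char *const filename, int lineno);\n"
"bool a_function(void);\n"
"\n"
"/* Hypothetical wrapper: fails once per call site when FALLIBLE is defined. */\n"
"bool fallible_a_function(const char *const filename, int lineno) {\n"
"#ifdef FALLIBLE\n"
"  if (fallible_should_fail(filename, lineno)) {\n"
"    return false;\n"
"  }\n"
"#else\n"
"  (void)filename;\n"
"  (void)lineno;\n"
"#endif\n"
"  return a_function();\n"
"}\n"
"\n"
"#define A_FUNCTION() fallible_a_function(__FILE__, __LINE__)\n"
msgstr ""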
msgid ""
"The actual code is just this single function, "
"[`fallible_should_fail`](https://git.euandreh.xyz/fallible/tree/src/fallible.c?id=v0.1.0#n16),"
" which ended up taking only ~40 lines. In fact, the Makefile (111), "
"README.md (82) and troff (306) each have more lines than that in this first "
"version."
msgstr ""
msgid ""
"The price for such fine-grained control is that this approach requires more "
"manual work."
msgstr ""
msgid "Usage examples"
msgstr ""
msgid "`MALLOC` from the `README.md`"
msgstr ""
msgid ""
"// leaky.c\n"
"#include <string.h>\n"
"#include <fallible_alloc.h>\n"
"\n"
"int main() {\n"
"  char *aaa = MALLOC(100);\n"
"  if (!aaa) {\n"
"    return 1;\n"
"  }\n"
"  strcpy(aaa, \"a safe use of strcpy\");\n"
"\n"
"  char *bbb = MALLOC(100);\n"
"  if (!bbb) {\n"
"    // free(aaa);\n"
"    return 1;\n"
"  }\n"
"  strcpy(bbb, \"not unsafe, but aaa is leaking\");\n"
"\n"
"  free(bbb);\n"
"  free(aaa);\n"
"  return 0;\n"
"}\n"
msgstr ""
msgid ""
"$ c99 -DFALLIBLE -o leaky leaky.c -lfallible\n"
"$ fallible-check ./leaky\n"
"Valgrind failed when we did not expect it to:\n"
"(...suppressed output...)\n"
"# exit status is 1\n"
msgstr ""
msgid "Conclusion"
msgstr ""
msgid ""
"For my personal use, I'll [package](https://git.euandreh.xyz/package-"
"repository/) it for GNU Guix and Nix. Packaging it for any other "
"distribution should be trivial; otherwise, just download the tarball and run "
"`[sudo] make install`."
msgstr ""
msgid "Patches welcome!"
msgstr ""
msgid ""
"--- 3.c 2021-02-17 00:15:38.019706074 -0300\n"
"+++ 4.c 2021-02-17 00:44:32.306885590 -0300\n"
"@@ -1,11 +1,11 @@\n"
" bool a_function() {\n"
"-  char *s1 = malloc(A_NUMBER);\n"
"+  char *s1 = MALLOC(A_NUMBER);\n"
"   if (!s1) {\n"
"     return false;\n"
"   }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"-  char *s2 = malloc(A_NUMBER);\n"
"+  char *s2 = MALLOC(A_NUMBER);\n"
"   if (!s2) {\n"
"     free(s1);\n"
"     return false;\n"
msgstr ""
msgid ""
"Yesterday I pushed v0.1.0 of [fallible](https://euandreh.xyz/fallible/), a "
"minuscule library for fault-injection and stress-testing C programs."
msgstr ""
msgid ""
"Compile with `-DFALLIBLE` and run [`fallible-"
"check.1`](https://euandreh.xyz/fallible/fallible-check.1.html):"
msgstr ""
msgid "updated_at: 2021-02-17"
msgstr ""
msgid "*EDIT*"
msgstr ""
msgid ""
"2021-06-12: As of [0.3.0](https://euandreh.xyz/fallible/CHANGELOG.html) (and"
" beyond), the macro interface improved and is a bit different from what is "
"presented in this article. If you're interested, I encourage you to take a "
"look at it."
msgstr ""