#
msgid ""
msgstr ""

msgid ""
"title: \"ANN: fallible - Fault injection library for stress-testing failure "
"scenarios\""
msgstr ""

msgid "date: 2021-02-17"
msgstr ""

msgid "layout: post"
msgstr ""

msgid "lang: en"
msgstr ""

msgid ""
"ref: ann-fallible-fault-injection-library-for-stress-testing-failure-"
"scenarios"
msgstr ""

msgid ""
"Yesterday I pushed v0.1.0 of [fallible](https://fallible.euandreh.xyz), a "
"minuscule library for fault-injection and stress-testing C programs."
msgstr ""

msgid "Existing solutions"
msgstr ""

msgid ""
"Writing robust code can be challenging, and tools like static analyzers, "
"fuzzers and friends can help you get there with more certainty. When I tried "
"to make some of my C code more robust, so that it could handle system "
"crashes, full disks, out-of-memory and similar scenarios, I expected to find "
"existing tooling to help me explicitly stress-test those failure scenarios, "
"but I couldn't find any."
msgstr ""

msgid ""
"Take the \"[Writing Robust "
"Programs](https://www.gnu.org/prep/standards/standards.html#Semantics)\" "
"section of the GNU Coding Standards:"
msgstr ""

msgid ""
"Check every system call for an error return, unless you know you wish to "
"ignore errors. (...) Check every call to malloc or realloc to see if it "
"returned NULL."
msgstr ""

msgid ""
"From a robustness standpoint, this is a reasonable stance: if you want to "
"have a robust program that knows how to fail when you're out of memory and "
"`malloc` returns `NULL`, then you ought to check every call to `malloc`."
msgstr ""

msgid "Take a sample code snippet for clarity:"
msgstr ""

msgid ""
"void a_function() {\n"
"  char *s1 = malloc(A_NUMBER);\n"
"  strcpy(s1, \"some string\");\n"
"\n"
"  char *s2 = malloc(A_NUMBER);\n"
"  strcpy(s2, \"another string\");\n"
"}\n"
msgstr ""

msgid ""
"At first glance, this code is unsafe: if any of the calls to `malloc` "
"returns `NULL`, `strcpy` will be given a `NULL` pointer."
msgstr ""

msgid "My first instinct was to change this code to something like this:"
msgstr ""

msgid ""
"@@ -1,7 +1,15 @@\n"
" void a_function() {\n"
"   char *s1 = malloc(A_NUMBER);\n"
"+  if (!s1) {\n"
"+    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"+    exit(1);\n"
"+  }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"   char *s2 = malloc(A_NUMBER);\n"
"+  if (!s2) {\n"
"+    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"+    exit(1);\n"
"+  }\n"
"   strcpy(s2, \"another string\");\n"
" }\n"
msgstr ""

msgid ""
"As I later found out, there are at least 2 problems with this approach:"
msgstr ""

msgid ""
"**it doesn't compose**: this could arguably work if `a_function` was `main`."
" But if `a_function` lives inside a library, an `exit(1);` is an inelegant "
"way of handling failures, and will catch the top-level `main` consuming the "
"library by surprise;"
msgstr ""

msgid ""
"**it gives up instead of handling failures**: the actual handling goes a bit"
" beyond stopping. What about open file handles, in-memory caches, unflushed "
"bytes, etc.?"
msgstr ""

msgid ""
"If you could force only the second call to `malloc` to fail, "
"[Valgrind](https://www.valgrind.org/) would correctly complain that the "
"program exited with unfreed memory."
msgstr ""

msgid ""
"So the final change, which yields the best version of the code above, is:"
msgstr ""

msgid ""
"@@ -1,15 +1,15 @@\n"
"-void a_function() {\n"
"+bool a_function() {\n"
"   char *s1 = malloc(A_NUMBER);\n"
"   if (!s1) {\n"
"-    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"-    exit(1);\n"
"+    return false;\n"
"   }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"   char *s2 = malloc(A_NUMBER);\n"
"   if (!s2) {\n"
"-    fprintf(stderr, \"out of memory, exiting\\n\");\n"
"-    exit(1);\n"
"+    free(s1);\n"
"+    return false;\n"
"   }\n"
"   strcpy(s2, \"another string\");\n"
"+  return true;\n"
" }\n"
msgstr ""

msgid ""
"Instead of returning `void`, `a_function` now returns `bool` to indicate "
"whether an error occurred during its execution. If `a_function` returned a "
"pointer to something, it could signal failure by returning `NULL`; "
"alternatively, it could return an `int` that represents an error code."
msgstr ""

msgid ""
"The code is now a) safe and b) failing gracefully, returning control to "
"the caller to properly handle the error case."
msgstr ""

msgid ""
"After seeing similar patterns in well-designed APIs, I adopted this practice"
" for my own code, but was still left with manually verifying its correctness"
" and robustness."
msgstr ""

msgid ""
"How could I add assertions around my code that would help me make sure the "
"`free(s1);` exists, before getting an error report? How do other people and "
"projects solve this?"
msgstr ""

msgid ""
"From what I could see, people either a) hope for the best, b) write safe "
"code but don't stress-test it or c) write ad-hoc code to stress it."
msgstr ""

msgid ""
"The most prominent case of c) is SQLite: it has a few wrappers around the "
"familiar `malloc` to do fault injection, check for memory limits, add "
"warnings, create shim layers for other environments, etc. All of that, "
"however, is tightly coupled with SQLite itself, and couldn't be easily pulled"
" out for use somewhere else."
msgstr ""

msgid ""
"When searching for it online, an [interesting "
"thread](https://stackoverflow.com/questions/1711170/unit-testing-for-failed-"
"malloc) caught my attention: fail each call to `malloc` the first time its "
"stacktrace is seen, and when the same stacktrace appears again, allow it to "
"proceed."
msgstr ""

msgid "Implementation"
msgstr ""

msgid ""
"A working implementation of that already exists: "
"[mallocfail](https://github.com/ralight/mallocfail). It uses `LD_PRELOAD` to"
" replace `malloc` at run-time, computes the SHA of the stacktrace and fails "
"once for each SHA."
msgstr ""

msgid ""
"I initially envisioned and started implementing something very similar to "
"mallocfail. However, I wanted it to go beyond out-of-memory scenarios, and "
"using `LD_PRELOAD` for every possible corner that could fail wasn't a good "
"idea in the long run."
msgstr ""

msgid ""
"Also, mallocfail won't work together with tools such as Valgrind, which want "
"to do their own override of `malloc` with `LD_PRELOAD`."
msgstr ""

msgid ""
"I went instead with a less automatic approach: starting with a "
"`fallible_should_fail(char *filename, int lineno)` function that fails once "
"for each `filename`+`lineno` combination, I created macro wrappers around "
"common functions such as `malloc`:"
msgstr ""

msgid ""
"void *fallible_malloc(size_t size, const char *const filename, int lineno) {\n"
"#ifdef FALLIBLE\n"
"  if (fallible_should_fail(filename, lineno)) {\n"
"    return NULL;\n"
"  }\n"
"#else\n"
"  (void)filename;\n"
"  (void)lineno;\n"
"#endif\n"
"  return malloc(size);\n"
"}\n"
"\n"
"#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)\n"
msgstr ""

msgid ""
"With this definition, I could replace the calls to `malloc` with `MALLOC` "
"(or any other name that you want to `#define`):"
msgstr ""

msgid ""
"--- 3.c\t2021-02-17 00:15:38.019706074 -0300\n"
"+++ 4.c\t2021-02-17 00:44:32.306885590 -0300\n"
"@@ -1,11 +1,11 @@\n"
" bool a_function() {\n"
"-  char *s1 = malloc(A_NUMBER);\n"
"+  char *s1 = MALLOC(A_NUMBER);\n"
"   if (!s1) {\n"
"     return false;\n"
"   }\n"
"   strcpy(s1, \"some string\");\n"
"\n"
"-  char *s2 = malloc(A_NUMBER);\n"
"+  char *s2 = MALLOC(A_NUMBER);\n"
"   if (!s2) {\n"
"     free(s1);\n"
"     return false;\n"
msgstr ""

msgid ""
"With this change, if the program gets compiled with the `-DFALLIBLE` flag, "
"the fault-injection mechanism will run, and `MALLOC` will fail once for each"
" `filename`+`lineno` combination. When the flag is missing, `MALLOC` is a "
"very thin wrapper around `malloc`, which compilers could remove entirely, "
"and the `-lfallible` flag can be omitted."
msgstr ""

msgid ""
"This applies not only to `malloc` or other `stdlib.h` functions. If "
"`a_function` is important or relevant, I could add a wrapper around it too, "
"one that checks `fallible_should_fail` to exercise whether its callers are "
"also doing the proper clean-up."
msgstr ""

msgid ""
"The actual code is just this single function, "
"[`fallible_should_fail`](https://git.euandreh.xyz/fallible/tree/src/fallible.c?id=v0.1.0#n16),"
" which ended up taking only ~40 lines. In fact, there are more lines of "
"either Makefile (111), README.md (82) or troff (306) in this first version."
msgstr ""

msgid ""
"The price for such fine-grained control is that this approach requires more "
"manual work."
msgstr ""

msgid "Usage examples"
msgstr ""

msgid "`MALLOC` from the `README.md`"
msgstr ""

msgid ""
"// leaky.c\n"
"#include \n"
"#include \n"
"\n"
"int main() {\n"
"  char *aaa = MALLOC(100);\n"
"  if (!aaa) {\n"
"    return 1;\n"
"  }\n"
"  strcpy(aaa, \"a safe use of strcpy\");\n"
"\n"
"  char *bbb = MALLOC(100);\n"
"  if (!bbb) {\n"
"    // free(aaa);\n"
"    return 1;\n"
"  }\n"
"  strcpy(bbb, \"not unsafe, but aaa is leaking\");\n"
"\n"
"  free(bbb);\n"
"  free(aaa);\n"
"  return 0;\n"
"}\n"
msgstr ""

msgid ""
"Compile with `-DFALLIBLE` and run [`fallible-"
"check.1`](https://fallible.euandreh.xyz/fallible-check.1.html):"
msgstr ""

msgid ""
"$ c99 -DFALLIBLE -o leaky leaky.c -lfallible\n"
"$ fallible-check ./leaky\n"
"Valgrind failed when we did not expect it to:\n"
"(...suppressed output...)\n"
"# exit status is 1\n"
msgstr ""

msgid "Conclusion"
msgstr ""

msgid ""
"For my personal use, I'll [package](https://git.euandreh.xyz/package-"
"repository/about/) it for GNU Guix and Nix. Packaging it for any other "
"distribution should be trivial; alternatively, just download the tarball "
"and run `[sudo] make install`."
msgstr ""

msgid "Patches welcome!"
msgstr ""