Diffstat
-rw-r--r-- _tils/2020-12-15-awk-snippet-shellcheck-all-scripts-in-a-repository.md | 154
1 files changed, 154 insertions, 0 deletions
diff --git a/_tils/2020-12-15-awk-snippet-shellcheck-all-scripts-in-a-repository.md b/_tils/2020-12-15-awk-snippet-shellcheck-all-scripts-in-a-repository.md
new file mode 100644
index 0000000..91ab22e
--- /dev/null
+++ b/_tils/2020-12-15-awk-snippet-shellcheck-all-scripts-in-a-repository.md
@@ -0,0 +1,154 @@
+---
+
+title: 'Awk snippet: ShellCheck all scripts in a repository'
+
+date: 2020-12-15
+
+layout: post
+
+lang: en
+
+ref: awk-snippet-shellcheck-all-scripts-in-a-repository
+
+---
+
+Inspired by Fred Hebert's "[Awk in 20 Minutes][awk-20min]", here's a problem I
+just solved with a line of Awk: run ShellCheck on all scripts of a repository.
+
+In my repositories I usually have Bash and POSIX scripts, which I want to keep
+tidy with [ShellCheck][shellcheck]. Here's the first version of
+`assert-shellcheck.sh`:
+
+```shell
+#!/bin/sh
+set -eu
+
+find . -type f -name '*.sh' -print0 | xargs -0 shellcheck
+```
+
+This is the type of script that I copy around to all repositories, and I want it
+to be capable of working on any repository, without requiring a list of files to
+run ShellCheck on.
+
+This first version worked fine, as all my scripts had the `.sh` extension. But I
+recently added some scripts without any extension, so `assert-shellcheck.sh`
+called for a second version. The first attempt was to try grepping the shebang
+line:
+
+```shell
+$ grep '^#!/' assert-shellcheck.sh
+#!/bin/sh
+```
+
+Good, we have a grep pattern on the first try. Let's try to find all the
+matching files:
+
+```shell
+$ find . -type f | xargs grep -l '^#!/'
+./TODOs.org
+./.git/hooks/pre-commit.sample
+./.git/hooks/pre-push.sample
+./.git/hooks/pre-merge-commit.sample
+./.git/hooks/fsmonitor-watchman.sample
+./.git/hooks/pre-applypatch.sample
+./.git/hooks/pre-push
+./.git/hooks/prepare-commit-msg.sample
+./.git/hooks/commit-msg.sample
+./.git/hooks/post-update.sample
+./.git/hooks/pre-receive.sample
+./.git/hooks/applypatch-msg.sample
+./.git/hooks/pre-rebase.sample
+./.git/hooks/update.sample
+./build-aux/with-guile-env.in
+./build-aux/test-driver
+./build-aux/missing
+./build-aux/install-sh
+./build-aux/install-sh~
+./bootstrap
+./scripts/assert-todos.sh
+./scripts/songbooks
+./scripts/compile-readme.sh
+./scripts/ci-build.sh
+./scripts/generate-tasks-and-bugs.sh
+./scripts/songbooks.in
+./scripts/with-container.sh
+./scripts/assert-shellcheck.sh
+```
+
+This approach has a problem, though: it includes files ignored by Git, such as
+`build-aux/install-sh~`, and even goes into the `.git/` directory and finds
+sample hooks in `.git/hooks/*`.
+
+To list the files that Git is tracking we'll try `git ls-files`:
+
+```shell
+$ git ls-files | xargs grep -l '^#!/'
+TODOs.org
+bootstrap
+build-aux/with-guile-env.in
+old/scripts/assert-docs-spelling.sh
+old/scripts/build-site.sh
+old/scripts/builder.bats.sh
+scripts/assert-shellcheck.sh
+scripts/assert-todos.sh
+scripts/ci-build.sh
+scripts/compile-readme.sh
+scripts/generate-tasks-and-bugs.sh
+scripts/songbooks.in
+scripts/with-container.sh
+```
+
+It looks to be almost there, but the `TODOs.org` entry shows a flaw in it: grep
+looks for the `^#!/` pattern on every line of the file. In my case,
+`TODOs.org` had a snippet in the middle of the file where a line started with
+`#!/bin/sh`.
+
+So what we actually want is to match the **first** line against the pattern. We
+could loop through each file, get the first line with `head -n 1` and grep
+against that, but this is starting to look messy. I bet there is another way of
+doing it concisely...
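+
+For comparison, the loop described above might look roughly like this. It is
+only a sketch: `list_shebang_files_loop` is a made-up helper name, and it takes
+the candidate files as arguments instead of reading them from `git ls-files`.
+
+```shell
+# Hypothetical helper, not part of assert-shellcheck.sh: print every argument
+# whose first line starts with '#!/'.
+list_shebang_files_loop() {
+  for f in "$@"; do
+    # head keeps only the first line, so a shebang-like line further down
+    # the file (as in TODOs.org) is never seen by grep.
+    if head -n 1 -- "$f" | grep -q '^#!/'; then
+      printf '%s\n' "$f"
+    fi
+  done
+}
+```
+
+It works, but it spawns a `head` and a `grep` for every file.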
+
+Let's try Awk. I need a way to select lines by number, to replace `head -n 1`,
+and to stop processing the file once its first line has been checked. A quick
+search points me to `FNR` for the former, and `{ nextfile }` for the latter.
+Let's try it:
+
+```shell
+$ git ls-files | xargs awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }'
+bootstrap
+build-aux/with-guile-env.in
+old/scripts/assert-docs-spelling.sh
+old/scripts/build-site.sh
+old/scripts/builder.bats.sh
+scripts/assert-shellcheck.sh
+scripts/assert-todos.sh
+scripts/ci-build.sh
+scripts/compile-readme.sh
+scripts/generate-tasks-and-bugs.sh
+scripts/songbooks.in
+scripts/with-container.sh
+```
+
+Great! Only `TODOs.org` is gone, and the script is much better: instead of
+matching against any part of the file that may have a shebang-like line, we only
+look at the first. Let's put it back into the `assert-shellcheck.sh` file and
+use NUL separators to accommodate files with spaces in the name. Awk prints one
+name per line, so `tr` turns those newlines into NULs for the second `xargs`:
+
+```shell
+#!/bin/sh
+set -eu
+
+git ls-files -z | \
+  xargs -0 awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }' | \
+  tr '\n' '\0' | \
+  xargs -0 shellcheck
+```
+
+This is where I've stopped, but I imagine a likely improvement: match against
+only `#!/bin/sh` and `#!/usr/bin/env bash` shebangs (the ones I use most), to
+avoid running ShellCheck on Perl files, or on files with any other shebang.
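+
+A minimal sketch of that improvement, assuming those two shebangs are the only
+ones of interest. `list_shell_scripts` is a made-up helper name, and the
+pattern is my guess at a reasonable match, allowing an optional argument after
+the interpreter (as in `#!/bin/sh -e`):
+
+```shell
+# Hypothetical helper: print the arguments whose first line is one of the
+# two shebangs named above. Same two-rule shape as the one-liner in the
+# script, with a narrower pattern.
+list_shell_scripts() {
+  awk '
+    # Past the first line: give up on this file.
+    FNR > 1 { nextfile }
+    # First line: accept only sh, or bash through env.
+    /^#!(\/bin\/sh|\/usr\/bin\/env bash)([ \t]|$)/ { print FILENAME; nextfile }
+  ' "$@"
+}
+```
+
+A Perl script starting with `#!/usr/bin/perl` falls through both rules on its
+first line and is dropped on its second.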
+
+Also, when reviewing the text of this article, I found that `{ nextfile }` is a
+GNU Awk extension. It would be an improvement if `assert-shellcheck.sh` relied
+on the POSIX subset of Awk for working correctly.
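+
+One way to drop `nextfile` is to test `FNR` and the pattern in a single rule.
+The sketch below is an untested-in-production assumption of mine, and
+`list_shebang_files_posix` is a made-up name:
+
+```shell
+# Hypothetical POSIX-only variant of the Awk filter.
+list_shebang_files_posix() {
+  # FNR restarts at 1 for every input file, so the pattern is only tested
+  # against each file's first line. Without nextfile, Awk still reads the
+  # remaining lines of the file; it just does nothing with them.
+  awk 'FNR == 1 && /^#!\// { print FILENAME }' "$@"
+}
+```
+
+The trade-off is that every file is read to the end, which is slower on large
+repositories but gives the same list of names.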
+
+[awk-20min]: https://ferd.ca/awk-in-20-minutes.html
+[shellcheck]: https://www.shellcheck.net/