= Awk snippet: ShellCheck all scripts in a repository
:categories: shell
:updatedat: 2020-12-16

:awk-20-min: https://ferd.ca/awk-in-20-minutes.html
:shellcheck: https://www.shellcheck.net/

Inspired by Fred Hebert's "{awk-20-min}[Awk in 20 Minutes]", here's a problem I
just solved with a line of Awk: running ShellCheck on all scripts of a repository.

In my repositories I usually have Bash and POSIX scripts, which I want to keep
tidy with {shellcheck}[ShellCheck].  Here's the first version of
`assert-shellcheck.sh`:

[source,shell]
----
#!/bin/sh -eux

find . -type f -name '*.sh' -print0 | xargs -0 shellcheck
----

This is the type of script that I copy around to all repositories, and I want it
to be capable of working on any repository, without requiring a list of files to
run ShellCheck on.

This first version worked fine, as all my scripts had the `.sh` ending.  But I
recently added some scripts without any extension, so `assert-shellcheck.sh`
called for a second version.  The first attempt was to try grepping the shebang
line:

[source,shell]
----
$ grep '^#!/' assert-shellcheck.sh
#!/bin/sh -eux
----

Good, we have a grep pattern on the first try.  Let's try to find all the
matching files:

[source,shell]
----
$ find . -type f | xargs grep -l '^#!/'
./TODOs.org
./.git/hooks/pre-commit.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-merge-commit.sample
./.git/hooks/fsmonitor-watchman.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-push
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-receive.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/update.sample
./build-aux/with-guile-env.in
./build-aux/test-driver
./build-aux/missing
./build-aux/install-sh
./build-aux/install-sh~
./bootstrap
./scripts/assert-todos.sh
./scripts/songbooks
./scripts/compile-readme.sh
./scripts/ci-build.sh
./scripts/generate-tasks-and-bugs.sh
./scripts/songbooks.in
./scripts/with-container.sh
./scripts/assert-shellcheck.sh
----

This approach has a problem, though: it includes files ignored by Git, such as
`build-aux/install-sh~`, and even goes into the `.git/` directory and finds
sample hooks in `.git/hooks/*`.

To list the files that Git is tracking we'll try `git ls-files`:

[source,shell]
----
$ git ls-files | xargs grep -l '^#!/'
TODOs.org
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
----

It looks to be almost there, but the `TODOs.org` entry shows a flaw in it: grep
is looking for a +'^#!/'+ pattern on any part of the file.  In my case,
`TODOs.org` had a snippet in the middle of the file where a line started with
+#!/bin/sh+.

So what we actually want is to match the *first* line against the pattern.  We
could loop through each file, get the first line with `head -n 1` and grep
against that, but this is starting to look messy.  I bet there is another way of
doing it concisely...
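
For comparison, here is a minimal sketch of what that loop could look like.  The
function name `first_line_shebang` is made up for illustration, and the sketch
assumes one file name per line on standard input, so names containing newlines
are not handled:

[source,shell]
----
# Hypothetical helper: read file names from stdin, one per line, and
# print the ones whose first line starts with a shebang.
first_line_shebang() {
  while IFS= read -r f; do
    # head reads from a redirect, so names starting with "-" are safe
    if head -n 1 < "$f" | grep -q '^#!/'; then
      printf '%s\n' "$f"
    fi
  done
}

# Possible usage inside a repository:
#   git ls-files | first_line_shebang
----

It works, but it spawns two processes per file, which is part of why a single
Awk invocation is more appealing.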

Let's try Awk.  I need a way to select the line numbers to replace `head -n 1`,
and to stop processing the file if the pattern matches.  A quick search points
me to using `FNR` for the former, and `{ nextfile }` for the latter.  Let's try
it:

[source,shell]
----
$ git ls-files | xargs awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }'
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
----

Great!  Only `TODOs.org` is gone from the list, and the script is much better:
instead of matching against any part of the file that may have a shebang-like
line, we only look at the first line.  Let's put it back into the
`assert-shellcheck.sh` file and use NUL as the separator to accommodate files
with spaces in the name:

[source,shell]
----
#!/bin/sh -eux

git ls-files -z | \
  xargs -0 awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }' | \
  xargs shellcheck
----
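
One caveat: Awk prints the matching names separated by newlines, so the final
`xargs shellcheck` would still split a name on spaces.  A possible workaround,
sketched below, is to turn the newlines back into NUL bytes before the last
`xargs`.  The helper name `run_on_lines` is made up for illustration, and the
sketch assumes file names contain no newlines and that `xargs` supports `-0`:

[source,shell]
----
# Hypothetical helper: run the given command with every input line
# passed as exactly one argument, even when the line contains spaces.
run_on_lines() {
  tr '\n' '\0' | xargs -0 "$@"
}

# Possible usage, replacing the final "xargs shellcheck" stage:
#   ... | run_on_lines shellcheck
----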

This is where I've stopped, but I imagine a likely improvement: match against
only +#!/bin/sh+ and +#!/usr/bin/env bash+ shebangs (the ones I use most), to
avoid running ShellCheck on Perl files, or other shebangs.

Also, when reviewing the text of this article, I found that `{ nextfile }` is a
GNU Awk extension.  It would be an improvement if `assert-shellcheck.sh` relied
on the POSIX subset of Awk for working correctly.

== _Update_

After publishing, I was able to remove `{ nextfile }` and even make the script
simpler:

[source,shell]
----
#!/bin/sh -eux

git ls-files -z | \
  xargs -0 awk 'FNR==1 && /^#!\// { print FILENAME }' | \
  xargs shellcheck
----

Now both the shell and Awk usage are POSIX compatible.
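
The other improvement mentioned earlier, matching only +#!/bin/sh+ and
+#!/usr/bin/env bash+ shebangs, could be sketched by tightening the pattern.
This is not something I have put into the script; the variable name
`shell_shebang` and the exact regular expression are assumptions made for
illustration:

[source,shell]
----
# Assumed pattern: the first line is "#!/bin/sh" or "#!/usr/bin/env bash",
# either alone or followed by a space and arguments (such as "-eux").
shell_shebang='FNR == 1 && /^#!(\/bin\/sh|\/usr\/bin\/env bash)( |$)/ { print FILENAME }'

# Possible usage in assert-shellcheck.sh:
#   git ls-files -z | xargs -0 awk "$shell_shebang" | xargs shellcheck
----

With this pattern, a file starting with +#!/usr/bin/perl+ is skipped, while
+#!/bin/sh -eux+ still matches.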