---
title: 'Awk snippet: ShellCheck all scripts in a repository'
date: 2020-12-15
layout: post
lang: en
ref: awk-snippet-shellcheck-all-scripts-in-a-repository
---

Inspired by Fred Hebert's "[Awk in 20 Minutes][awk-20min]", here's a problem I
just solved with a line of Awk: run ShellCheck on all scripts of a repository.

In my repositories I usually have Bash and POSIX scripts, which I want to keep
tidy with [ShellCheck][shellcheck]. Here's the first version of
`assert-shellcheck.sh`:

```shell
#!/bin/sh
set -eu

find . -type f -name '*.sh' -print0 | xargs -0 shellcheck
```

This is the type of script that I copy around to all repositories, and I want it
to be capable of working on any repository, without requiring a list of files to
run ShellCheck on.

This first version worked fine, as all my scripts had the `.sh` extension. But I
recently added some scripts without any extension, so `assert-shellcheck.sh`
called for a second version. My first attempt was to grep for the shebang line:

```shell
$ grep '^#!/' assert-shellcheck.sh
#!/bin/sh
```

Good, we have a grep pattern on the first try. Let's try to find all the
matching files:

```shell
$ find . -type f | xargs grep -l '^#!/'
./TODOs.org
./.git/hooks/pre-commit.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-merge-commit.sample
./.git/hooks/fsmonitor-watchman.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-push
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-receive.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/update.sample
./build-aux/with-guile-env.in
./build-aux/test-driver
./build-aux/missing
./build-aux/install-sh
./build-aux/install-sh~
./bootstrap
./scripts/assert-todos.sh
./scripts/songbooks
./scripts/compile-readme.sh
./scripts/ci-build.sh
./scripts/generate-tasks-and-bugs.sh
./scripts/songbooks.in
./scripts/with-container.sh
./scripts/assert-shellcheck.sh
```

This approach has a problem, though: it includes files ignored by Git, such as
`build-aux/install-sh~`, and even goes into the `.git/` directory and finds
sample hooks in `.git/hooks/*`.

To list the files that Git is tracking we'll try `git ls-files`:

```shell
$ git ls-files | xargs grep -l '^#!/'
TODOs.org
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
```

It's almost there, but the `TODOs.org` entry shows a flaw: grep looks for the
`'^#!/'` pattern on every line of the file. In my case, `TODOs.org` had a
snippet in the middle of the file where a line started with `#!/bin/sh`.

So what we actually want is to match the **first** line against the pattern. We
could loop through each file, get the first line with `head -n 1` and grep
against that, but this is starting to look messy. I bet there is another way of
doing it concisely...
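
For comparison, the loop-based approach described above might look roughly like
the sketch below. The function name `list_shebang_files` is made up for
illustration, and the sketch assumes the file names are passed as arguments:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints every file given as an argument whose
# first line starts with "#!/".
list_shebang_files() {
  for f in "$@"; do
    # head keeps only the first line, so a shebang-like line further
    # down the file is never seen by grep.
    if head -n 1 "$f" | grep -q '^#!/'; then
      printf '%s\n' "$f"
    fi
  done
}
```

It works, but it spawns at least two processes per file, and feeding it the
output of `git ls-files` needs yet more plumbing.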

Let's try Awk. I need a way to select the line numbers to replace `head -n 1`,
and to stop processing the file if the pattern matches. A quick search points me
to using `FNR` for the former, and `{ nextfile }` for the latter. Let's try it:

```shell
$ git ls-files | xargs awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }'
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
```

Great! Only `TODOs.org` dropped out of the list, and the script is much better:
instead of matching against any line of the file that may look like a shebang,
we only look at the first. Let's put it back into the `assert-shellcheck.sh`
file and use NUL bytes as separators to accommodate files with spaces in the
name:

```shell
#!/bin/sh
set -eu

git ls-files -z | \
  xargs -0 awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }' | \
  xargs shellcheck
```

This is where I've stopped, but I imagine a likely improvement: match against
only `#!/bin/sh` and `#!/usr/bin/env bash` shebangs (the ones I use most), to
avoid running ShellCheck on Perl files, or other shebangs.
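
A minimal sketch of that stricter filter, assuming the first line is either
`#!/bin/sh` (optionally followed by flags) or `#!/usr/bin/env bash`. The
function name `list_shell_scripts` is hypothetical, and it reads a NUL-separated
file list on standard input, as `git ls-files -z` produces:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: reads a NUL-separated file list on stdin and
# prints the files whose first line is a sh or "env bash" shebang.
list_shell_scripts() {
  xargs -0 awk '
    # Awk splits the line on whitespace, so $1 is the interpreter path
    # and $2 is its first argument, if any.
    FNR == 1 && ($1 == "#!/bin/sh" ||
                 ($1 == "#!/usr/bin/env" && $2 == "bash")) {
      print FILENAME
    }
  '
}

# Intended usage inside a Git repository:
#   git ls-files -z | list_shell_scripts
```

Comparing fields instead of using a regular expression keeps the condition
readable, at the cost of missing less common spellings such as `#! /bin/sh`.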

Also, when reviewing the text of this article, I found that `{ nextfile }` is a
GNU Awk extension. It would be an improvement if `assert-shellcheck.sh` relied
only on the POSIX subset of Awk to work correctly.

## *Update*

After publishing, I was able to remove `{ nextfile }` and make the script even
simpler:

```shell
#!/bin/sh
set -eu

git ls-files -z | \
  xargs -0 awk 'FNR==1 && /^#!\// { print FILENAME }' | \
  xargs shellcheck
```

Now both the shell and Awk usage are POSIX compatible.
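
One caveat with the pipeline above: the last `xargs` reads Awk's
newline-separated output and splits it on whitespace, so a file name containing
a space would likely still be broken apart. A possible workaround, sketched here
as an assumption rather than something from the original script, is to translate
the newlines back into NUL bytes with `tr` before a final `xargs -0`. The
function name `check_shebang_files` is hypothetical, and file names containing
newlines are still not handled:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: reads a NUL-separated file list on stdin and runs
# the given command (e.g. shellcheck) on every file whose first line
# starts with "#!/".
check_shebang_files() {
  xargs -0 awk 'FNR==1 && /^#!\// { print FILENAME }' |
    tr '\n' '\0' |
    xargs -0 "$@"
}

# Intended usage inside a Git repository:
#   git ls-files -z | check_shebang_files shellcheck
```

Taking the checker as an argument also makes it easy to try the pipeline with a
harmless command, such as `ls -l`, before running ShellCheck.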

[awk-20min]: https://ferd.ca/awk-in-20-minutes.html
[shellcheck]: https://www.shellcheck.net/