---
title: 'Awk snippet: ShellCheck all scripts in a repository'
date: 2020-12-15
updated_at: 2020-12-16
layout: post
lang: en
ref: awk-snippet-shellcheck-all-scripts-in-a-repository
eu_categories: shell
---

Inspired by Fred Hebert's "[Awk in 20 Minutes][awk-20min]", here's a problem I
just solved with a line of Awk: run ShellCheck on all scripts of a repository.

In my repositories I usually have Bash and POSIX scripts, which I want to keep
tidy with [ShellCheck][shellcheck]. Here's the first version of
`assert-shellcheck.sh`:

```shell
#!/bin/sh -eux

find . -type f -name '*.sh' -print0 | xargs -0 shellcheck
```

This is the type of script that I copy around to all repositories, and I want it
to be capable of working on any repository, without requiring a list of files to
run ShellCheck on.

This first version worked fine while all my scripts had the `.sh` extension. But
I recently added some scripts without any extension, so `assert-shellcheck.sh`
called for a second version. The first attempt was to grep for the shebang
line:

```shell
$ grep '^#!/' assert-shellcheck.sh
#!/bin/sh -eux
```

Good, we have a grep pattern on the first try. Let's try to find all the
matching files:

```shell
$ find . -type f | xargs grep -l '^#!/'
./TODOs.org
./.git/hooks/pre-commit.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-merge-commit.sample
./.git/hooks/fsmonitor-watchman.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-push
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-receive.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/update.sample
./build-aux/with-guile-env.in
./build-aux/test-driver
./build-aux/missing
./build-aux/install-sh
./build-aux/install-sh~
./bootstrap
./scripts/assert-todos.sh
./scripts/songbooks
./scripts/compile-readme.sh
./scripts/ci-build.sh
./scripts/generate-tasks-and-bugs.sh
./scripts/songbooks.in
./scripts/with-container.sh
./scripts/assert-shellcheck.sh
```

This approach has a problem, though: it includes files ignored by Git, such as
`build-aux/install-sh~`, and even goes into the `.git/` directory and finds
sample hooks in `.git/hooks/*`.

To list the files that Git is tracking we'll try `git ls-files`:

```shell
$ git ls-files | xargs grep -l '^#!/'
TODOs.org
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
```

It looks to be almost there, but the `TODOs.org` entry exposes a flaw: grep
looks for the `'^#!/'` pattern in any part of the file. In my case,
`TODOs.org` had a snippet in the middle of the file where a line started with
`#!/bin/sh`.

So what we actually want is to match the **first** line against the pattern. We
could loop through each file, get the first line with `head -n 1` and grep
against that, but this is starting to look messy. I bet there is another way of
doing it concisely...
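
For comparison, the loop described above could look roughly like the sketch below. It assumes one path per line on stdin, and `first_line_shebang_files` is a hypothetical name, not something from my script:

```shell
#!/bin/sh
# Sketch of the loop-based approach: print every file whose FIRST line
# starts with "#!/". Reads one path per line from stdin.
# `first_line_shebang_files` is a hypothetical helper name.
first_line_shebang_files() {
  while IFS= read -r file; do
    # Only the first line is inspected, so a shebang-like line in the
    # middle of a file (as in TODOs.org) does not count.
    if head -n 1 "$file" | grep -q '^#!/'; then
      printf '%s\n' "$file"
    fi
  done
}

# Usage, from the root of a Git repository:
#   git ls-files | first_line_shebang_files
```

It works, but it spawns a `head` and a `grep` per file, and it is a lot of ceremony for "look at line one".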

Let's try Awk. I need a way to select the line numbers to replace `head -n 1`,
and to stop processing the file if the pattern matches. A quick search points me
to using `FNR` for the former, and `{ nextfile }` for the latter. Let's try it:

```shell
$ git ls-files | xargs awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }'
bootstrap
build-aux/with-guile-env.in
old/scripts/assert-docs-spelling.sh
old/scripts/build-site.sh
old/scripts/builder.bats.sh
scripts/assert-shellcheck.sh
scripts/assert-todos.sh
scripts/ci-build.sh
scripts/compile-readme.sh
scripts/generate-tasks-and-bugs.sh
scripts/songbooks.in
scripts/with-container.sh
```

Great! Only `TODOs.org` is missing, but the script is much better: instead of
matching against any part of the file that may have a shebang-like line, we only
look at the first line. Let's put it back into the `assert-shellcheck.sh` file
and use `NUL` as the separator to accommodate files with spaces in the name:

```shell
#!/bin/sh -eux

git ls-files -z | \
  xargs -0 awk 'FNR>1 { nextfile } /^#!\// { print FILENAME; nextfile }' | \
  xargs shellcheck
```

This is where I've stopped, but I imagine a likely improvement: match against
only `#!/bin/sh` and `#!/usr/bin/env bash` shebangs (the ones I use most), to
avoid running ShellCheck on Perl files, or other shebangs.
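
A minimal sketch of that improvement, assuming only those two shebangs matter and that options such as `-eux` may follow them. The function name `shell_scripts` is hypothetical, and I haven't put this into the real script:

```shell
#!/bin/sh
# Sketch: keep only files whose first line is a "#!/bin/sh" or
# "#!/usr/bin/env bash" shebang, optionally followed by options.
# Reads NUL-separated paths from stdin, prints one matching path per line.
# `shell_scripts` is a hypothetical helper name.
shell_scripts() {
  xargs -0 awk 'FNR == 1 && /^#!(\/bin\/sh|\/usr\/bin\/env bash)( |$)/ { print FILENAME }'
}

# Usage, from the root of a Git repository:
#   git ls-files -z | shell_scripts | xargs shellcheck
```

The trailing `( |$)` is there so that `#!/bin/sh -eux` matches but a longer interpreter name that merely starts with `/bin/sh` does not.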

Also, when reviewing the text of this article, I found that `{ nextfile }` is a
GNU Awk extension. It would be an improvement if `assert-shellcheck.sh` relied
only on the POSIX subset of Awk to work correctly.

## *Update*

After publishing, I was able to remove `{ nextfile }` and even make the script
simpler:

```shell
#!/bin/sh -eux

git ls-files -z | \
  xargs -0 awk 'FNR==1 && /^#!\// { print FILENAME }' | \
  xargs shellcheck
```

Now both the shell and Awk usage are POSIX compatible.
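
One caveat on file names with spaces: the `-z`/`-0` pair only protects the hop from Git to Awk. Awk prints one name per line, and a plain `xargs` splits its input on blanks, so a path containing a space would reach ShellCheck in pieces. A possible tweak, sketched here under the assumption that no file name contains a newline (`check_shell_scripts` is a hypothetical name):

```shell
#!/bin/sh
# Sketch: same pipeline as in the article, but the newline-separated list
# printed by awk is turned back into a NUL-separated one before the last
# xargs, so a path like "my script" stays a single argument.
# `check_shell_scripts` is a hypothetical helper; the command to run on
# the matching files is passed as its arguments.
check_shell_scripts() {
  xargs -0 awk 'FNR==1 && /^#!\// { print FILENAME }' |
    tr '\n' '\0' |
    xargs -0 "$@"
}

# Usage, from the root of a Git repository:
#   git ls-files -z | check_shell_scripts shellcheck
```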

[awk-20min]: https://ferd.ca/awk-in-20-minutes.html
[shellcheck]: https://www.shellcheck.net/