Grep online repositories

Posted on August 28, 2020

I often find interesting source code repositories online that I want to grep for some pattern but I can’t, because either:

  • the repository is hosted on cgit or a similar web interface that doesn’t support searching inside files, or
  • the search function is really bad and doesn’t let me use regular expressions to search the code.

Here’s a simple script that allows you to overcome that problem easily:

#!/usr/bin/env bash
set -eu

end="\033[0m"
red="\033[0;31m"
red() { echo -e "${red}${1}${end}"; }

usage() {
  red "Missing argument $1.\n"
  cat <<EOF
Usage:
    $0 <REGEX_PATTERN> <REPOSITORY_URL>

      Arguments:
        REGEX_PATTERN     Regular expression that "git grep" can search
        REPOSITORY_URL    URL address that "git clone" can download the repository from

Examples:
    Searching "make get-git" in cgit repository:
        git search 'make get-git' https://git.zx2c4.com/cgit/
        git search 'make get-git' https://git.zx2c4.com/cgit/ -- \$(git rev-list --all)
EOF
  exit 2
}


REGEX_PATTERN="${1:-}"
REPOSITORY_URL="${2:-}"
[[ -z "${REGEX_PATTERN}" ]] && usage 'REGEX_PATTERN'
[[ -z "${REPOSITORY_URL}" ]] && usage 'REPOSITORY_URL'

mkdir -p /tmp/git-search
DIRNAME="$(echo "${REPOSITORY_URL%/}" | rev | cut -d/ -f1 | rev)"
if [[ ! -d "/tmp/git-search/${DIRNAME}" ]]; then
  git clone "${REPOSITORY_URL}" "/tmp/git-search/${DIRNAME}"
fi
pushd "/tmp/git-search/${DIRNAME}"

shift 2; if [[ "${1:-}" == "--" ]]; then shift; fi # drop the optional "--" separator
git grep "${REGEX_PATTERN}" "${@}"

It is a wrapper around git grep that clones the repository when it isn’t already cached under /tmp/git-search. Save it in a file called git-search, make the file executable, and put it in a directory on your PATH; git then makes it available as git search.
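
Git runs any executable named git-<name> that it finds on the PATH as the subcommand git <name>, which is why the examples call the script as git search. A minimal install sketch, assuming ~/.local/bin is on your PATH; install_git_subcommand is a hypothetical helper name, not part of git:

```shell
#!/usr/bin/env bash

# install_git_subcommand: copy a script into a bin directory and mark it
# executable. Hypothetical helper, shown only to illustrate the install step.
install_git_subcommand() {
  local script="$1" bin_dir="$2"
  mkdir -p "${bin_dir}"
  # install copies the file and sets its mode in one step.
  install -m 0755 "${script}" "${bin_dir}/$(basename "${script}")"
}

# Example (assumption: ~/.local/bin is listed in PATH):
#   install_git_subcommand ./git-search "${HOME}/.local/bin"
#   git search 'make get-git' https://git.zx2c4.com/cgit/
```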

Overview:

  • lines 1~2:

    Bash shebang and the set -eu options to exit on error or undefined variables.

  • lines 4~30:

    Color helper, usage text, and argument checks. The usage text is printed when fewer arguments than expected are provided.

  • line 33:

    Extract the repository name from the URL, after removing a trailing slash.

  • lines 34~37:

    Download the repository when missing and go to the folder.

  • line 39:

    Drop the pattern, the URL, and the optional -- separator, so that $@ holds only the remaining arguments.

  • line 40:

    Perform git grep, forwarding the remaining arguments from $@.
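
The two least obvious steps, deriving the directory name and collecting the leftover arguments, can be sketched in isolation. This is an illustration that uses shell parameter expansion instead of the rev and cut pipeline from the script, and it checks for the -- separator explicitly; repo_dirname and extra_args are hypothetical names:

```shell
#!/usr/bin/env bash

# repo_dirname: print the last path component of a URL,
# ignoring one trailing slash.
repo_dirname() {
  local url="${1%/}"          # drop a single trailing slash, if present
  printf '%s\n' "${url##*/}"  # remove everything up to the last slash
}

# extra_args: print, one per line, the arguments that follow the pattern
# and the URL, skipping an optional leading "--".
# Assumes it is called with at least two arguments.
extra_args() {
  shift 2
  if [ "${1:-}" = "--" ]; then shift; fi
  if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; fi
}

repo_dirname 'https://git.zx2c4.com/cgit/'                 # prints: cgit
extra_args 'pattern' 'https://example.org/repo' -- -i -n   # prints: -i, then -n
```

Whatever extra_args prints here is what the script would forward to git grep after the pattern.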

Example output:

$ git search 'make get-git' https://git.zx2c4.com/cgit/
Cloning into '/tmp/git-search/cgit'...
remote: Enumerating objects: 542, done.
remote: Counting objects: 100% (542/542), done.
remote: Compressing objects: 100% (101/101), done.
warning: object 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
remote: Total 7063 (delta 496), reused 446 (delta 441), pack-reused 6521
Receiving objects: 100% (7063/7063), 8.69 MiB | 5.39 MiB/s, done.
Resolving deltas: 100% (5047/5047), done.
/tmp/git-search/cgit ~/dev/libre/songbooks/docs
README:    $ make get-git

$ git search 'make get-git' https://git.zx2c4.com/cgit/
/tmp/git-search/cgit ~/dev/libre/songbooks/docs
README:    $ make get-git

Subsequent greps on the same repository are faster because no download is needed.
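
The flip side is that the cached clone is never refreshed, so later searches can miss new commits. A hedged sketch of a clone-or-update step, which is not part of the script above; clone_or_update is a hypothetical helper, and it assumes the cached clone has no local changes, so a fast-forward pull succeeds:

```shell
#!/usr/bin/env bash

# clone_or_update: clone a repository into a cache directory on first use,
# and fetch new commits on later runs instead of cloning again.
clone_or_update() {
  local url="$1" dir="$2"
  if [ -d "${dir}/.git" ]; then
    # Already cached: update the checked-out branch, fast-forward only.
    git -C "${dir}" pull --ff-only --quiet
  else
    git clone --quiet "${url}" "${dir}"
  fi
}

# Example:
#   clone_or_update https://git.zx2c4.com/cgit/ /tmp/git-search/cgit
```

Updating on every search trades some of the speed of the cached case for fresher results.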

When no argument is provided, it prints the usage text:

$ git search
Missing argument REGEX_PATTERN.

Usage:
    /home/andreh/dev/libre/dotfiles/scripts/ad-hoc/git-search <REGEX_PATTERN> <REPOSITORY_URL>

      Arguments:
        REGEX_PATTERN     Regular expression that "git grep" can search
        REPOSITORY_URL    URL address that "git clone" can download the repository from

Examples:
    Searching "make get-git" in cgit repository:
        git search 'make get-git' https://git.zx2c4.com/cgit/
        git search 'make get-git' https://git.zx2c4.com/cgit/ -- $(git rev-list --all)
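
A caveat about the last example: the calling shell expands $(git rev-list --all) in whatever directory the command is typed in, before the script changes into the cached clone, so the revisions only match when that directory is a clone of the same repository. A sketch that lists the revisions of the cached clone instead, assuming the script’s /tmp/git-search layout; cached_revs is a hypothetical helper that would have to be defined in your shell:

```shell
#!/usr/bin/env bash

# cached_revs: list every commit reachable from any ref in the cached clone.
# The optional second argument (cache root) defaults to the directory
# used by the script.
cached_revs() {
  local url="${1%/}"
  local root="${2:-/tmp/git-search}"
  git -C "${root}/${url##*/}" rev-list --all
}

# Usage, once a first search has created the clone:
#   git search 'make get-git' https://git.zx2c4.com/cgit/ -- $(cached_revs https://git.zx2c4.com/cgit/)
```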