---
title: Grep online repositories
date: 2020-08-28
layout: post
lang: en
ref: grep-online-repositories
eu_categories: git
---

I often find interesting source code repositories online that I want to grep for
some pattern, but I can't, because either:

- the repository is on [cgit][cgit] or a similar code browser that doesn't
  allow searching in files; or
- the search function is really bad and doesn't let me use regular expressions
  to search the code.

[cgit]: https://git.zx2c4.com/cgit/

Here's a simple script that works around both problems:

```shell
#!/usr/bin/env bash
set -eu

end="\033[0m"
red="\033[0;31m"
red() { echo -e "${red}${1}${end}"; }

usage() {
  red "Missing argument $1.\n"
  cat <<EOF
Usage:
    $0 <REGEX_PATTERN> <REPOSITORY_URL>

      Arguments:
        REGEX_PATTERN     Regular expression that "git grep" can search
        REPOSITORY_URL    URL address that "git clone" can download the repository from

Examples:
    Searching "make get-git" in cgit repository:
        git search 'make get-git' https://git.zx2c4.com/cgit/
        git search 'make get-git' https://git.zx2c4.com/cgit/ -- \$(git rev-list --all)
EOF
  exit 2
}


REGEX_PATTERN="${1:-}"
REPOSITORY_URL="${2:-}"
[[ -z "${REGEX_PATTERN}" ]] && usage 'REGEX_PATTERN'
[[ -z "${REPOSITORY_URL}" ]] && usage 'REPOSITORY_URL'

mkdir -p /tmp/git-search
DIRNAME="$(echo "${REPOSITORY_URL%/}" | rev | cut -d/ -f1 | rev)"
if [[ ! -d "/tmp/git-search/${DIRNAME}" ]]; then
  git clone "${REPOSITORY_URL}" "/tmp/git-search/${DIRNAME}"
fi
pushd "/tmp/git-search/${DIRNAME}"

shift 2; if [[ "${1:-}" == "--" ]]; then shift; fi # drop the optional "--"
git grep "${REGEX_PATTERN}" "${@}"
```

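It is a wrapper around `git grep` that clones the repository first when it isn't
already in `/tmp/git-search`. Save it in a file called `git-search`, make the
file executable and put it in a directory on your `$PATH`; Git then runs it
whenever you type `git search`.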
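Git looks up any executable named `git-<name>` on your `PATH` and runs it for `git <name>`, which is what turns `git-search` into `git search`. A minimal sketch of that mechanism follows, using a throwaway command called `git-hello` (a hypothetical name) in a temporary directory so it doesn't touch your real setup.

```shell
#!/usr/bin/env bash
# Sketch: Git runs any executable named "git-<name>" found on PATH as "git <name>".
# "git-hello" is a hypothetical stand-in for git-search, created in a temp dir.
set -eu

bin_dir="$(mktemp -d)"
cat > "${bin_dir}/git-hello" <<'EOF'
#!/usr/bin/env bash
echo "hello from git-hello, args: $*"
EOF
chmod +x "${bin_dir}/git-hello"

# Prepend the directory to PATH, the same way you would for git-search.
export PATH="${bin_dir}:${PATH}"
git hello one two   # prints: hello from git-hello, args: one two
```

For `git-search` itself the steps are the same: copy the script into a directory that is on your `PATH` and `chmod +x` it.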

Overview:

- *lines 1~2*:

  Bash shebang and the `set -eu` options to exit on error or undefined
  variables.

- *lines 4~30*:

  Color helpers and the usage text, printed when fewer than two arguments are
  given.

- *line 33*:

  Extract the repository name from the URL, after removing a trailing slash.

- *lines 34~37*:

  Clone the repository into `/tmp/git-search` when it is missing, then `pushd`
  into its folder.

- *line 39*:

  Drop the pattern and URL arguments (plus the `--` separator, when present), so
  that `$@` holds only the remaining arguments.

- *line 40*:

  Perform `git grep`, forwarding the remaining arguments from `$@`.
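The two less obvious steps, turning the URL into a folder name and deciding which arguments reach `git grep`, can be tried in isolation. Below is a minimal sketch; the function names `repo_dirname` and `forwarded_args` are hypothetical, and the `--` handling checks for the separator explicitly, dropping it only when it is actually present.

```shell
#!/usr/bin/env bash
# Sketch: the URL-to-folder and argument-forwarding steps of git-search,
# isolated so they can be tried without cloning anything.
# repo_dirname and forwarded_args are hypothetical names.
set -eu

# Same pipeline as the script: drop one trailing slash, keep the last path segment.
repo_dirname() {
  echo "${1%/}" | rev | cut -d/ -f1 | rev
}

# Print, one per line, the arguments that would be forwarded to "git grep".
forwarded_args() {
  shift 2                                     # drop REGEX_PATTERN and REPOSITORY_URL
  if [[ "${1:-}" == "--" ]]; then shift; fi   # drop "--" only when it is there
  printf '%s\n' "$@"
}

repo_dirname https://git.zx2c4.com/cgit/      # prints: cgit
repo_dirname https://github.com/git/git.git   # prints: git.git
forwarded_args 'make get-git' https://git.zx2c4.com/cgit/ -- HEAD v1.0   # prints: HEAD, v1.0
forwarded_args 'make get-git' https://git.zx2c4.com/cgit/ HEAD           # prints: HEAD
```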

Example output:
```shell
$ git search 'make get-git' https://git.zx2c4.com/cgit/
Cloning into '/tmp/git-search/cgit'...
remote: Enumerating objects: 542, done.
remote: Counting objects: 100% (542/542), done.
remote: Compressing objects: 100% (101/101), done.
warning: object 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
remote: Total 7063 (delta 496), reused 446 (delta 441), pack-reused 6521
Receiving objects: 100% (7063/7063), 8.69 MiB | 5.39 MiB/s, done.
Resolving deltas: 100% (5047/5047), done.
/tmp/git-search/cgit ~/dev/libre/songbooks/docs
README:    $ make get-git

$ git search 'make get-git' https://git.zx2c4.com/cgit/
/tmp/git-search/cgit ~/dev/libre/songbooks/docs
README:    $ make get-git
```

Subsequent greps on the same repository are faster because no download is needed.
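The cache is only a directory check: the clone is made once and then reused as-is, which also means commits pushed upstream later are not fetched. Below is a minimal offline sketch of that behaviour, with a throwaway local repository standing in for a real URL. The temporary paths and the `fetch_once` function are hypothetical, and `basename` stands in for the script's `rev | cut` pipeline.

```shell
#!/usr/bin/env bash
# Sketch of git-search's clone-once cache, runnable offline.
# A throwaway local repository stands in for REPOSITORY_URL;
# the temp paths and fetch_once are hypothetical.
set -eu

cache="$(mktemp -d)"
origin="$(mktemp -d)/demo"
git init -q "${origin}"
echo 'make get-git' > "${origin}/README"
git -C "${origin}" add README
git -C "${origin}" -c user.name=demo -c user.email=demo@example.com commit -q -m init

fetch_once() {
  local dir="${cache}/$(basename "${1%/}")"
  if [[ ! -d "${dir}" ]]; then
    git clone -q "$1" "${dir}"
    echo "cloned"
  else
    echo "cached"   # reused as-is: new upstream commits are not fetched
  fi
}

fetch_once "${origin}"                        # prints: cloned
fetch_once "${origin}"                        # prints: cached
git -C "${cache}/demo" grep 'make get-git'    # prints: README:make get-git
```

To pick up new commits for a cached repository, you can run `git -C /tmp/git-search/cgit pull` or delete its folder under `/tmp/git-search`.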

When no argument is provided, it prints the usage text:
```shell
$ git search
Missing argument REGEX_PATTERN.

Usage:
    /home/andreh/dev/libre/dotfiles/scripts/ad-hoc/git-search <REGEX_PATTERN> <REPOSITORY_URL>

      Arguments:
        REGEX_PATTERN     Regular expression that "git grep" can search
        REPOSITORY_URL    URL address that "git clone" can download the repository from

Examples:
    Searching "make get-git" in cgit repository:
        git search 'make get-git' https://git.zx2c4.com/cgit/
        git search 'make get-git' https://git.zx2c4.com/cgit/ -- $(git rev-list --all)
```
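The second example in the usage text passes `$(git rev-list --all)` so that `git grep` searches every commit instead of only the checked-out files. Note that your shell expands that substitution before `git-search` runs, in your current directory rather than in the clone; once the clone exists, `$(git -C /tmp/git-search/cgit rev-list --all)` lists the right commits. The sketch below shows the effect offline in a throwaway repository (the temporary path and file contents are made up for the demo): a line deleted in the latest commit is only found when the revisions are passed.

```shell
#!/usr/bin/env bash
# Sketch: "git grep <pattern> <rev>..." searches those commits, not just the
# working tree. Runs offline in a throwaway repository.
set -eu

repo="$(mktemp -d)"
cd "${repo}"
git init -q
git config user.name demo
git config user.email demo@example.com
echo 'make get-git' > README
git add README && git commit -q -m 'add README'
echo 'nothing here' > README
git commit -q -am 'reword README'

# Working tree only: no match, so git grep exits with status 1.
git grep 'make get-git' || echo "not in the working tree"

# Every commit: the match is reported as <commit>:README:make get-git.
git grep 'make get-git' $(git rev-list --all)
```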