Bash: a workflow for developing scripts

REPL-driven development

As in any programming environment, developing larger chunks of code follows a familiar cycle: make a change, run the script, observe the result, and repeat. Shell scripts are no different in this regard.

One important difference, however, is that the shell is fundamentally a highly interactive environment, so we can easily get a nice REPL experience by tweaking a few things in the code we are working on and adding a few definitions to our interactive environment.

Outline of our plan

In broad terms, given a set of files, we want to reload all of the ones that have changed since we last ran a command.

To achieve this, we:

  • keep track of each file's sha1 checksum,
  • after running each Bash command, check whether any file's checksum has changed,
  • invoke source on each file whose checksum has changed.
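Before looking at the real tooling, here is the plan boiled down to a single file, as a rough sketch rather than the author's code. It assumes GNU coreutils' sha1sum; the names last_sum and reload_if_changed are made up for this illustration.

```bash
#!/usr/bin/env bash
# Single-file sketch of the plan (not repl.sh itself): remember the file's
# sha1 checksum and re-source the file only when the checksum changes.
# last_sum and reload_if_changed are hypothetical names.

last_sum=""

reload_if_changed() {
  local file="$1" sum
  # sha1sum prints "<hash>  <file>"; keep only the hash column.
  sum=$(sha1sum "$file") || return
  sum=${sum%% *}
  if [[ "$sum" != "$last_sum" ]]; then
    last_sum="$sum"
    source "$file"
  fi
}

# Usage: edit a throwaway file between two checks.
demo=$(mktemp)
printf '%s\n' 'example_fn() { echo v1; }' >"$demo"
reload_if_changed "$demo"; example_fn   # v1
printf '%s\n' 'example_fn() { echo v2; }' >"$demo"
reload_if_changed "$demo"; example_fn   # v2
rm -f "$demo"
```

Calling reload_if_changed again without touching the file does nothing, which is exactly the behavior we want after every command in the REPL.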

A REPL session

Here is an example session of using our REPL tooling, before diving into how it works:

 0 12:43:43 ~/demo
$ cat test.sh
example_fn() {
  printf "example\n"
}

 0 12:43:46 ~/demo
$ repl.sh
repl:[]  0 12:43:53 ~/demo
$ repl.watch ./test.sh
repl.watch: ./test.sh
repl.load: ./test.sh
repl:[./test.sh]  0 12:43:58 ~/demo
$ example_fn
example
repl:[./test.sh]  0 12:44:09 ~/demo
$ vim test.sh
repl.load: ./test.sh
repl:[./test.sh]  0 12:44:20 ~/demo
$ example_fn
the function has changed
repl:[./test.sh]  0 12:44:23 ~/demo
$ echo doing something else will not cause a reload
doing something else will not cause a reload
repl:[./test.sh]  0 12:44:36 ~/demo
$

REPL walkthrough

You can find the full script here.

Let's walk through it line by line:

Setting the stage

#!/usr/bin/env bash

[[ -v REPL_FILES ]] || declare -a REPL_FILES=()
[[ -v REPL_CHECKSUM ]] || declare REPL_CHECKSUM

We expect this script to be loaded with source, so that it can interact with the current shell environment. That also means it can be loaded multiple times, so it must not reset state that an earlier load has already set up.

For tracking our state we need two variables: REPL_FILES is a list of files we want to watch and REPL_CHECKSUM is the last known checksum of all the files.

Using [[ -v NAME ]] we can check whether a variable is already defined (because repl.sh was loaded before), and define it only if necessary.
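To see the guard in action, the snippet below writes the two guarded definitions to a throwaway file standing in for repl.sh and sources it twice. One subtlety worth knowing: for an array, [[ -v NAME ]] tests element 0, so an empty array counts as unset and is simply re-declared as empty, which is harmless here.

```bash
#!/usr/bin/env bash
# Sourcing the guarded definitions twice must not wipe existing state.
# The temporary file is a stand-in for repl.sh, not the real script.

state=$(mktemp)
cat >"$state" <<'EOF'
[[ -v REPL_FILES ]] || declare -a REPL_FILES=()
[[ -v REPL_CHECKSUM ]] || declare REPL_CHECKSUM
EOF

source "$state"
REPL_FILES+=(./test.sh)
source "$state"                 # second load keeps the existing array
printf '%s\n' "${REPL_FILES[@]}"   # ./test.sh
rm -f "$state"
```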

Hooking into Bash

repl.install() {
  local command
  declare -ga PROMPT_COMMAND
  for command in "${PROMPT_COMMAND[@]}"; do
    if [[ "$command" == "repl.load_if_changed" ]]; then
      return
    fi
  done

  PROMPT_COMMAND+=(repl.load_if_changed)
  repl.install_prompt
}

This is the entrypoint into the REPL functionality. Often overlooked, PROMPT_COMMAND can be an array instead of a plain string (since Bash 5.1). This makes it easy to detect whether we've already installed the REPL integration.

First, declare -ga converts PROMPT_COMMAND into a global array; any existing string value becomes its first element. Then we check whether repl.load_if_changed is already part of PROMPT_COMMAND, and if it isn't, we append it to the end.

With that done, every time we hit Enter, Bash executes each command in this list, in order, before displaying the prompt again.

Additionally, we want to display some extra information in the prompt to show that we are in a REPL session. More on that later.
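Here is a trimmed-down sketch of the first half of repl.install that runs outside an interactive shell, so we can watch what declare -ga does to an existing string value. install_hook is a hypothetical name and 'history -a' is just an example of a pre-existing value; Bash itself only executes the array elements of PROMPT_COMMAND in interactive shells, from version 5.1 on.

```bash
#!/usr/bin/env bash
# How declare -ga turns a string PROMPT_COMMAND into an array, and how the
# membership check makes repeated installs a no-op.
# install_hook is a hypothetical stand-in for repl.install.

PROMPT_COMMAND='history -a'      # example of an existing string value

install_hook() {
  local command
  declare -ga PROMPT_COMMAND     # the existing string becomes element 0
  for command in "${PROMPT_COMMAND[@]}"; do
    if [[ "$command" == "repl.load_if_changed" ]]; then
      return                     # already installed
    fi
  done
  PROMPT_COMMAND+=(repl.load_if_changed)
}

install_hook
install_hook                     # second call changes nothing
printf '%s\n' "${PROMPT_COMMAND[@]}"
# history -a
# repl.load_if_changed
```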

Detecting changes

repl.load_if_changed() {
  local filename new_checksum
  local -A old new
  new_checksum=$(repl.checksum)
  repl.parse_checksum old <<<"$REPL_CHECKSUM"
  repl.parse_checksum new <<<"$new_checksum"

  for filename in "${!new[@]}"; do
    if [[ "${new[$filename]}" != "${old[$filename]:-}" ]]; then
      repl.load "$filename"
    fi
  done

  REPL_CHECKSUM="$new_checksum"
}

To detect whether files have changed, we take a checksum of the current set of watched files and compare it to the last known checksum stored in REPL_CHECKSUM.

Under the hood we're using sha1sum for generating the checksums, so the value in new_checksum and REPL_CHECKSUM looks like this:

32b2b0c0a5ec7fe502f9ab319fbee761b38b7a48  repl.sh

We parse this text into two associative arrays (old and new) and then iterate over all the keys in new.

Note how we can obtain all keys of an associative array using ${!new[@]}.

If we detect a difference between checksums, we load the file for which the difference was detected.

Why iterate over new and not over old? The list of files on our watchlist can change, so if we iterated over old, we would not trigger a reload when a new file is added to the list.
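The comparison loop can be tried in isolation with hand-written arrays. The hashes below are made-up placeholders rather than real sha1 sums: b.sh has a different hash, and c.sh was just added to the watchlist, so it has no entry in old at all.

```bash
#!/usr/bin/env bash
# The comparison from repl.load_if_changed, run on made-up data.
# A file missing from "old" expands to "" via :- and so counts as changed.

declare -A old=([a.sh]=111 [b.sh]=222)
declare -A new=([a.sh]=111 [b.sh]=333 [c.sh]=444)

changed=()
for f in "${!new[@]}"; do
  if [[ "${new[$f]}" != "${old[$f]:-}" ]]; then
    changed+=("$f")
  fi
done

# Associative array key order is unspecified, so sort before printing.
printf '%s\n' "${changed[@]}" | sort
# b.sh
# c.sh
```

Iterating over old instead would never visit c.sh, which is exactly the missed reload described above.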

After processing all list elements, we update REPL_CHECKSUM to reflect the current state of the world.
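The walkthrough doesn't show repl.checksum or repl.load. The following is one plausible way to write them, not the author's code: it assumes GNU coreutils' sha1sum and that repl.load prints the repl.load: line seen in the session above before sourcing the file.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of the two helpers the walkthrough does not show.

declare -a REPL_FILES=()

repl.checksum() {
  # sha1sum with no file arguments would read stdin and block the prompt,
  # so print nothing when no files are watched.
  (( ${#REPL_FILES[@]} > 0 )) || return 0
  # Errors for missing files are silenced; the remaining files still print.
  sha1sum -- "${REPL_FILES[@]}" 2>/dev/null
}

repl.load() {
  printf "repl.load: %s\n" "$1" >&2
  # Caveat: sourcing inside a function means a plain `declare` at the top
  # level of the loaded file creates a local variable of this function.
  source "$1"
}
```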

Parsing checksums

As you might have noticed, repl.parse_checksum somewhat magically populated the old and new arrays.

repl.parse_checksum() {
  local sha1 filename
  local -n destination_dict="$1"
  # -r keeps backslashes in the input literal instead of treating them as escapes
  while read -r sha1 filename; do
    [[ -z "$filename" ]] && continue
    destination_dict["$filename"]="$sha1"
  done
}

The important bit here is local -n. This turns destination_dict into a nameref, so any operation on destination_dict is actually applied to the variable whose name was passed as the first argument (old or new in repl.load_if_changed).

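Since the checksum data is tabular, we can use Bash's builtin read to grab the first and second columns and build a map from filename to checksum. Because filename is the last variable passed to read, it receives the rest of the line, so filenames containing spaces survive intact.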
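To check both points, here is repl.parse_checksum (with read -r) fed sample sha1sum-style text. The hashes are made-up placeholders, and "my notes.sh" is there to show a filename with a space. Note that the caller's array must not itself be named destination_dict, since a nameref cannot point to a variable with its own name.

```bash
#!/usr/bin/env bash
# repl.parse_checksum filling a caller-owned associative array via a nameref.

repl.parse_checksum() {
  local sha1 filename
  local -n destination_dict="$1"   # refers to the array named by $1
  while read -r sha1 filename; do
    [[ -z "$filename" ]] && continue
    destination_dict["$filename"]="$sha1"
  done
}

declare -A sums
repl.parse_checksum sums <<'EOF'
aaaa  repl.sh
bbbb  my notes.sh
EOF

for f in "${!sums[@]}"; do
  printf '%s -> %s\n' "$f" "${sums[$f]}"
done
```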

Watching files

We still need a way to watch files for changes. This is what the repl.watch function does:

repl.watch() {
  local f r

  for f in "$@"; do
    for r in "${REPL_FILES[@]}"; do
      if [[ "$r" == "$f" ]]; then
        continue 2
      fi
    done

    REPL_FILES+=("$f")
    printf "repl.watch: %s\n" "$f" >&2
    repl.load "$f"
  done
}

We iterate over all the arguments, binding the current one to f. If f is already in the list of watched files, we skip it.

Bash handles breaking out of, or continuing, nested loops elegantly: break and continue accept an optional number that says how many enclosing loops they apply to. Here, continue 2 means continue with the next iteration of the outer loop.
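The same pattern, stripped of the REPL specifics, shows how continue 2 deduplicates arguments. add_unique and watched are hypothetical names for this illustration.

```bash
#!/usr/bin/env bash
# continue 2 abandons the inner loop AND skips the rest of the outer
# loop's body, so a duplicate is never appended.

watched=(a.sh)

add_unique() {
  local f r
  for f in "$@"; do
    for r in "${watched[@]}"; do
      # Already present: move on to the next argument.
      if [[ "$r" == "$f" ]]; then
        continue 2
      fi
    done
    watched+=("$f")
  done
}

add_unique a.sh b.sh b.sh c.sh
printf '%s\n' "${watched[@]}"   # a.sh, b.sh, c.sh (one per line)
```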

If the file is not on our watchlist yet, we add it and print a message to stderr to let the user know that the file is now watched. We also load it right away, because the user has expressed their intent to work on it.

Providing context

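repl.install_prompt() {
  if [[ "$PS1" =~ repl: ]]; then
    return
  fi

  # No \[ \] around this text: those mark non-printing characters, and
  # marking visible text that way confuses readline's cursor positioning.
  PS1="repl:\$(repl.prompt_info) $PS1"
}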

repl.prompt_info() {
  local IFS=,
  printf '[%s]' "${REPL_FILES[*]}"
}

If a REPL shell looks like any other shell, it is easy to forget about files being automatically loaded.
We can improve the user experience by showing the user the files they are currently working with in the REPL.

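This is basically what repl.prompt_info does: it joins the file names in REPL_FILES with commas and wraps the result in brackets. Inside double quotes, ${REPL_FILES[*]} joins the elements with the first character of IFS, which is why the function sets a local IFS=,.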

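The function repl.install_prompt then adds the output of repl.prompt_info to the prompt, but only if we haven't done so already (assuming the user doesn't have repl: somewhere in their prompt to begin with). The escaped \$( keeps the command substitution literal inside PS1, so it runs again every time the prompt is drawn.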
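A quick way to check both helpers outside an interactive session is to set PS1 by hand and inspect it. The starting PS1 value and the file names are assumptions for the demo, and the prompt text is added without \[ \] because it is printable.

```bash
#!/usr/bin/env bash
# The IFS join in repl.prompt_info and the "repl:" guard in
# repl.install_prompt, exercised in a non-interactive shell.

REPL_FILES=(./a.sh ./b.sh)

repl.prompt_info() {
  local IFS=,                     # "${arr[*]}" joins with IFS's first char
  printf '[%s]' "${REPL_FILES[*]}"
}

repl.install_prompt() {
  if [[ "$PS1" =~ repl: ]]; then
    return
  fi
  # \$( stays literal, so the substitution runs each time the prompt is drawn.
  PS1="repl:\$(repl.prompt_info) $PS1"
}

PS1='\w \$ '
repl.install_prompt
repl.install_prompt               # guarded: no second "repl:" prefix
repl.prompt_info; echo            # [./a.sh,./b.sh]
printf '%s\n' "$PS1"              # repl:$(repl.prompt_info) \w \$
```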

Entering a REPL

At the end of the repl.sh script we can see the following:

case ${0##*/} in
  repl.sh) repl.enter;;
  bash|-bash) repl.install;;
esac

The ${0##*/} expands to the filename part of the path stored in $0. If repl.sh is invoked as a command, $0 holds the path used to run it, so the expansion yields repl.sh. If, however, repl.sh is loaded into an existing shell session using source, then $0 is the name of the shell itself: typically -bash for a login shell and plain bash for other interactive shells.
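To get a feel for the expansion, here it is applied to a few sample values of $0. The paths are illustrative only, and the script does not inspect its own $0.

```bash
#!/usr/bin/env bash
# ${var##*/} deletes the longest prefix matching "*/", i.e. everything up to
# and including the last slash. Values without a slash pass through unchanged.

for arg0 in /home/user/bin/repl.sh ./repl.sh repl.sh -bash bash; do
  printf '%-22s -> %s\n' "$arg0" "${arg0##*/}"
done
```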

We can use this to implement different behaviors: if loaded in an interactive shell, we just install the REPL and that's it.

If invoked as a command, we'll start a new interactive shell, with the REPL loaded and set up:

repl.enter() {
  bash --rcfile <(printf "%s\n" \
    "$(< "$HOME/.bashrc")" \
    'source repl.sh' \
    'repl.install' \
  ) -i
}

We use Bash's process substitution, <(...), to create a new, temporary init file for Bash that consists of the user's .bashrc followed by two commands that activate the REPL.
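Process substitution is easy to try on its own: <(cmd) is replaced by a path (on Linux usually something like /dev/fd/63) from which the command's output can be read as if it were a file. The sketch below sources generated code the same way, without starting a new shell; greet is a hypothetical function.

```bash
#!/usr/bin/env bash
# <(cmd) stands in for a file containing cmd's output, so generated code
# can be handed to anything that expects a file name, e.g. source or --rcfile.

printf 'path: %s\n' <(true)       # a /dev/fd/N or named-pipe path

source <(printf '%s\n' \
  'greet() { printf "hello from a generated file\n"; }')
greet                             # hello from a generated file
```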



More from Dario Hamidi