Bash: using regular expressions

Working with strings

Bash comes with a few built-in string operations, available through parameter expansion:

Generic find and replace: ${var//source/dest} replaces all occurrences of source in $var with dest
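As a quick illustration (the variable names below are made up for this example), a double slash replaces every occurrence, while a single slash replaces only the first one:

```bash
path_list="/usr/bin:/usr/local/bin:/opt/bin"

# // replaces every occurrence of ":" with a space
all_replaced="${path_list//:/ }"
printf '%s\n' "$all_replaced"    # /usr/bin /usr/local/bin /opt/bin

# a single / replaces only the first occurrence
first_replaced="${path_list/:/ }"
printf '%s\n' "$first_replaced"  # /usr/bin /usr/local/bin:/opt/bin
```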

Removing prefixes and suffixes: ${var##prefix} removes the longest match of prefix from the start of $var, where prefix can be any Bash pattern
(use %% to remove the longest matching suffix instead). A single # or % removes the shortest match. This is useful for working with paths:

p=/usr/local/lib/libexample.so
printf "filename: %s\n" "${p##*/}"  # prints "filename: libexample.so"
printf "directory: %s\n" "${p%/*}"  # prints "directory: /usr/local/lib"
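Whether the shortest or the longest match is removed matters as soon as the pattern can match in more than one place. A single # or % removes the shortest match, the doubled form removes the longest. Here is a small sketch with a hypothetical file name:

```bash
f=archive.tar.gz

printf '%s\n' "${f%.*}"   # archive.tar  (shortest suffix matching .* removed)
printf '%s\n' "${f%%.*}"  # archive      (longest suffix matching .* removed)
printf '%s\n' "${f#*.}"   # tar.gz       (shortest prefix matching *. removed)
printf '%s\n' "${f##*.}"  # gz           (longest prefix matching *. removed)
```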

But sometimes this is not enough.

Using regular expressions

Luckily Bash has you covered!

Bash supports POSIX extended regular expressions (probably already familiar to you from grep -E). Here's how the parts fit together:

  • the [[ conditional construct performs a regular expression match with its =~ operator
  • the BASH_REMATCH array holds the match info (index 0 is the full match, indices 1 and up are the capture groups)
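Here is a minimal sketch of how the two pieces work together. The version string is invented for the example. [[ returns success when the pattern matches, so it slots directly into an if:

```bash
version="bash-5.2.15"

if [[ "$version" =~ ([0-9]+)\.([0-9]+) ]]; then
  full="${BASH_REMATCH[0]}"   # the whole matched text
  major="${BASH_REMATCH[1]}"  # first capture group
  minor="${BASH_REMATCH[2]}"  # second capture group
  printf 'full=%s major=%s minor=%s\n' "$full" "$major" "$minor"
  # full=5.2 major=5 minor=2
else
  printf 'no match\n'
fi
```

Copying the groups into named variables right away is a sensible habit, because the next =~ match overwrites BASH_REMATCH.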

Let's use this to trim whitespace from the beginning and end of a string:

trim() {
  local text="$1"

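  # perform the match; the capture group must end in a non-whitespace
  # character, otherwise the greedy group would swallow trailing whitespace
  [[ "$text" =~ ^[[:space:]]*(.*[^[:space:]])[[:space:]]*$ ]]

  # print result of the first capture group
  printf "%s\n" "${BASH_REMATCH[1]}"
}

# quote the argument so word splitting does not strip it before trim runs
trim "$(printf "  \n hello \n\n")"
# hello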

The important bit here is that [[ only treats the right-hand side of =~ as a regular expression if it is not quoted; any quoted part is matched as a literal string.
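The difference is easy to see with a pattern that contains a metacharacter. In this small sketch (the variable names are only for illustration), the same pattern matches when unquoted and fails when quoted:

```bash
text="abc"

# unquoted: the dot is a regex metacharacter and matches "b"
if [[ "$text" =~ a.c ]]; then unquoted="match"; else unquoted="no match"; fi

# quoted: the right-hand side is searched for as the literal string "a.c"
if [[ "$text" =~ "a.c" ]]; then quoted="match"; else quoted="no match"; fi

printf 'unquoted: %s\n' "$unquoted"  # unquoted: match
printf 'quoted: %s\n' "$quoted"      # quoted: no match
```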

If you do need to quote a lot of special characters, the recommended way is to store the pattern in a variable and substitute it unquoted:

# pattern for parsing log lines like this: [INFO] [2021-10-14T14:50:00Z] Important message
log_pattern='\[([^]]+)\] \[([^]]+)\] (.*)'
[[ "[INFO] [2021-10-14T14:50:00Z] Important message" =~ $log_pattern ]]
declare -p BASH_REMATCH
# [0]="[INFO] [2021-10-14T14:50:00Z] Important message"
# [1]="INFO"
# [2]="2021-10-14T14:50:00Z"
# [3]="Important message"
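In a script you would usually check whether the match succeeded before reading the capture groups. The following sketch wraps the pattern in a function. The name parse_log_line and the output format are assumptions made for this example:

```bash
log_pattern='\[([^]]+)\] \[([^]]+)\] (.*)'

# hypothetical helper: prints the parsed fields, or fails on unparsable input
parse_log_line() {
  local line="$1"
  if [[ "$line" =~ $log_pattern ]]; then
    printf 'level=%s time=%s message=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    printf 'unparsed: %s\n' "$line" >&2
    return 1
  fi
}

parse_log_line "[INFO] [2021-10-14T14:50:00Z] Important message"
# level=INFO time=2021-10-14T14:50:00Z message=Important message
```

Returning a non-zero status for lines that do not match lets the caller decide whether to skip them or abort.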

