Bash: using regular expressions
October 14, 2021•281 words
Working with strings
Bash has a few built-in parameter expansion features for working with strings:
Generic find and replace: ${var//source/dest} replaces all occurrences of source in $var with dest
Removing prefixes and suffixes: ${var##prefix} removes the longest matching prefix from $var, where prefix can be any Bash pattern (replace ## with %% to remove the longest matching suffix instead). A single # or % removes the shortest match rather than the longest. This is useful for working with paths:
p=/usr/local/lib/libexample.so
printf "filename: %s\n" "${p##*/}" # prints "filename: libexample.so"
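# a single % removes the shortest suffix matching /*; with %% the whole path would match and the result would be empty
printf "directory: %s\n" "${p%/*}" # prints "directory: /usr/local/lib"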
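To make the difference between the shortest and longest variants concrete, here is a small sketch. The file name archive.tar.gz and the variable names are made up for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, chosen only for illustration.
f=archive.tar.gz

shortest_prefix="${f#*.}"    # tar.gz       (removes "archive.")
longest_prefix="${f##*.}"    # gz           (removes "archive.tar.")
shortest_suffix="${f%.*}"    # archive.tar  (removes ".gz")
longest_suffix="${f%%.*}"    # archive      (removes ".tar.gz")

# One slash replaces only the first match, two slashes replace every match.
first_only="${f/./_}"        # archive_tar.gz
every_match="${f//./_}"      # archive_tar_gz

printf '%s\n' "$shortest_prefix" "$longest_prefix" \
    "$shortest_suffix" "$longest_suffix" "$first_only" "$every_match"
```

Note that the patterns here are shell patterns (globs), not regular expressions, so the dot matches only a literal dot.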
But sometimes this is not enough.
Using regular expressions
Luckily Bash has you covered!
Bash supports POSIX extended regular expressions (probably already familiar to you from grep -E). Here's how the parts fit together:
- the [[ conditional command, with its =~ operator, can be used to perform a regular expression match
- the BASH_REMATCH array holds the match info (the full match plus any capture groups)
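As a minimal sketch of how these two parts combine, the following splits a version string into its components. The input string and the variable names are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input, chosen only for illustration.
version="bash-5.1.16"

# [[ ... =~ ... ]] returns success (0) when the pattern matches.
if [[ "$version" =~ ^([a-z]+)-([0-9]+)\.([0-9]+) ]]; then
    full="${BASH_REMATCH[0]}"    # bash-5.1 (the whole matched text)
    name="${BASH_REMATCH[1]}"    # bash     (first capture group)
    major="${BASH_REMATCH[2]}"   # 5        (second capture group)
    minor="${BASH_REMATCH[3]}"   # 1        (third capture group)
    printf '%s: %s, major %s, minor %s\n' "$full" "$name" "$major" "$minor"
else
    printf 'no match\n'
fi
```

Checking the exit status of [[ before reading BASH_REMATCH avoids acting on the result of a failed match.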
Let's use this to trim whitespace from the beginning and end of a string:
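trim() {
    local text="$1" trimmed=""
    # perform the match; the capture group must start and end with a
    # non-space character, because a greedy (.+) would also swallow the
    # trailing whitespace
    if [[ "$text" =~ ^[[:space:]]*([^[:space:]](.*[^[:space:]])?)[[:space:]]*$ ]]; then
        # take the result of the first capture group
        trimmed="${BASH_REMATCH[1]}"
    fi
    # prints an empty line if text was empty or all whitespace (no match)
    printf "%s\n" "$trimmed"
}
# quote the argument, otherwise word splitting strips the whitespace before trim sees it
trim "$(printf " \n hello \n\n")"
# hello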
The important bit here is that [[ only treats the right-hand side of =~ as a regular expression if it is not quoted. Any quoted part is matched as a literal string.
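A quick way to see the effect of quoting, sketched under the assumption of Bash 3.2 or later with default compatibility settings (the variable names are hypothetical):

```shell
#!/usr/bin/env bash
s="abc"

# Unquoted: "." is a regex metacharacter and matches any character, so "abc" matches.
if [[ "$s" =~ a.c ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted: "a.c" is compared literally, and "abc" contains no literal "a.c".
if [[ "$s" =~ "a.c" ]]; then quoted="match"; else quoted="no match"; fi

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
# unquoted: match
# quoted: no match
```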
If the pattern contains a lot of special characters that would need quoting, the recommended way is to store the pattern in a variable and expand that variable unquoted:
# pattern for parsing log lines like this: [INFO] [2021-10-14T14:50:00Z] Important message
log_pattern='\[([^]]+)\] \[([^]]+)\] (.*)'
[[ "[INFO] [2021-10-14T14:50:00Z] Important message" =~ $log_pattern ]]
declare -p BASH_REMATCH
# [1]="INFO"
# [2]="2021-10-14T14:50:00Z"
# [3]="Important message"
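In a script you would usually check whether the match succeeded before using the groups. The following is a sketch of one way to do that. The function name parse_log_line and the output format are hypothetical, and the ^ and $ anchors are an addition so that only whole lines in this format are accepted.

```shell
#!/usr/bin/env bash
# Same idea as the pattern above, anchored to the whole line (an assumption
# made for this sketch, not something the log format itself requires).
log_pattern='^\[([^]]+)\] \[([^]]+)\] (.*)$'

# parse_log_line is a hypothetical helper: prints the fields of one log line,
# or reports the line on stderr and returns 1 if it does not match.
parse_log_line() {
    local line="$1"
    if [[ "$line" =~ $log_pattern ]]; then
        printf 'level=%s time=%s msg=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        printf 'unparsed: %s\n' "$line" >&2
        return 1
    fi
}

parse_log_line "[INFO] [2021-10-14T14:50:00Z] Important message"
# level=INFO time=2021-10-14T14:50:00Z msg=Important message
```

Because BASH_REMATCH is overwritten by every =~ match, copy the groups into local variables if you need them after another match.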