Things I keep forgetting about in: bash/make/coreutils
This is a continuation of a series of posts, now on bash. See the first post for what this is about and why I'm writing it.
bash
bash is probably nobody's favourite programming language, and complaining
about how bad it is feels like beating a dead horse at this point. However, we all
have to live with it for the foreseeable future, especially when
connecting to remote environments. Unfortunately, some people still believe
that writing bash scripts longer than 50 lines is a good idea,
so it's not possible to just ignore its existence for good.
I want to cover things that usually get me, as well as some syntax weirdness that
I have to look up almost every time I need to write something new in
bash for some reason.
loops
The most basic loop in bash is somewhat simple, but as always, a bit weird:
for i in a b c d e; do
  echo "$i"
done
Often this is pretty much enough. Globs expand in place, and command substitution works in the loop body:
for i in *.md; do
  echo "$(basename "$i")"
done
It is a very bad idea to use the output of ls or find in any form for this purpose,
see this post for why.
There’s also while loop with an expected syntax:
while true; do
echo "boop"
done
There are also some magic commands that are often used with loops, like
read, shift and such. Out of scope for our purposes, we need to focus!
but what if I need to find files recursively?
Use find and avoid for altogether:
find . -type f -name '*.md' -exec basename {} \;
variable expansion
The simplest (and usually the worst) way to use the value of a variable is to put `$`
in front of it, like `$var`. This can go wrong, or do something you don't want,
for a variety of reasons. A shortlist of things to use instead:
- `${var}` to mark exactly where the name of the variable ends
- `"${var}"`: expansions should be double-quoted to avoid word splitting
- `"${var#*/}"` to remove the shortest prefix matching the pattern `*/` (everything up to and including the first `/`); `"${var##*/}"` makes the match greedy (cuts up to the last `/`)
- `"${var%.*}"` to remove the shortest suffix matching `.*` (everything after and including the last `.`); `"${var%%.*}"` makes the match greedy (cuts from the first `.`)
- `"${var/item/diff}"` to replace `item` with `diff` once; `"${var//item/diff}"` to replace every `item` with `diff`
  - these also work with patterns and anchors: `"${var/#*:/file:}"` matches at the start, `"${var/%./,}"` at the end
- `"${#var}"` to expand to the length of the value of the variable
- `${var:8:2}` to get a substring of the variable starting at index 8, of length 2 (the length is optional)
- `"${var:-default}"` to provide a default value for a variable (`"${var:=default}"` also assigns it)
- `"${arr[@]}"` passes `arr` as an array, one word per element, without word splitting (`"$@"` for all input parameters)
- `"${arr[*]}"` passes `arr` as one string, joining all elements with the first character of `$IFS` as separator (`"$*"` for all input parameters)
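A few of these in action, with a made-up `path` value:

```shell
path="docs/photos/cat.jpg"
echo "${path#*/}"    # strip up to the first "/": photos/cat.jpg
echo "${path##*/}"   # greedy version: cat.jpg
echo "${path%.*}"    # strip the extension: docs/photos/cat
echo "${#path}"      # length of the value: 19
echo "${path:5:6}"   # substring at index 5, length 6: photos
```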
substitutions
bash is all about manipulating strings and streams of strings. There are five(ish) main ways of passing data in, out, and between programs. Five is probably 3 or 4 more than I would expect to have, so the naming of these approaches can sound like there is some overlap (and there is).
- command substitution, done via `$( ... )` syntax. This just runs the command in a subshell and places its output where the expression was.
- process substitution, done via `>( ... )` or `<( ... )` syntax. This also runs a command in a subshell, but with key differences:
  - the command is treated as a file: `>( ... )` is a write-only file, `<( ... )` is a read-only file
  - if the outer command writes to this write-only file, the subprocess receives this data on its stdin. Similarly, the other direction creates a file with the contents of the stdout of the command
  - `<( ... )` can be used as an ad-hoc solution for commands that do not support reading from stdin and only want to read from files
  - the overall return code of the command is not determined by the subprocess at all: `a <(b)` will exit with whatever status code `a` returns
- somewhat similar, but also totally different, is piping: `|`. This does something really akin to `>( ... )`, but it does not create a temporary file
- on a similar note of manipulating stdin of programs, there are `<<`, `<<-`, and `<<<` operators. I'm just going to say that these are called heredocs and herestrings and they are kind of like `<( ... )` substitution
  - not to be confused with `>` or `>>`, which redirect output to a file... Oh god.
Relevant snippets:
# assign value to output of the command
today=$(date)
# save a copy of stdout logs to a log file, but keep
# the return status of run-my-job command
run-my-job > >(tee -a log-file.log)
# redirect stdout and stderr to filename
run-a-job &> filename
# redirect stderr to stdout, then pipe it to a file and
# keep the console output in the stdout
run-a-job 2>&1 | tee -a some-file
# diff two sorted files without saving sorted contents individually
diff <(sort file1) <(sort file2)
# combine heredoc with jq to compact a json
# notice the weird interaction of heredoc with pipe
cat <<EOF | jq -c '.'
{
"foo": "bar",
"baz": [ "badly",
"formatted",
"json"]
}
EOF
gotchas
- piping commands: `a | b` eats the return value of `a` by default; if you somehow rely on the return status of `a`, it needs to be worked around (but please, just use another language at that point instead)
- while the internet tends to recommend `set -euo pipefail` by default, it has a ton of weird side effects (link 1, link 2). The basics are:
  - `-e` has a ton of weird edge cases where it ignores some non-zero exit codes, for example in `if` statements, `&&` chains and so on
  - `pipefail` does not allow you to inspect which specific part of the pipeline failed; additionally, it's possible for a downstream command to not consume all of the data from the upstream command, and this can lead to a pipe failure even if that's expected behavior
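If you do need the status of an individual stage, bash keeps the exit code of every stage in the `PIPESTATUS` array (here `false | true` is just a stand-in pipeline):

```shell
false | true
# exit codes of each stage, left to right
echo "${PIPESTATUS[@]}"    # 1 0
```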
quotes
- `'...'` is akin to raw strings in normal programming languages: everything inside is used literally. A single quote cannot appear inside at all; the usual workaround is to end the string, escape the quote and start a new string, like `'it'\''s'`
- `"..."` is what should be used in most cases: it allows variable substitution with `$` and does not do word splitting
- `` `...` `` is command substitution in legacy format and should be replaced with `$()`
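A quick illustration of the first two:

```shell
var="hello world"
echo '$var'    # literal, prints: $var
echo "$var"    # expanded, no word splitting, prints: hello world
```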
conditions
Some examples of conditions
# whitespaces are important
[[ "$foo" == 5 ]] && echo "true"
[ "$foo" = 5 ] && echo "true"
if [ "$foo" = 5 ]; then
echo "true"
else
echo "false"
fi
[[ and [ are really similar. For simple equality checks they seem to
be identical, and both support:
- `[[ -e file ]]` to check if a file exists
- `[[ -f file ]]`, `-d`, `-h` to check if a path is a file/directory/symbolic link
- `[[ -z $foo ]]`, `-n` to check if a string is empty/non-empty
- `= != < >` operators for string comparison
- `-eq -ne -gt -lt -ge -le` for integer comparison
Only [[ is capable of:
- `= == !=` for comparing strings by pattern (for example, `$foo == *.jpg`)
- `=~` for comparing by regex match
- `()` for sub-expressions
- `&& ||` for combining expressions
The TLDR of choosing between these two is to always use [[, if possible (
for example, if there’s no need to run in sh).
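A quick demo of the `[[`-only matching (the filename here is made up):

```shell
foo="picture.jpg"
# the right side of == must stay unquoted to be treated as a pattern
[[ $foo == *.jpg ]] && echo "pattern match"       # pattern match
# the regex on the right side of =~ also stays unquoted
[[ $foo =~ ^pic.*\.jpg$ ]] && echo "regex match"  # regex match
```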
There is also bash-exclusive feature (( (called bash arithmetic) for integer
operations, but again, I really prefer not to use bash for math, ugh.
In zsh, to see the man pages for [[ and ((, we need to use a little hack,
since 1) these are keywords, not commands, and 2) by default run-help is
an alias for man for some reason:
unalias run-help
autoload run-help
run-help '[['
run-help let # "base name" of ((
case statements
So far, in 100% of the places where I've seen case used, it was a sign that the
script does too much. Often it is used for argument parsing. The basic
syntax is:
case $var in
-v|--verbose) verbose=T ;;
*) echo "unknown option $var" ;;
esac
I have a strong suspicion that the ugliness of this statement is mainly
caused by all of the syntax shortcuts bash allows. Pretty much everything
about case in bash upsets me, so let's end this section as soon
as possible.
makefile
While Makefile is technically a different language, it's not worth putting into a separate article. I tend to use Makefiles sparingly, but inevitably you will need to do something smarter than just sequencing 2-3 commands in a target, so this is important.
variable definition
This is pretty simple, but subtly different from how it’s done in bash:
# define a variable. allow recursive definitions, lazy evaluation
var=value
# define a variable, expand value eagerly, disallow recursive definitions
var:=${value}
# define a variable if not defined already
var?=value
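The practical difference between `=` and `:=` shows up when the right-hand side changes after the definition; a minimal sketch (variable names are made up):

```make
base = hello
lazy = $(base) world      # references are expanded every time lazy is used
eager := $(base) world    # expanded right here, right now
base = goodbye

# at this point $(lazy) is "goodbye world", $(eager) is "hello world"
```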
makefile: variable expansion
Every command in a Makefile runs in a new sub-shell. Variables in the Makefile environment are thus different from variables in the shell environment:
all:
# same as running "echo $VENV" in a new shell
echo $$VENV
# will expand the "VENV" variable in make, and
# run "echo ./venv" in a new shell
echo $(VENV)
script variables
- `$#` - number of positional arguments
- `$@` - all positional parameters
- `$*` - all positional parameters as a single string
- `$?` - exit status of the last command
- `$_` - last argument of the previous command
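These are shell variables, so inside a Makefile recipe they need the doubled `$$`. A small bash sketch, with the positional arguments faked via `set --`:

```shell
set -- one two three   # fake positional arguments for the demo
echo "$#"              # 3
echo "$*"              # one two three
true
echo "$?"              # 0 (exit status of true)
```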
branching
all:
ifdef dry_run
echo "running in dry-run mode"
else
echo "actually running a command"
endif
ifndef dry_run
echo "running a non-dry-run"
endif
Similar to bash, once I need something more complex than the above
in a Makefile, it might be time to put it in a script somewhere.
keyboard shortcuts
Surprisingly unintuitive and different across shells, but the basics are still worth remembering (so easy and intuitive!):
- `ctrl-f`/`ctrl-b` to move forward/back a character
- `option-f`/`option-b` to move forward/back a word
- `ctrl-e` to jump to the end of the line
- `ctrl-a` to jump to the beginning of the line
- `ctrl-u`/`ctrl-k` to cut before/after the cursor
- `ctrl-w`/`option-d` to delete the word before/after the cursor
So simple! now, some zsh-specific tricks:
- `ctrl-x e` to edit the current command in `$EDITOR`
- `fc` to open the last command in an editor
- `r` to repeat the last command. Can also slightly edit the command, like `r foo=bar`
coreutils
Too often it seems simple enough to just pop open a python interpreter and do a clumsy reimplementation of something your POSIX-ish system already gives you.
actual core-utils
- `csplit`/`split` to cut a file into sections based on text or size, respectively
- `cut` to split tabulated output and extract specific column values:
  - `echo "aaaa,bbbbb,c,dd,ee,f" | cut -d "," -f 3`
  - `printf 'aaaa\tbbbbb\tc\tdd\tee\tf\n' | cut -d $'\t' -f 3`
- `expand` to convert tabs to spaces, `unexpand` for the reverse
- `nl` to print a file with line numbers. Useful as a combo with `less`
- `tr` for input translation: `echo "abcdef" | tr 'abcdef' 'zyxwvu'`
- `seq` to print a sequence of numbers
- `shuf <file>` for shuffling lines
- `wc` for line, word, and character counts in a file:
  - `-l` to count lines
  - `-w` for words
  - `-c` for characters (bytes)
  - `-m` for characters, accounting for unicode shenanigans
- `date` has a bunch of format options, like:
  - `date +%s` for a unix timestamp in seconds
  - `date -r <timestamp>` to parse a timestamp and print a date (BSD/macOS; GNU date spells it `date -d @<timestamp>`)
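A couple of these chained together, as a toy sanity check:

```shell
# shuffling does not change the number of lines
seq 1 100 | shuf | wc -l    # 100
```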
technically-not-but-pretty-much-core-utils
awk
So complex that it might require a separate article. I tend to not use it because I always forget the details of how it works.
sed
For substitutions. It often gets too unwieldy too quickly, so
I just go straight to writing custom scripts without trying to use sed at all,
or use a text editor to make changes interactively.
sed -i 's/searchstring/replacestring/g' myfile
At the very least, however, the syntax of replacement is worth learning, since it pops up in a lot of places, including vim.
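At least the capture groups are worth a snippet: with `-E`, groups are referenced as `\1`, `\2` in the replacement (the filename here is made up):

```shell
# swap the name and the number via capture groups
echo "photo_2021.jpg" | sed -E 's/([a-z]+)_([0-9]+)/\2_\1/'
# → 2021_photo.jpg
```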
xargs
Has the weirdest syntax. The simplest case is intuitive:
find . -depth 1 -name '*.toml' -print0 | xargs -0 wc
Note that -0 is pretty much mandatory for xargs (and for all
the commands that feed it input). Otherwise, it will split items
on any whitespace and everything will be borked.
Other usual cases are:
- call the command with one item at a time: `| xargs -n 1 wc`
- put the arguments into a specific place: `| xargs -I {} bash -c "echo 'aaa' {} 'bbb'"`
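A self-contained sketch of both forms, with NUL-separated made-up items from `printf`:

```shell
# one invocation per item
printf 'a\0b\0c\0' | xargs -0 -n 1 echo item
# item a
# item b
# item c

# place the item in the middle of the command
printf 'a\0b\0' | xargs -0 -I {} echo "<{}>"
# <a>
# <b>
```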
find
One of the most useful commands; it has some weird syntax in its more useful features.
The simplest cases of xargs usage can often be avoided entirely by using -exec:
# exec on one file at a time
find . -name '*.toml' -exec wc {} \;
# exec on all files at once
find . -name '*.toml' -exec wc {} +
The options for this command are also unusual:
- `-depth` instead of `--depth`, same with most parameters with long names
- `find` calls most of its filters "primaries" and justifies this weirdness as a feature. The idea is that before a numeric argument you can write `+` or `-` and it will mean more or less than X
- `-type f/l/d` to look up files/links/directories
- `-name`, `-iname`, `-lname`, `-ilname` to look for names via pattern: the `i` prefix makes matching case-insensitive, the `l` prefix matches symlink targets
- `-path` to match the full pathname instead of just the file name
- `-E` to use normal regexes instead of the simplified pattern matching syntax (BSD find; GNU find uses `-regextype`)
- primaries can be combined using `()`, `!`/`-not`, `-and`, `-or` operators; `!` binds tightest, then `-and`, then `-or`
Between primaries the implicit operator is `-and`, evaluated left to right with short-circuiting. This can be unintuitive for cases like:
find . -print0 -depth 1
The trick here is that `-print0` is a "truthy" primary that prints as a side effect, and it runs before `-depth 1` gets a chance to filter anything out. This will print the list of all files.
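Putting the filter before the printing primary gives the intended behavior. A minimal sketch, using GNU find's `-maxdepth 1` (roughly BSD's `-depth 1`) and a made-up temp directory:

```shell
mkdir -p /tmp/find-demo-md/sub
touch /tmp/find-demo-md/a.md /tmp/find-demo-md/sub/b.md
# the depth filter is evaluated before -print, so only top-level files match
find /tmp/find-demo-md -maxdepth 1 -name '*.md' -print
# → /tmp/find-demo-md/a.md
```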
rsync
Is the epitome of weird options and behaviors. I pretty much always go to my personal cheat sheet to look up which options to use. Here are my personal "favourites":
- Syncing local files (like a backup to an external drive):
  `rsync -armuv --progress "docs/photos/" "/Volumes/drive/docs/photos/"`
  - `-a` for "archive" mode: copy special files (like sockets and device files), groups, links, owner tags, permissions, and modification timestamps. In 100% of use cases for `rsync`, this is what I want to do
  - `-r` for recursive
  - `-m` for "prune eMpty dirs"
  - `-u` for "update": only copy files that are newer in the source than in the destination
  - `-v` to see what the hell is going on
  - `--progress` for printing copying progress (useful for large files)

  Note the `/` at the ends of paths; they are super important. Mixing up or forgetting trailing slashes can mess things up:

      rsync -armuv dir1 dir2   # will create dir2/dir1/*contents*
      rsync -armuv dir1/ dir2  # will copy contents of dir1/ to dir2 directly
      rsync -armuv dir1/ dir2/ # same as the previous example

- Copying to a remote host. All of the basic rules are the same, but it might make sense to add `-zP` to the long list of parameters. This enables compressing data on transfer and partial uploads, respectively.
conclusion (ok maybe I do need to complain about bash for a little bit)
Unlike tmux, there's a lot I've had to say about sh/bash/zsh. My brain
outright refuses to remember most of these things fully, just some
notion of "hey, there was a way to do this, right?...". The simplest
reason why is that bash is more than one tool and more than a language:
it permeates neighboring tools (like make) and transparently
uses all of your other tools, not only coreutils. Decades of attempts
to extend the language while keeping a vague promise of backward compatibility
for about 80% of features mean that every little language feature is
riddled with gotchas, syntax that's way too concise for its own good,
and tons of unique things that are named in a way where you can't just
google them, because they are just symbols that search engines tend to ignore lol.
You can "just read the manual", but I'm fairly sure that knowing bash perfectly is deeply traumatising for a mere mortal.
Adding to the complexity mess is the zoo of different implementations.
I'm not one of the people who really care about making everything POSIX-compliant,
but even working with a default set of operating systems like Debian/macOS/RHEL,
you encounter some not-so-standard features, and for that reason I try
to at least not rely on zsh magic too much in my scripts. Bashisms are
quite enough for everyone, even though the whole language family is... special.
footnotes
If I were to write up all of the weird behaviors and how to avoid them, I might as well just redirect you to:
- https://mywiki.wooledge.org/BashGuide
- https://guide.bash.academy/
- https://makefiletutorial.com/
- https://www.oilshell.org
Getting a more comprehensive picture of how things work and what to look for is much easier after you figure out the names of all the weird syntax features and get a basic understanding of how it's all supposed to work. Maybe it's not worth remembering everything there, but a glance over the gotchas is definitely worth it.
For shell commands, I can't recommend tldr enough,
with a non-ruby client like tealdeer
or tlrc. I use it at
least once or twice a week to look up the default usage
patterns for commands that I need once in a blue moon, like remembering
how exactly to operate pacman beyond pacman -Syu.