
Things I keep forgetting about in: bash/make/coreutils

This is a continuation of a series of posts, now on bash. See the first post for what this is about and why I’m writing it.

bash

bash is probably nobody’s favourite programming language, and complaining about how bad it is feels like beating a dead horse at this point. However, we all have to live with it for the foreseeable future, especially when connecting to remote environments. Unfortunately, some people still believe that writing bash scripts longer than 50 lines is a good idea, so it’s not possible to just ignore its existence for good.

I want to cover things that usually get me, as well as some syntax weirdness that I have to look up almost every time I need to write something new in bash for some reason.

loops

The most basic loop in bash is somewhat simple, but as always, a bit weird:

for i in "a b c d e"; do
    echo $i
done

Often this is pretty much enough. You can also loop over glob matches like this:

for i in *.md; do
    echo "$(basename "$i")"
done

It is a very bad idea to use the output of ls or find in any form for this purpose; see this post for why.

There’s also a while loop with the expected syntax:

while true; do
    echo "boop"
done

There are also some magic commands that are often used with loops, like read, shift, and such. Out of scope for our purposes - we need to focus!

but what if I need to find files recursively?

Use find and avoid for altogether:

find . -type f -name '*.md' -exec basename {} \;
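If you do need shell logic per file, a null-delimited while-read loop is the safe shape. A minimal sketch (note that the loop body runs in a subshell because of the pipe):

find . -type f -name '*.md' -print0 | while IFS= read -r -d '' f; do
    echo "$(basename "$f")"
done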

variable expansion

The simplest (and usually the worst) way to use the value of a variable is to put $ in front of its name, like $var. This can go wrong, or not be what you want, for a variety of reasons. The shortlist of things to use instead:

  • ${var} to tell exactly where the name of the variable ends
  • "${var}": expansions should be double-quoted to avoid word-splitting
  • "${var#*/}" to remove items from the string up to and including the first /: the thing after # is a pattern - ${var##*/} makes the match greedy (up to and including the last /)
  • "${var%.*}" to remove items from the string after and including the last .
    • ${var%%.*} to make the match greedy: it will cut from the first occurrence.
  • "${var/item/diff}" to replace item with diff once.
    • "${var//item/diff}" to replace every item with diff.
    • the search is a pattern, and "${var/#*:/file:}", "${var/%./,}" anchor it to the start/end of the string
  • "${#var}" to expand to the length of the value of the variable
  • ${var:8:2} to get a substring of the variable starting at index 8, of length 2 (length is optional)
  • "${var:-default}" to fall back to a default value if var is unset or empty ("${var:=default}" also assigns it)
  • "${arr[@]}" passes arr as an array without word splitting inside elements ("${@}" for all positional parameters)
  • "${arr[*]}" passes arr as one string, joining all elements with the first character of $IFS as the separator ("${*}" for all positional parameters)

substitutions

bash is all about manipulating strings and streams of strings. There are five(ish) main ways of passing data in, out, and between programs. Five is probably 3 or 4 more than I would expect to have, so the naming of these approaches can sound like there is some overlap (and there is).

  1. command substitution, done via $( .... ) syntax. This just runs the command in a subshell and places its output in place of the expression.
  2. process substitution, done via >( ... ) or <( ... ) syntax. This also runs a command in a subshell, but with key differences:
    • the command is treated as a file: > is a write-only file, < is a read-only file.
    • if the outer command writes to this write-only file, the subprocess receives that data on its stdin. Similarly, the other direction creates a read-only file whose contents are the stdout of the command.
    • <(..) can be used as an ad-hoc solution for commands that do not support reading from stdin and want to only read from files
    • the overall return code of the command is not determined by the subprocess at all: a <(b) will exit with whatever status code a returns
  3. somewhat similar, but also totally different is piping: |. This does something really akin to >( ... ), but it does not create a temporary file
  4. on a similar note of manipulating stdin of programs, there are <<, <<-, and <<< operators. I’m just going to say that these are called heredocs and herestrings and they are kind of like <( ... ) substitution.
    • Not to be confused with > or >>, which is redirecting output to a file… Oh god.

Relevant snippets:

# assign value to output of the command
today=$(date)

# save a copy of stdout logs to a log file, but keep
# the return status of run-my-job command
run-my-job > >(tee -a log-file.log)

# redirect stdout and stderr to filename
run-a-job &> filename

# redirect stderr to stdout, then pipe it to a file and
# keep the console output in the stdout
run-a-job 2>&1 | tee -a some-file

# diff two sorted files without saving sorted contents individually
diff <(sort file1) <(sort file2)

# combine heredoc with jq to compact a json
# notice the weird interaction of heredoc with pipe
cat <<EOF | jq -c '.'
{
  "foo": "bar",
  "baz": [ "badly",
    "formatted",
        "json"]
}
EOF

gotchas

  • piping commands a | b eats the return value of a by default: if you somehow rely on the return status of a, it needs a workaround - bash keeps the per-command statuses in the PIPESTATUS array, sketched after this list (but please, just use another language at that point instead)
  • while internet tends to recommend set -euo pipefail by default, it has a ton of weird side-effects ( link 1, link 2). The basics are that:
    • -e has a ton of weird edge cases where it ignores some non-zero error codes, for example in if statements, && chains and so on
    • pipefail does not let you inspect which specific part of the pipeline failed; additionally, it’s possible for a downstream command to exit without consuming all of the data from the upstream command, which kills the upstream command with SIGPIPE and trips pipefail - even when that’s expected behavior.
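A minimal sketch of the PIPESTATUS workaround mentioned above (haystack.txt is a made-up file):

grep -c needle haystack.txt | head -n 1
# PIPESTATUS has to be read immediately: the next pipeline resets it
echo "grep: ${PIPESTATUS[0]}, head: ${PIPESTATUS[1]}"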

quotes

  • '...' is akin to raw strings in normal programming languages. Everything inside is used literally; you can’t even escape a single quote inside one - you have to close the string, add \', and reopen it
  • "..." is what should be used in most cases: it allows variable substitutions $ and does not do word splitting
  • `...` is command substitution in legacy format and should be replaced with $()

conditions

Some examples of conditions:

# whitespace is important
[[ "$foo" == 5 ]] && echo "true"
[ "$foo" = 5 ] && echo "true"
if [ "$foo" = 5 ]; then
    echo "true"
else
    echo "false"
fi

[[ and [ are really similar. For simple equality checks they seem to be identical, and both support:

  • [[ -e file ]] - check if file exists
  • [[ -f/d/h file ]] - check if path is a file/directory/symbolic link
  • [[ -z/n $foo ]] - check if string is empty/non-empty
  • = != < > operators for string comparison (in [, < and > have to be escaped as \< and \> so they aren’t parsed as redirections)
  • -eq -ne -gt -lt -ge -le for integer comparison

Only [[ is capable of:

  • = == != - comparing strings by pattern (for example, $foo = *.jpg)
  • =~ comparing by regex match
  • () for sub-expressions
  • && || for combining expressions
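For example, pattern and regex matching look like this (file is a throwaway value):

file="photo.jpg"
[[ $file == *.jpg ]] && echo "pattern match"
[[ $file =~ \.(jpg|png)$ ]] && echo "regex match: ${BASH_REMATCH[1]}"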

The TLDR of choosing between these two is to always use [[ if possible (for example, if there’s no need to run in sh).

There is also the bash-exclusive (( feature (called bash arithmetic) for integer operations, but again, I really prefer not to use bash for math, ugh.
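For completeness, a minimal sketch of what it looks like:

x=41
(( x += 1 ))                 # arithmetic assignment, no $ needed inside
(( x > 40 )) && echo "x is $x"  # (( )) also works as a condition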

In zsh, to see the man pages for [[ and ((, we need to use a little hack, since 1) these are keywords, not commands, and 2) by default run-help is an alias for man for some reason:

unalias run-help
autoload run-help
run-help '[['
run-help let # "base name" of ((

case statements

So far, in 100% of the places where I’ve seen it, case was a sign that the script does too much. Often it is used for argument parsing, as in the sketch after this snippet. The basic syntax is:

case $var in
    -v|--verbose) verbose=T ;;
    *) echo "unknown option $var" ;;
esac
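The usual argument-parsing shape wraps this in a while loop with shift (a sketch; the option names are made up):

while [[ $# -gt 0 ]]; do
    case $1 in
        -v|--verbose) verbose=T ;;
        -o|--output) output=$2; shift ;;  # consume the flag's value
        *) echo "unknown option $1" >&2; exit 1 ;;
    esac
    shift
done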

I have a strong suspicion that the ugliness of this statement is mainly caused by all of the syntax shortcuts bash allows. Pretty much everything about case in bash upsets me, so let’s end this section as soon as possible.

makefile

While Makefile is technically a different language, it’s not worth putting it into a separate article. I tend to use Makefiles sparingly, but inevitably you will need to do something smarter than just sequencing 2-3 commands in a target, so this is important.

variable definition

This is pretty simple, but subtly different from how it’s done in bash:

# define a variable; recursively expanded: lazy, re-evaluated at every use
var=value
# define a variable; simply expanded: the value is fixed eagerly at definition time
var:=${value}
# define a variable only if it is not defined already
var?=value
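A minimal sketch of the lazy/eager difference (the variable names are made up):

# `=` is recursively expanded: `later` picks up the final value of base
base = one
later = $(base)
base = two

# `:=` is simply expanded: `now` keeps the value at definition time
base2 := one
now := $(base2)
base2 := two

all:
    @echo $(later)  # prints "two"
    @echo $(now)    # prints "one"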

makefile: variable expansion

Every line of a recipe runs in a new sub-shell. Variables in the make environment are thus different from variables in the shell environment:

VENV := ./venv

all:
    # same as running "echo $VENV" in a new shell
    echo $$VENV
    # make expands the VENV variable first, and
    # runs "echo ./venv" in a new shell
    echo $(VENV)

script variables

  • $# - Number of positional arguments
  • $@ - All positional parameters
  • $* - All positional parameters as a single string
  • $? - Exit status of the last command
  • $_ - Last argument of the previous command
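These are shell variables, so inside a make recipe they would need $$ escaping ($$#, $$@, and so on). A tiny sketch of them in action:

#!/usr/bin/env bash
echo "got $# arguments"
for a in "$@"; do
    echo "arg: $a"
done
echo "as one string: $*"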

branching

# conditionals are make directives, not recipe lines,
# so they must not be indented with a tab
all:
ifdef dry_run
    echo "running in dry-run mode"
else
    echo "actually running a command"
endif

ifndef dry_run
    echo "running a non-dry-run"
endif

Similar to bash, once I need something more complex than the above in a Makefile, it might be time to put it in a script somewhere.

keyboard shortcuts

Surprisingly unintuitive and different across shells, but the basics are still worth remembering (so easy and intuitive!):

  • ctrl-f/b to move forward/back character
  • option-f/b to move forward/back word
  • ctrl-e to jump to end of the line
  • ctrl-a to jump to a beginning of the line
  • ctrl-u/k to cut before/after the cursor
  • ctrl-w/option-d to delete word before/after the cursor

So simple! Now, some zsh-specific tricks:

  • ctrl-xe to edit current command in $EDITOR
  • fc to open last command in editor
  • r to repeat last command. Can also slightly edit the command like r foo=bar

coreutils

Too often it seems simpler to just pop open a python interpreter and do a clumsy reimplementation of something that you already have in your POSIX-ish system.

actual core-utils

  • csplit/split to cut a file into sections based on text or size, respectively
  • cut to split tabulated output and extract specific column values:
    • echo "aaaa,bbbbb,c,dd,ee,f" | cut -d "," -f 3
    • echo "aaaa\tbbbbb\tc\tdd\tee\tf" | cut -d $'\t' -f 3
  • expand to convert tabs to spaces. unexpand for the reverse
  • nl to print a file with line numbers. Useful in combination with less
  • tr for input translation
    • echo "abcdef" | tr 'abcdef' 'zyxvw'
  • seq to print sequence of numbers
  • shuf <file> for shuffling
  • wc for word and non-word counts in a file:
    • -l to count lines
    • -w for words
    • -c for characters (bytes)
    • -m for characters, accounting for unicode shenanigans
  • date has a bunch of format options like:
    • date +%s for unix timestamp in seconds
    • date -r <timestamp> to parse a timestamp and print a date (BSD/macOS; GNU date spells it date -d @<timestamp>)
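A few date invocations I actually reach for (the -r form is BSD/macOS, as noted above):

date +%s                        # seconds since the unix epoch
date -u +"%Y-%m-%dT%H:%M:%SZ"   # ISO-ish UTC timestamp
date -r 1700000000              # BSD/macOS: render a unix timestamp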

technically-not-but-pretty-much-core-utils

awk

So complex that it might require a separate article. I tend not to use it because I always forget the details of how it works.
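The one idiom worth remembering anyway is column printing (a sketch; numbers.txt is a made-up file):

# print the second whitespace-separated column
ps aux | awk '{print $2}'
# sum the first column of a file
awk '{sum += $1} END {print sum}' numbers.txt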

sed

For substitutions. It gets too unwieldy too quickly, so I often go straight to writing custom scripts without trying to use sed at all. Or use a text editor to make changes interactively.

sed -i 's/searchstring/replacestring/g' myfile

At the very least, however, the syntax of replacement is worth learning, since it pops up in a lot of places, including vim.
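A couple of sketches of that replacement syntax, using -E for extended regexes (supported by both GNU and BSD sed):

# & refers to the whole match
echo "file-001.txt" | sed -E 's/[0-9]+/[&]/'     # file-[001].txt
# \1, \2 refer to capture groups
echo "key=value" | sed -E 's/(.*)=(.*)/\2=\1/'   # value=key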

xargs

Has the weirdest syntax. The simplest case is intuitive:

find . -name '*.toml' -depth 1 -print0 | xargs -0 wc

Note that -0 is pretty much mandatory for xargs (along with -print0 or equivalent on whatever feeds it). Otherwise, it will split items on whitespace and everything will be borked.

Other usual cases are:

  • Call command with one item at a time: | xargs -n 1 wc
  • Put the arguments into a specific place: | xargs -I {} bash -c "echo 'aaa' {} 'bbb'"

find

One of the most useful commands, with some weird syntax in its more useful features.

The simplest cases of xargs usage can often be avoided entirely by using -exec:

# exec on one file at a time
find . -name '*.toml' -exec wc {} \;
# exec on all files at once
find . -name '*.toml' -exec wc {} +

The options for this command are also unusual:

  • -depth instead of --depth, and the same goes for most parameters with long names. find calls most of the filters “primaries” and justifies this weirdness as a feature. Numeric arguments to primaries can be prefixed with + or - to mean more or less than X (for example, -mtime +7)
  • -type f/l/d to look up files/links/directories
  • -name, -iname, -lname, -ilname to look for names via pattern. Prefixes change case sensitivity and whether the pattern is applied to a symlink’s target.
  • -path to match full pathname instead of file name
  • -E to use normal regexes instead of simplified pattern matching syntax
  • primaries can be combined using (), !, -not, -and, -or operators; note that -and binds tighter than -or, so grouping with (escaped) parentheses matters - see the example below
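For example, grouping with shell-escaped parentheses:

# -and binds tighter than -or, so the parentheses matter here
find . \( -name '*.md' -or -name '*.txt' \) -not -path './.git/*'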

Primaries are joined with an implicit -and and evaluated left to right, short-circuiting as soon as the outcome is known. This can be unintuitive for cases like:

find . -print0 -depth 1

The trick here is that -print0 is an always-true primary with a side effect: the path is printed before -depth 1 is even evaluated. This will return the list of all files. Filters first, actions last.

rsync

Is the epitome of weird options and behaviors. I pretty much always go to my personal cheat sheet to look up which options to use. Here are my personal “favourites”:

  1. Syncing local files (like a back up to an external drive):

    rsync -armuv --progress "docs/photos/" "/Volumes/drive/docs/photos/"
    
    • -a for “archive” mode. It means to copy special files (like sockets and device files), groups, links, owner tags, permissions, modification timestamps. In 100% of use cases for rsync, this is what I want to do.
    • -r for recursive (already implied by -a, but harmless to spell out)
    • -m for “prune eMpty dirs”
    • -u for “update”: only copy files that are newer in source than in destination
    • -v to see what the hell is going on.
    • --progress for printing progress on copying (useful for large files)

    Note the / at the ends of paths. They are super important: mixing up or forgetting trailing slashes can mess things up:

    rsync -armuv dir1 dir2 # will create dir2/dir1/*contents*
    rsync -armuv dir1/ dir2 # will copy contents of dir1/ to dir2 directly
    rsync -armuv dir1/ dir2/ # same as previous example
    
  2. Copying to a remote host. All of the basic rules are the same, but it might make sense to add -zP to the long list of parameters: -z compresses data in transit, and -P (short for --partial --progress) keeps partially-transferred files and shows progress.
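    A hedged sketch of what that looks like (host and paths are made up):

    rsync -armuvzP "docs/photos/" "user@backup-host:/backups/photos/"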

conclusion (ok maybe I do need to complain about bash for a little bit)

Unlike tmux, there’s a lot I’ve had to say about sh/bash/zsh. My brain outright refuses to remember most of these things fully - just some notion of “hey, there was a way to do this, right?…”. The simplest reason why is that bash is more than one tool and more than a language: it permeates neighboring tools (like make) and transparently uses all of your other tools, not only coreutils. Decades of attempts to extend the language while keeping a vague promise of backward-compatibility for about 80% of features mean that every little language feature is riddled with gotchas, syntax that’s way too concise for its own good, and tons of unique things named in a way you can’t just google, because they are just symbols that search engines tend to ignore lol.

You can “just read the manual”, but I suspect that knowing bash perfectly would be deeply traumatising for a mere mortal.

Adding to the complexity mess is the zoo of different implementations. I’m not one of the people who really care about making everything POSIX-compliant, but even working with a default set of operating systems like Debian/macOS/RHEL, you encounter some not-so-standard features, and for that reason I try to at least not rely on zsh magic too much in my scripts. Bashisms are quite enough for everyone, even though the whole language family is… special.

footnotes

If I were to write up all of the weird behaviors and how to avoid them, I might as well just redirect you to:

Getting a more comprehensive picture of how things work and what to look for is much easier after you figure out the names of all the weird syntax features and get a basic understanding of how it’s supposed to work. Maybe it’s not worth remembering everything there, but a glance over the gotchas is definitely worth it.

For shell commands, I can’t recommend tldr enough, with a non-ruby client like tealdeer or tlrc. I’m using it at least once or twice a week to look up the default usage patterns for commands that I need once in a blue moon, like remembering how exactly to operate pacman beyond pacman -Syu.