created_at title url author points story_text comment_text num_comments story_id story_title story_url parent_id created_at_i _tags objectID year
2016-06-03T20:17:49.000Z Common shell script mistakes (2008) http://www.pixelbeat.org/programming/shell_script_mistakes.html pmoriarty 166 79 1464985069
story
author_pmoriarty
story_11832941
11832941 2008

Common shell script mistakes

I've written a few shell scripts in my time and have read many more, and I see the same issues cropping up again and again (unfortunately even in my own scripts sometimes).

While there are lots of shell programming pitfalls, at least the interpreter will tell you immediately about them. The mistakes I describe below generally mean that your script will run fine now, but if the data changes or you move your script to another system, then you may have problems.

I think some of the reason shell scripts tend to have lots of issues is that commonly one doesn't learn shell scripting like "traditional" programming languages. Instead scripts tend to evolve from existing interactive command line use, or are based on existing scripts which themselves have propagated the limitations of ancient shell script interpreters.

It's definitely worth spending the relatively small amount of time required to learn the shell script language correctly, if one uses Linux/BSD/Mac OS X desktops or servers, where it is commonly used.

Inappropriate use

Shell is the main domain specific language designed to manipulate the UNIX abstractions for data and logic, i.e. files and processes. So as well as being useful at the command line, its use permeates any UNIX system. Correspondingly, please be wary of writing scripts that deviate from these abstractions, and have significant data manipulation in the shell process itself. While flexible, shell is not designed as a general purpose language and becomes unwieldy when not leveraging the various UNIX tools effectively. A good knowledge of the various UNIX tools goes hand in hand with effective shell programming.

Stylistic issues

First I'll mention some ways to clean up shell scripts without changing their functionality. Note I use a shortcut form of the conditional operator below (and in my shell scripts), when doing simple conditional operations, as it's much more concise. So I use [ "$var" = "find" ] && echo "found" instead of the equivalent:

if [ "$var" = "find" ]; then
  echo "found"
fi

[ x"$var" = x"find" ] && echo found

The use of x"$var" was required in case var is "" or "-hyphen". Thinking about this for a moment should indicate that the shell can handle both of these cases unambiguously, and if it doesn't it's a bug. This bug was probably fixed about 20 years ago, so stop propagating this nonsense please! Shell doesn't have the cleanest syntax to start with, so polluting it with stuff like this is horrible.
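
As a quick check of that claim, here is a minimal sketch, assuming a POSIX-compliant shell. The helper name `is_find` is made up for illustration; it runs the plain quoted comparison against the two supposedly troublesome values.

```shell
#!/bin/sh
# is_find is a hypothetical helper, for illustration only.
# It uses the plain quoted comparison, with no x"..." prefix.
is_find() {
  if [ "$1" = "find" ]; then
    echo "found"
  else
    echo "not found"
  fi
}

is_find ""          # prints: not found
is_find "-hyphen"   # prints: not found
is_find "find"      # prints: found
```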

[ ! -z "$var" ] && echo "var not empty"

This is a double negative, and is very prevalent in shell scripts for some reason.
Just test the string directly like [ "$var" ] && echo "var not empty"

[ "$var" ] || var="value"

Setting a variable iff it's not previously set is a common idiom and can be more succinctly expressed like
: ${var="value"}. Note if you want to set a variable if it's empty or unset use : ${var:="value"}.
These are portable to the vast majority of shells.
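
The difference between the `=` and `:=` forms is easy to miss, so here is a minimal sketch of it, assuming a POSIX shell. The variable names `unset_var` and `empty_var` are hypothetical.

```shell
#!/bin/sh
# Hypothetical variable names, for illustration only.
unset unset_var
empty_var=""

# '=' assigns the default only when the variable is unset.
: ${unset_var="default"}
: ${empty_var="default"}
echo "unset_var=[$unset_var] empty_var=[$empty_var]"
# prints: unset_var=[default] empty_var=[]

# ':=' also assigns when the variable is set but empty.
: ${empty_var:="default"}
echo "empty_var=[$empty_var]"
# prints: empty_var=[default]
```

The leading `:` is the null command; it is there only so the shell evaluates the expansion without trying to run its result as a command.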

[ "$var" ] && var="foo-$var" || var="foo"

Similarly to the previous case where we avoid explicit conditionals in shell logic, one can leverage conditional shell parameter expansion to handle the very common requirement of building up variant file names etc. like:

variant=bar
var="foo${variant:+-}$variant"
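
To see both outcomes of that expansion side by side, here is a small sketch assuming a POSIX shell. The function name `make_name` is hypothetical.

```shell
#!/bin/sh
# make_name is a hypothetical helper, for illustration only.
# ${1:+-} expands to "-" only when $1 is set and non-empty,
# so the separator appears only when there is a variant to append.
make_name() {
  printf '%s\n' "foo${1:+-}$1"
}

make_name "bar"   # prints: foo-bar
make_name ""      # prints: foo
```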

redundant use of $?

For example:

pidof program
if [ $? = 1 ]; then
  echo "program not found"
fi

Note this is not just stylistic actually. Consider what happens if pidof returns 2.
Instead just test the exit status of the process directly as in these examples:

if ! pidof program; then
  echo "program not found"
fi

if grep -qF "string" file; then
  echo 'file contains "string"'
fi

Be careful though when checking negative returns, as you generally get a negative return for any failure, like I/O error etc. For example if using grep to check there is no match, then using $? is not redundant:

grep -q 'regex' FILE; st=$?  # use 'local st=$?' only inside a function
if [ $st = 1 ]; then
  echo no-match
fi
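
The three cases can be separated explicitly with a `case` on the exit status. This is only a sketch: the helper name `check_match` and the temporary file are assumptions for illustration, and it relies on the POSIX-specified grep behaviour of exiting 0 on a match, 1 on no match, and greater than 1 on error.

```shell
#!/bin/sh
# check_match is a hypothetical helper, for illustration only.
# usage: check_match PATTERN FILE
check_match() {
  grep -q -- "$1" "$2" 2>/dev/null
  case $? in
    0) echo "match" ;;
    1) echo "no-match" ;;
    *) echo "error" ;;   # e.g. unreadable or missing file
  esac
}

tmp=$(mktemp) || exit 1
printf 'hello world\n' > "$tmp"

check_match 'hello' "$tmp"                  # prints: match
check_match 'absent' "$tmp"                 # prints: no-match
check_match 'hello' "$tmp.does-not-exist"   # prints: error

rm -f -- "$tmp"
```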

needless shell logic

We'll expand on this below, but we should do as little in shell as possible outside its domain of connecting processes to files. For example, the following common shell idiom of testing for files and directories can often be pushed into the programs themselves, i.e. instead of:

[ ! -d "$dir" ] && mkdir "$dir"
[ -f "$file" ] && rm "$file"

do:

mkdir -p "$dir" #also creates a hierarchy for you
rm -f "$file" #also never prompts
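
A short sketch of why the explicit tests are unnecessary: both commands succeed whether or not the target already exists. The scratch paths below are hypothetical and live under a temporary directory.

```shell
#!/bin/sh
# Hypothetical scratch paths, for illustration only.
base=$(mktemp -d) || exit 1
dir="$base/a/b/c"
file="$base/stale.txt"

mkdir -p -- "$dir" || exit 1   # creates the whole hierarchy
mkdir -p -- "$dir" || exit 1   # succeeds again: it already exists
rm -f -- "$file"   || exit 1   # succeeds although the file never existed

[ -d "$dir" ] && echo "dir ready"   # prints: dir ready

rm -rf -- "$base"
```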

Note also Google's shell style guide, which like Google's other style guides has very sensible advice.

Robustness

Aaron Maxwell wrote up a good summary of settings and consequences for an unofficial strict mode for bash which is worth considering for your bash scripts at least. Here I discuss more general techniques appropriate for most shell scripts.

globbing

In the example below to count the lines in each file, there is a common mistake.

for file in `ls *`; do
  wc -l $file
done

Perhaps the idiom above stems from a common system where the shell does not do globbing, but in any case it's neither scalable nor robust. It's not robust because it doesn't handle spaces in file names as word splitting is done. Also it redundantly starts an ls process to list the files. Also on some systems this form can overflow static command line buffers when there are many files. Shell script is a language designed to operate on files so it has this functionality built in!

for file in *; do
  wc -l -- "$file"
done

Notice how we just use the '*' directly which as well as not starting the redundant ls process, doesn't do word splitting on file names containing spaces. Also notice the added '--' option, to indicate to wc to stop option processing and thus be immune to file names starting with '-'. Note this still is slow, as we use shell looping and start a wc process per file, so we'll come back to this example in the performance section below.
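
The word splitting problem can be made visible by counting loop iterations for each form. This is a sketch under assumptions: it creates a hypothetical scratch directory holding a single file whose name contains a space, and it relies on the default IFS.

```shell
#!/bin/sh
# Hypothetical scratch directory with one file named "two words.txt".
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'a\nb\n' > 'two words.txt'

split=0
for file in `ls *`; do   # output of ls is split into "two" and "words.txt"
  split=$((split + 1))
done

glob=0
for file in *; do        # the glob yields exactly one word per file name
  glob=$((glob + 1))
done

echo "ls form: $split iterations, glob form: $glob iteration"
# prints: ls form: 2 iterations, glob form: 1 iteration

cd - >/dev/null && rm -rf -- "$tmp"
```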

quoting

Shell quoting is a complicated area whose subtleties are often overlooked, and this is compounded when combined with the fact that file names can contain almost any character. The following quoting guidelines come from David A. Wheeler's excellent article on Filenames and Pathnames in Shell, which is worth reading in its entirety:

  1. Double-quote all variable references and command substitutions unless you are certain they can only contain alphanumeric characters or you have specially prepared things (i.e., use "$variable" instead of $variable). In particular, you should practically always put $@ inside double-quotes; POSIX defines this to be special (it expands into the positional parameters as separate fields even though it is inside double-quotes).
  2. Set IFS to just newline and tab, if you can, to reduce the risk of mishandling filenames with spaces. Use newline or tab to separate options stored in a single variable. Set IFS with IFS="$(printf '\n\t')"
  3. Prefix all pathname globs so they cannot expand to begin with "-". In particular, never start a glob with "?" or "*" (such as "*.pdf"); always prepend globs with something (like "./") that cannot expand to a dash. So never use a pattern like "*.pdf"; use "./*.pdf" instead.
  4. Check if a pathname begins with "-" when accepting pathnames, and then prepend "./" if it does.
  5. Be careful about displaying or storing pathnames, since they can include newlines, tabs, terminal control escape sequences, non-UTF-8 characters (or characters not in your locale), and so on. You can strip out control characters and non-UTF-8 characters before display using printf '%s' "$file" | LC_ALL=POSIX tr -d '[:cntrl:]' | iconv -cs -f UTF-8 -t UTF-8
  6. Do not depend on always using "--" between options and pathnames as the primary countermeasure against filenames beginning with "-". You have to do it with every command for this to work, but people will not use it consistently (they never have), and many programs (including echo) do not support "--". Feel free to use "--" between options and pathnames, but only as an additional optional protective measure.
  7. Use a template that is known to work correctly (see paper).
  8. Use a tool like shellcheck to find problems you missed.
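
Guidelines 2 to 4 can be sketched together as follows, assuming a POSIX shell. The helper name `safe_path` is hypothetical and is not taken from Wheeler's article; the `[ -e ... ]` guard is an extra assumption to skip the literal pattern when no file matches.

```shell
#!/bin/sh
# Guideline 2: restrict IFS to newline and tab.
# The tab comes last because command substitution strips trailing newlines.
IFS="$(printf '\n\t')"

# Guideline 4: safe_path is a hypothetical helper that prepends "./"
# to any pathname beginning with "-", so it cannot be read as an option.
safe_path() {
  case $1 in
    -*) printf '%s\n' "./$1" ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

safe_path "-rf"         # prints: ./-rf
safe_path "notes.txt"   # prints: notes.txt

# Guideline 3: prefix the glob so no expansion can begin with "-".
for f in ./*.pdf; do
  [ -e "$f" ] || continue   # nothing matched: the pattern stayed literal
  printf '%s\n' "$f"
done
```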

Related to this, POSIX recently added support for the $'...' quoting format to easily and unambiguously specify any string, and the GNU ls(1) command has taken advantage of that since coreutils v8.25 to display file names in a form that is unambiguous and safe to paste back. Consider for example if a bad actor placed a file called **$'