
created_at: 2015-02-09T15:57:30.000Z
title: Filenames and Pathnames in Shell: How to Do It Correctly (2010)
url: http://www.dwheeler.com/essays/filenames-in-shell.html
author: thefox
points: 85
num_comments: 20
created_at_i: 1423497450
_tags: story, author_thefox, story_9021786
objectID: 9021786
year: 2010

Source

Filenames and Pathnames in Shell (bash, dash, ash, ksh, and so on): How to do it Correctly

Filenames and Pathnames in Shell: How to do it Correctly

David A. Wheeler

2016-05-04 (original version 2010-05-19)

Many Bourne shell scripts (as run by bash, dash, ash, ksh, and so on) do not handle filenames and pathnames correctly on Unix-like/POSIX systems. Some shell programming books teach it wrongly, and even the POSIX standard sometimes gets it wrong. Thus, many shell scripts are buggy, leading to surprising failures and in some cases security vulnerabilities (see the “Secure Programming for Linux and Unix HOWTO” section on filenames, CERT’s “Secure Coding” item MSC09-C, CWE 78, CWE 73, CWE 116, and the CWE/SANS Top 25 Most Dangerous Programming Errors). This is a real problem, because on Unix-like systems (e.g., Unix, Linux, or POSIX) shells are universally available and widely used for lots of basic tasks.

This essay shows common wrong ways to handle filenames and pathnames in Bourne shells, and gives a summary of how to do it correctly for the impatient. It then walks through rationale so you can understand why common techniques do not work... and why the alternatives do. I presume that you already know how to write Bourne shell scripts.

The basic problem is that most Unix-like systems today allow filenames to include almost any bytes. That includes newlines, tabs, the escape character (including escape sequences that can execute commands when displayed), other control characters, spaces (anywhere!), leading dashes (-), shell metacharacters, and byte sequences that aren’t legal UTF-8 strings. So your scripts could fail or even be subverted if you ever unarchive “tar” or “zip” files from someone else, examine directories with files created by someone else, or simply create files yourself that contain shell metacharacters (like space or question mark).
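To make this concrete, here is a minimal sketch (not the essay’s full recommended recipe) in POSIX shell. It assumes `mktemp -d` is available, as it is on Linux, the BSDs, and macOS, and it creates three illustrative filenames: one beginning with a dash, one containing a space, and one containing a newline. Splitting command output on whitespace miscounts them, while a quoted `./*` glob sees each file exactly once.

```shell
#!/bin/sh
# Minimal sketch: awkward-but-legal filenames vs. naive word splitting.
# Assumes mktemp -d exists; the three filenames are illustrative only.
set -eu

dir=$(mktemp -d)
cd "$dir"

# "--" ends option parsing, so touch creates a file literally named "-n".
# $(printf ...) keeps the embedded newline; only trailing newlines are stripped.
touch -- '-n' 'two words' "$(printf 'line1\nline2')"

# Wrong: an unquoted command substitution is split on spaces and newlines,
# so three filenames become five separate "words".
set -- $(printf '%s\n' *)
naive=$#

# Better: a quoted "./*" glob yields each name intact, and the "./" prefix
# keeps a name such as "-n" from being read as an option by later commands.
count=0
for f in ./*; do
  [ -e "$f" ] || continue   # skip the literal "./*" if nothing matched
  count=$((count + 1))
done

printf 'naive split: %d words; quoted glob: %d files\n' "$naive" "$count"
# prints: naive split: 5 words; quoted glob: 3 files

cd /
rm -rf -- "$dir"
```

Run it with `sh`; dash, bash, and ksh should all behave the same for this example. The loop relies only on globbing and quoting, so it does not depend on how a particular `ls` implementation chooses to display unusual names.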

This is not just a shell problem. Lots of code in all languages (not just shell), and at least some GUI toolkits, do not handle all permitted filenames and pathnames correctly. Some GUI toolkits (e.g., file-pickers) presume that filenames are always in UTF-8 and never contain control characters, even though neither is necessarily true.

However, this flaw in Unix-like kernels (allowing dangerous filenames) combines with additional weaknesses in the Bourne shell language, making it even more difficult in shell to correctly handle filenames and pathnames. I think shell is a reasonable language for short scripts, when properly used, but the excessive permissiveness of filenames turns easy tasks into easily-done-wrong tasks. A few small changes to which filenames are permitted would make it much easier to write secure code for handling filenames in all languages, including shell. So if your script may handle unarchived files, or files created by a different user or a mobile app, then your script needs to handle this botched situation. Tools like shellcheck can help you find some of these problems, but not all of them, and you can use such tools more effectively if you understand the problem.

First, though, some key terminology. A pathname is used to identify a particular file, and may include zero or more “/” characters. Each pathname component (separated by “/”) is officially called a filename; pathname components (aka filenames) cannot contain “/”. So officially “/usr/bin/sh” is a pathname, with pathname components (filenames) inside it, that refers to a particular file. (Note: on Cygwin, “\” is a synonym for “/”, so it also separates pathname components.) In practice, many people use the term “filename” to mean both pathname components (which are officially filenames) and entire pathnames. Neither pathname components nor full pathnames can contain the NUL character (