---
created_at: '2015-02-09T15:57:30.000Z'
title: 'Filenames and Pathnames in Shell: How to Do It Correctly (2010)'
url: http://www.dwheeler.com/essays/filenames-in-shell.html
author: thefox
points: 85
story_text: ''
comment_text:
num_comments: 20
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1423497450
_tags:
- story
- author_thefox
- story_9021786
objectID: '9021786'
---
[Source](https://www.dwheeler.com/essays/filenames-in-shell.html "Permalink to Filenames and Pathnames in Shell (bash, dash, ash, ksh, and so on): How to do it Correctly")
# Filenames and Pathnames in Shell (bash, dash, ash, ksh, and so on): How to do it Correctly
## David A. Wheeler
## 2016-05-04 (original version 2010-05-19)
Many Bourne shell scripts (as run by bash, dash, ash, ksh, and so on) do _**not**_ handle filenames and pathnames correctly on Unix-like/POSIX systems. Some shell programming books teach it wrongly, and even the [POSIX standard sometimes gets it wrong][1]. Thus, many shell scripts are buggy, leading to surprising failures and in some cases security vulnerabilities (see the [“Secure Programming for Linux and Unix HOWTO” section on filenames][2], [CERT’s “Secure Coding” item MSC09-C][3], [CWE 78][4], [CWE 73][5], [CWE 116][6], and the [CWE/SANS Top 25 Most Dangerous Programming Errors][7]). This is a real problem, because on Unix-like systems (e.g., Unix, Linux, or POSIX) shells are universally available and widely used for lots of basic tasks.
This essay shows [common _wrong_ ways][8] to handle filenames and pathnames in Bourne shells, and gives a [summary of how to do it correctly for the impatient][9]. It then walks through [rationale][10] so you can _understand_ why common techniques do not work... and why the alternatives do. I presume that you already know how to write Bourne shell scripts.
The basic problem is that today [most Unix-likes allow filenames to include almost _any_ bytes][11]. That includes newlines, tabs, the escape character (including escape sequences that can execute commands when displayed), other control characters, spaces (anywhere!), leading dashes (-), shell metacharacters, and byte sequences that aren’t legal UTF-8 strings. So your scripts could fail or even be subverted if you ever unarchive “tar” or “zip” files from someone else, examine directories with files created by someone else, or simply create files yourself that contain shell metacharacters (like space or question mark).
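To make the problem concrete, here is a minimal sketch (not taken from the essay) that creates three awkward but legal filenames in a scratch directory and counts them two ways. The directory and filenames are invented for illustration, and `mktemp -d` is assumed to be available, although it is not required by POSIX.

```shell
#!/bin/sh
# Illustrative sketch only: the scratch directory and filenames are hypothetical.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Create files whose names contain a space, a leading dash, and a newline.
: > ./'two words.txt'
: > ./-n
: > ./'line1
line2'

# Fragile: counting lines of `ls` output assumes one line per file,
# which a newline inside a filename can break.
wrong_count=$(ls | wc -l | tr -d ' ')

# More robust: a glob expands to exactly one word per file, and the ./
# prefix keeps a name like "-n" from being mistaken for an option.
right_count=0
for f in ./*; do
  [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matched
  right_count=$((right_count + 1))
done

echo "ls | wc -l reports: $wrong_count"
echo "glob loop reports:  $right_count"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

The glob loop counts three files. The line-counting approach depends on how the local `ls` writes the embedded newline when its output is a pipe, so its result can differ from the real number of files.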
This is not _just_ a shell problem. Lots of code in _all_ languages (not just shell), and at least some GUI toolkits, do not handle all permitted filenames and pathnames correctly. Some GUI toolkits (e.g., file-pickers) presume that filenames are always in UTF-8 and never contain control characters, even though neither is necessarily true.
However, this [flaw in Unix-like kernels (allowing dangerous filenames)][11] combines with additional weaknesses in the Bourne shell language, making it even _more_ difficult in shell to correctly handle filenames and pathnames. I think shell is a reasonable language for short scripts, when properly used, but the excessive permissiveness of filenames turns easy tasks into easily-done-wrong tasks. A [few small changes would make it much easier to write secure code for handling filenames][11] for all languages including shell. So if your script may handle unarchived files, or files created by a different user or mobile app, then your script needs to handle this botched situation. Tools like [shellcheck][12] can help you find some of these problems, but not all of them, and you can use such tools more effectively if you understand the problem.
First, though, some key terminology. A [_pathname_ is used to identify a particular file][13], and may include zero or more “/” characters. Each pathname component (separated by “/”) is officially called a [filename][13]; pathname components (aka filenames) cannot contain “/”. So officially “/usr/bin/sh” is a pathname, with pathname components (filenames) inside it, that refers to a particular file. (Note: on Cygwin, “\” is a synonym for “/”, so it also separates pathname components.) In practice, many people use the term “filename” to mean both pathname components (which are officially filenames) and entire pathnames. Neither pathname components nor full pathnames can contain the NUL character (
[1]: http://austingroupbugs.net/view.php?id=248
[2]: https://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html
[3]: https://www.securecoding.cert.org/confluence/display/seccode/MSC09-C.+Character+Encoding+-+Use+Subset+of+ASCII+for+Safety
[4]: http://cwe.mitre.org/data/definitions/78.html
[5]: http://cwe.mitre.org/data/definitions/73.html
[6]: http://cwe.mitre.org/data/definitions/116.html
[7]: http://cwe.mitre.org/top25/index.html
[8]: https://www.dwheeler.com/essays/filenames-in-shell.html#wrong
[9]: https://www.dwheeler.com/essays/filenames-in-shell.html#summary
[10]: https://www.dwheeler.com/essays/filenames-in-shell.html#basic-rationale
[11]: https://www.dwheeler.com/fixing-unix-linux-filenames.html
[12]: http://www.shellcheck.net
[13]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_267