hn-classics/_stories/2001/16405136.md

163 lines
7.9 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
created_at: '2018-02-18T09:09:35.000Z'
title: Monadic I/O and Unix shell programming (2001)
url: http://okmij.org/ftp/Computation/monadic-shell.html
author: DyslexicAtheist
points: 71
story_text:
comment_text:
num_comments: 10
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1518944975
_tags:
- story
- author_DyslexicAtheist
- story_16405136
objectID: '16405136'
year: 2001
---
[Source](http://okmij.org/ftp/Computation/monadic-shell.html "Permalink to UNIX pipes as IO monads")
# UNIX pipes as IO monads
[previous][1]   [next][2]   [contents][1]   [top][3]
* * *
# Monadic i/o and UNIX shell programming
This is an essay inspired by Philip Wadler's paper "How to Declare an Imperative" [[Wadler97][4]]. We will show uncanny similarities between monadic i/o in Haskell, and UNIX filter compositions based on pipes and redirections. UNIX pipes (treated semantically as writing to temporary files) are quite similar to monads. Furthermore, at the level of UNIX programming, all i/o can be regarded monadic.
1. [IO monad in Unix shell][5]
1. [Parallel vs. sequential execution of monadic commands][5]
2. [All i/o is monadic][5]
3. [Acknowledgment][6]
4. [References][7]
  
## IO monad in Unix shell
Let's consider the following UNIX command line:
echo 'aaa bbb' | cat > /dev/tty
From shell's point of view, the filters -- `echo` and `cat` \-- are pure, referentially-transparent functions of their arguments: input files and command-line parameters. Operator `|` (a pipe), "forces" a command, which is a promise of output, on its left producing a stream, an _argument_ for a command on the right-hand side of the pipe. A shell pipe thus is an equivalent of a monadic `>>=` operator.
Furthermore, we can write the following correspondence between Haskell's monadic i/o forms and UNIX shell operations:
| ----- |
| Haskell |   | Shell |
| bound variable |   | the name of an existing file |
| command |   | sh-command, executable, filter |
| `a >> b` |   | `(a; b)` |
| `a >>= b` |   | `a | b` |
| `done` |   | `cat /dev/null` |
| `return x` |   | `cat x` |
| `puts` |   | `echo` |
As in Haskell, `cat /dev/null` and `(a; b)` form a monoid; and so do `return x` and `>>=` . Indeed, `cat /dev/null` is the left and the right unit for the "command concatenator" `(a; b)` , and the command concatenation is associative. By the same token,
| ----- |
| Haskell [[Wadler97][4]] |   | Shell |
| `return v >>= x -> m`   is  `m[x:=v]` |   | `cat v | cmd`   is  `cmd < v` |
| `m >>= x -> return x`   is  `m` |   | `cmd | cat`   is  `cmd` |
| `m >>= x -> (n >>= y -> o)`
is  `(m >>= x->n) >>= y->o` |   | `cmd1 | (cmd2 | cmd3)`
is  `( cmd1 | cmd2 ) | cmd3` |
We can have /bin/sh literally run both parts of the last expression: `cmd1 | (cmd2 | cmd3)` and `( cmd1 | cmd2 ) | cmd3` . The shell will interpret the parentheses properly. Both expressions will produce the same output!
Incidentally, a UNIX filter is expected to sequentially consume its input and produce a single output stream. Therefore, from the point of view of the shell and filters, the input and the output streams are 'linear', singularly referenced objects. No wonder i/o involving linear streams is monadic.
Doug McIlroy, the inventor of pipes, is said to point out that both pipes and lazy lists behave exactly like coroutines.
  
### Parallel vs. sequential execution of monadic commands
One may wonder if the analogy between pipes and monads is accurate, given that `m >>= x-> n` first executes `m` and then `n` , whereas `m | n` runs `m` and `n` in parallel.
Let us consider a situation where command `m` in a pipe ` m | n ` produces only 100 bytes of output and finishes. We also assume that `m` is scheduled to run first. As `m` 's output fits entirely into the pipe buffer (4k in most OSes) `m` can finish before `n` starts to run. The whole pipe then becomes equivalent to
m > temp_file; n < temp_file
Suppose that that both `m` and `n` are "proper" filters. That is, `m` produces its output strictly sequentially and never reads or scans back what it already wrote; `n` reads its input strictly sequentially and only once. For `m` and `n` , their input and output are _linear_ datatypes -- the assumption that also holds for Haskell IO commands. We can claim therefore that if the output from `m` is bounded, the pipe `m | n` is semantically identical to
m > temp_file; n < temp_file
Incidentally, this is how pipes are implemented under MS-DOS, which is a single-task operating system. In any case, the operating system shields consumers and producers of sequential streams from details of the i/o. The processes will use the same OS primitives regardless of physical data exchange channels -- a local file, a remote file, a pipe, a TCP pipe, etc.
A UNIX pipe may therefore be considered an optimization. A pipe has the same sequential semantics as the data exchange via a temporary file. A pipe however saves the OS trouble creating a file and doing any physical i/o. In addition, like any lazy contraption, pipes can handle an unbounded communication. That is the only thing that sets them apart from Haskell's `m >>= x-> n` . Still, if the result produced by `m` is a lazy list or a stream and `n` is not eager to evaluate its argument entirely, even that distinction between `|` and `>>=` becomes blurred.
Thus with pipes treated as writing to temporary files, they are quite similar to monads.
  
## All i/o is monadic
A programmer writing an i/o processing code does not actually expect the code to run as he types it. The processing _will_ occur when the code is compiled and submitted to an OS for execution. Or when the last parenthesis is closed and the code is (re) submitted to an evaluator/interpreter. It is highly unlikely that a piece of code than handles pushing of a GUI button is written on the fly, at the moment the button is pressed. Rather, a programmer wrote the handler well in advance, _anticipating_ what the program will do when the button actually gets pressed. Note the future tense in the above phrases. All data processing is programmed in advance and performed later, when submitted to an actor -- be it a CPU, an OS, an evaluator or a `do` keyword.
In classical PASCAL, the main program's code starts with
PROGRAM foo(input,output);
That is, the "world", the input and output files, are given explicitly to a program as arguments. As long as data processing done by this code is deterministic and the program refrains from opening other files (which the classical PASCAL disallowed), a PASCAL program, as a _whole_ , is truly a _pure_ function of its argument, an input file. For example, if a program copies input into output, it would always produce the same output given the same input. Thus a PASCAL program - again, as a whole - is referentially transparent. Although it may use non-pure read/write operations to accomplish its job...
  
## Acknowledgment
Philip Wadler's papers on monads, in particular, [[Wadler97][4]], as well as his additional comments have been the source of great inspiration, insight, and enjoyment.
  
## References
[Wadler97] How to Declare an Imperative. ACM Comp. Surveys, Vol. 29, No. 3, September 1997
[Monadic-io-Scheme] Monadic scheme of i/o
<<http://okmij.org/ftp/Scheme/misc.html#monadic-io>>
[exec-with-piped] Writing agents in sh: conversing through a pipe
<<http://okmij.org/ftp/Communications.html#sh-agents>>
* * *
### Last updated July 1, 2001
This site's top page is **<http://okmij.org/ftp/>[** ][8]
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from SXML by SXML->HTML
[1]: http://okmij.org/index.html
[2]: http://okmij.org/type-arithmetics.html
[3]: http://okmij.org/README.html
[4]: http://okmij.org#Wadler97
[5]:
[6]: http://okmij.org#Acknowledgment
[7]: http://okmij.org#References
[8]: http://okmij.org/ftp/