2018-03-03 09:35:28 +00:00
|
|
|
---
|
|
|
|
created_at: '2011-06-05T00:32:25.000Z'
|
|
|
|
title: 'Tom Duff: Reading Code From Top to Bottom (1999)'
|
|
|
|
url: http://iq0.com/notes/deep.nesting.html
|
|
|
|
author: gnosis
|
|
|
|
points: 56
|
|
|
|
story_text: ''
|
|
|
|
comment_text:
|
|
|
|
num_comments: 22
|
|
|
|
story_id:
|
|
|
|
story_title:
|
|
|
|
story_url:
|
|
|
|
parent_id:
|
|
|
|
created_at_i: 1307233945
|
|
|
|
_tags:
|
|
|
|
- story
|
|
|
|
- author_gnosis
|
|
|
|
- story_2620872
|
|
|
|
objectID: '2620872'
|
2018-06-08 12:05:27 +00:00
|
|
|
year: 1999
|
2018-03-03 09:35:28 +00:00
|
|
|
|
|
|
|
---
|
|
|
|
### Reading Code From Top to Bottom
|
|
|
|
|
|
|
|
This is a particularly simple example of a style of code that I often
|
|
|
|
see (this example is a piece of *prman*, sanitized to hide the function
|
|
|
|
of the code it comes from):
|
|
|
|
|
|
|
|
int
|
|
|
|
match(struct table *tab, char *arg)
|
|
|
|
{
|
|
|
|
int i, found = 0;
|
|
|
|
|
|
|
|
if (tab != NULL) {
|
|
|
|
for(i=0; i<tab->nitems; ++i)
|
|
|
|
if (tab->item[i].name == arg)
|
|
|
|
break;
|
|
|
|
found = (i < tab->nitems);
|
|
|
|
}
|
|
|
|
return found;
|
|
|
|
}
|
|
|
|
|
|
|
|
This function determines whether `tab` points to a table containing an
|
|
|
|
item whose name is `arg`.
|
|
|
|
|
|
|
|
I would write the above like this:
|
|
|
|
|
|
|
|
int match(struct table *tab, char *arg){
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if(tab==NULL) return 0;
|
|
|
|
for(i=0;i!=tab->nitems;i++)
|
|
|
|
if(tab->item[i].name==arg)
|
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
This avoids deep nesting. (The original wasn't *very* deep, but it's
|
|
|
|
only a simple example.) It avoids re-testing the loop condition after
|
|
|
|
the loop finishes. It avoids scattering multiple assignments to `found`
|
|
|
|
throughout the code. It makes the lifetime of the loop index be just the
|
|
|
|
loop. (Maybe I'm an old fart for this habit, which I cling to from my
|
|
|
|
Fortran days \[early 1970's\], but limiting the lifetimes of variables
|
|
|
|
makes things easier for register allocators and for people reading your
|
|
|
|
code.)
|
|
|
|
|
|
|
|
Most important, it reads from top to bottom, by which I mean that each
|
|
|
|
step follows from the preceding step, and the code stops (that is,
|
|
|
|
returns) as soon as it knows the answer. It tells a simple story:
|
|
|
|
|
|
|
|
> If you want to find an element of `tab->item` with `name==arg`,
|
|
|
|
>
|
|
|
|
> - First check that `tab` points to something.
|
|
|
|
> - Then check each element of the pointed-to array, returning if you
|
|
|
|
> find a match.
|
|
|
|
> - If you didn't find it, it's not there.
|
|
|
|
|
|
|
|
The old version of the story is extremely roundabout:
|
|
|
|
|
|
|
|
> If you want to find an element of `tab->item` with `name==arg`,
|
|
|
|
>
|
|
|
|
> - First, assume its not there.
|
|
|
|
> - Then check that `tab` points somewhere. If so
|
|
|
|
> - go through the array until you see what you're after.
|
|
|
|
> - When you're done looking, you found a match if and only if you
|
|
|
|
> didn't go through the whole array.
|
|
|
|
> - Having done all that, inform the caller about what you determined.
|
|
|
|
|
|
|
|
During the Structured Programming Era, programmers were often taught a
|
|
|
|
style very much like the old version. The language they were trained in
|
|
|
|
was normally Pascal, which only allowed a single return from a
|
|
|
|
procedure, more or less mandating that the return value appear in a
|
|
|
|
variable. Furthermore, teachers, influenced by Bohm and Jacopini (*Flow
|
|
|
|
Diagrams, Turing Machines and Languages with only two formation rules*,
|
|
|
|
Comm. ACM 9\#5, 1966, pp 366-371), often advocated reifying control
|
|
|
|
structure into Boolean variables as a way of assuring that programs had
|
|
|
|
reducible flowgraphs.
|
|
|
|
|
|
|
|
But reducible flowgraphs can still be deeply nested and therefore hard
|
|
|
|
to read. Long chains of reasoning or long sequences of actions are not a
|
|
|
|
particular problem to understand, especially for people of a
|
|
|
|
mathematical or otherwise analytic bent. Sequences that branch, so that
|
|
|
|
a particular step may have multiple antecedents, all but one of which
|
|
|
|
will be far from it on the page are a different matter. Reading them
|
|
|
|
requires keeping in the mind a list of suspended options while paying
|
|
|
|
attention to something else, a combination of mental activities that
|
|
|
|
people are notoriously bad at. You should try as much as possible to
|
|
|
|
structure your programs so that they proceed step-by-step. Whenever
|
|
|
|
nesting gets deep, or code refers to results computed more than a few
|
|
|
|
steps previously, you're headed for trouble.
|
|
|
|
|
|
|
|
A common indicator of trouble with excessive nesting is narrow tab
|
|
|
|
stops. Deeply nested code, at eight spaces per tab (as is usual on UNIX)
|
|
|
|
quickly migrates over to the right margin. The easy way to fix this is
|
|
|
|
to indent four spaces at a time (or even two\!). The right way is to
|
|
|
|
rewrite it so that it proceeds from top to bottom, not from outside to
|
|
|
|
inside, left to right.
|
|
|
|
|
|
|
|
A similar phenomenon happens in mathematical proofs. Most proofs involve
|
|
|
|
a chain of implications, each usually following from its immediate
|
|
|
|
prececessor. When the proof requires a step that follows from some
|
|
|
|
non-immediate predecessor, we are brought up short -- violation of the
|
|
|
|
orderly step-by-step progress of the proof disturbs the narrative rhythm
|
|
|
|
of the proof, and is a likely signal of a difficulty, either in the
|
|
|
|
proof or in the reader's understanding. Mathematicians routinely hide
|
|
|
|
these narrative anomalies in lemmas -- proof subroutines.
|
|
|
|
|
|
|
|
In fact, there is evidence that mathematicians choose the theorems that
|
|
|
|
they work on to be amenable to proof using simple structures. For
|
|
|
|
example, when Doug McIlroy investigated the problem of automatically
|
|
|
|
proving theorems of Euclidean geometry (personal communication, 1984 --
|
|
|
|
I don't believe this work was ever published) he discovered that when
|
|
|
|
theorems are rewritten as systems of simultaneous equations (sometimes
|
|
|
|
inequalities are required as well) the systems are usually \`triangular'
|
|
|
|
(they're not generally linear, so they're not strictly triangular, but
|
|
|
|
you know what I mean) and the proofs mostly proceed immediately by
|
|
|
|
back-subsititution.
|
|
|
|
|
|
|
|
### *Else* Considered Harmful
|
|
|
|
|
|
|
|
It is often useful to eschew the *else* clause of the *if* statement.
|
|
|
|
Instead of writing
|
|
|
|
|
|
|
|
int f(int x){
|
|
|
|
int v;
|
|
|
|
|
|
|
|
if(b(x)) v=f1(x);
|
|
|
|
else v=f2(x);
|
|
|
|
return v;
|
|
|
|
}
|
|
|
|
|
|
|
|
it usually pays to write
|
|
|
|
|
|
|
|
int f(int x){
|
|
|
|
if(b(x)) return f1(x);
|
|
|
|
return f2(x);
|
|
|
|
}
|
|
|
|
|
|
|
|
Even in a simple function like this, the increase in clarity is obvious
|
|
|
|
to me. In a more complicated situation, where `v=f1(x)` is replaced by a
|
|
|
|
more complicated computation with more nested ifs, the reduction in
|
|
|
|
nesting can be dramatic. Using the single-tailed if gives you some extra
|
|
|
|
freedom in wording your code to read top-to-bottom. As an extreme
|
|
|
|
example, the following
|
|
|
|
|
|
|
|
int g(int x){
|
|
|
|
if(b1(x)){
|
|
|
|
if(b2(x)){
|
|
|
|
if(b3(x)){
|
|
|
|
if(b4(x)){
|
|
|
|
if(b5(x))return g1(x);
|
|
|
|
return g2(x);
|
|
|
|
}
|
|
|
|
return g3(x);
|
|
|
|
}
|
|
|
|
return g4(x);
|
|
|
|
}
|
|
|
|
return g5(x);
|
|
|
|
}
|
|
|
|
return g6(x);
|
|
|
|
}
|
|
|
|
|
|
|
|
should be rewritten as
|
|
|
|
|
|
|
|
int g(int x){
|
|
|
|
if(!b1(x)) return g6(x);
|
|
|
|
if(!b2(x)) return g5(x);
|
|
|
|
if(!b3(x)) return g4(x);
|
|
|
|
if(!b4(x)) return g3(x);
|
|
|
|
if(!b5(x)) return g2(x);
|
|
|
|
return g1(x);
|
|
|
|
}
|
|
|
|
|
|
|
|
If this function had involved two-tailed rather than one-tailed ifs,
|
|
|
|
reversing the sense of the tests would not change the depth to which the
|
|
|
|
code was nested, and there would likely be no net change in readability.
|
|
|
|
|
|
|
|
### Another Example
|
|
|
|
|
|
|
|
Right now (Oct 7, 1998) I'm working on an interpreter for a programming
|
|
|
|
language that looks a lot like C, but lacks a few features. (Most of the
|
|
|
|
restrictions are inspired by concerns discussed in this note.) To test
|
|
|
|
its call and return mechanisms I thought I'd try running Ackerman's
|
|
|
|
function. I have a C version sitting around that just parrots the usual
|
|
|
|
lambda-calculus definition:
|
|
|
|
|
|
|
|
int a(int i, int j){
|
|
|
|
return i?a(i-1, j?a(i, j-1):1):j+1;
|
|
|
|
}
|
|
|
|
int ackerman(int i){
|
|
|
|
return a(i, i);
|
|
|
|
}
|
|
|
|
|
|
|
|
The language I'm testing is missing the `?:` conditional expression, so
|
|
|
|
I had to reword function `a` using ordinary if statements. Doing this
|
|
|
|
mechanically, you get
|
|
|
|
|
|
|
|
int a(int i, int j){
|
|
|
|
if(i){
|
|
|
|
if(j) return a(i-1, a(i, j-1));
|
|
|
|
return a(i-1, 1);
|
|
|
|
}
|
|
|
|
return j+1;
|
|
|
|
}
|
|
|
|
|
|
|
|
Of course I thought this read awkwardly and reversed the tests, yielding
|
|
|
|
|
|
|
|
int a(int i, int j){
|
|
|
|
if(i==0) return j+1;
|
|
|
|
if(j==0) return a(i-1, 1);
|
|
|
|
return a(i-1, a(i, j-1));
|
|
|
|
}
|
|
|
|
|
|
|
|
*Much* better.
|
|
|
|
|
|
|
|
### An Aesthetic Conundrum
|
|
|
|
|
|
|
|
I don't like side effects in conditionals (or embedded in most
|
|
|
|
expressions, but let's just think about conditionals for now.) Code
|
|
|
|
almost always reads better if you separate the side effects from the
|
|
|
|
tests. For example, I write
|
|
|
|
|
|
|
|
```
|
|
|
|
do
|
|
|
|
c=getchar();
|
|
|
|
while(strchr(" \t\n", c));
|
|
|
|
```
|
|
|
|
|
|
|
|
in preference to
|
|
|
|
|
|
|
|
```
|
|
|
|
while(strchr(" \t\n", c=getchar()));
|
|
|
|
```
|
|
|
|
|
|
|
|
although many people prefer the latter. This is justifiable as reading
|
|
|
|
top-to-bottom instead of inside-out.
|
|
|
|
|
|
|
|
Nevertheless, today (Sep 4, 1998) I found myself writing a function that
|
|
|
|
read like this:
|
|
|
|
|
|
|
|
char *skip(char *s, int c){
|
|
|
|
while(*s!='\0' && *s++!=c);
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
This function just skips past the first occurrence of `c` in `s`,
|
|
|
|
stopping also if it finds a *nul*.
|
|
|
|
|
|
|
|
I did this after examining several alternatives, like
|
|
|
|
|
|
|
|
char *skip(char *s, int c){
|
|
|
|
while(*s!='\0' && *s!=c) s++;
|
|
|
|
if(*s==c) s++;
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
and
|
|
|
|
|
|
|
|
char *skip(char *s, int c){
|
|
|
|
for(;*s!='\0';s++) if(*s==c) return s+1;
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
Neither of the last two satisfied me, the first because it retests
|
|
|
|
`*s==c` after the loop and the second because the two halves of the test
|
|
|
|
are unnaturally separated.
|
|
|
|
|
|
|
|
I wound up using the first version in spite of the side effect in the
|
|
|
|
test because it struck me as less infelicitous than the others -- that
|
|
|
|
doesn't mean I'm happy with it.
|
|
|
|
|
|
|
|
### Another Problematic Fragment
|
|
|
|
|
|
|
|
Yesterday (Feb 21, 1999) I rewrote my *com* command (because I was
|
|
|
|
working on a un-networked machine at home, and I was too lazy to get up
|
|
|
|
and transfer it on a floppy\!) The [original](../duffgram/com.lic)
|
|
|
|
contains this code (omitting some error handling), which looks for the
|
|
|
|
sequence `/*%` in a file:
|
|
|
|
|
|
|
|
```
|
|
|
|
for(;;){
|
|
|
|
c=getc(f);
|
|
|
|
if(c==EOF){ /* error code goes here */ }
|
|
|
|
if(c=='/'){
|
|
|
|
if((c=getc(f))=='*' && (c=getc(f))=='%') break;
|
|
|
|
ungetc(c, f); /* might be another slash! */
|
|
|
|
}
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
Today, I wouldn't write it that way -- I'd avoid the `if` with
|
|
|
|
side-effects. The first version I wrote yesterday looked something like
|
|
|
|
this:
|
|
|
|
|
|
|
|
```
|
|
|
|
while((c=getc(f))!=EOF){
|
|
|
|
if(c=='/'){
|
|
|
|
c=getc(f);
|
|
|
|
if(c=='*'){
|
|
|
|
c=getc(f);
|
|
|
|
if(c=='%'){
|
|
|
|
/* code that reacts to finding /*% */
|
|
|
|
}
|
|
|
|
else ungetc(c, f);
|
|
|
|
}
|
|
|
|
else ungetc(c, f);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* error code goes here */
|
|
|
|
```
|
|
|
|
|
|
|
|
Several things struck me about this:
|
|
|
|
|
|
|
|
- It's obviously written inside out -- the code reads horizontally,
|
|
|
|
not vertically.
|
|
|
|
- I deliberately avoided side effects in the `if` tests, but in a
|
|
|
|
moment of weakness retained `while((c=getc(f))!=EOF)`.
|
|
|
|
- I don't like that it behaves oddly if `ungetc(EOF, f)` does the
|
|
|
|
wrong thing. The standard says that it has no effect -- a buggy
|
|
|
|
`stdio` might push `EOF&0xff` back.
|
|
|
|
- Besides, `ungetc` is a silly thing to do with a character that's
|
|
|
|
just going to reappear in `c` immediately.
|
|
|
|
|
|
|
|
Keeping in mind the shallow-nesting and top-to-bottom-reading ideas that
|
|
|
|
I've been writing about, I redid it like this:
|
|
|
|
|
|
|
|
```
|
|
|
|
for(;;){
|
|
|
|
c=getc(f);
|
|
|
|
MightBeSlash:
|
|
|
|
if(c==EOF){ /* error code goes here */ }
|
|
|
|
if(c!='/') continue;
|
|
|
|
c=getc(f);
|
|
|
|
if(c!='*') goto MightBeSlash;
|
|
|
|
c=getc(f);
|
|
|
|
if(c!='%') goto MightBeSlash;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
Now it reads top to bottom and, to me at least, says what it means. It's
|
|
|
|
a net improvement over the other version.
|
|
|
|
|
|
|
|
But surely this is no good. The `goto`s are fairly benign, but still
|
|
|
|
grate. Most egregious is the loop whose last statement is `break;`\!
|
|
|
|
|
|
|
|
I let this code sit for an hour (it was lunch time) and then rewrote it.
|
|
|
|
I can't remember what I eventually used, but it was not very satisfying.
|
|
|
|
Looking at the various versions today, the one that strikes me as best
|
|
|
|
is the original, with all of its calls to `getc` inside the tests. (Of
|
|
|
|
course, if I were writing this code professionally, I'd probably call
|
|
|
|
`fgets` and `strstr` rather than paw through the file byte by byte.)
|
|
|
|
|
|
|
|
What does that say about all my preaching? Only that truly general
|
|
|
|
principals are hard to come by, and ultimately every case is a different
|
|
|
|
case. Now that I have two examples of side effects inside conditionals
|
|
|
|
that I think read better than the alternatives, maybe I'd better think
|
|
|
|
harder about what principles are involved.
|
|
|
|
|
|
|
|
### What About `goto`?
|
|
|
|
|
|
|
|
*If you want to go somewhere,
|
|
|
|
goto is the best way to get there.*
|
|
|
|
\-Ken Thompson
|
|
|
|
|
|
|
|
How can I say that a program with two `goto`s is \`a net improvement'
|
|
|
|
over one with none? The problem with `goto` is that it promotes
|
|
|
|
spaghetti -- when you're trying to understand a program, every time you
|
|
|
|
see a label, you must look at the code at each `goto` that targets the
|
|
|
|
label to see what the sequence of operations might be. This is the same
|
|
|
|
problem that deep nesting causes -- code at any number of far away
|
|
|
|
places might immediately precede the code at a label, and it is often
|
|
|
|
difficult to keep in mind all the things that might have occurred when
|
|
|
|
you reach the label.
|
|
|
|
|
|
|
|
During the structured programming era, `goto` was a major bugaboo, so
|
|
|
|
much so that most programmers today, with little or no experience
|
|
|
|
thinking about code that uses `goto`, nevertheless recoil in revulsion
|
|
|
|
at its use.
|
|
|
|
|
|
|
|
The whole structured programming debate was started by [Edsger W.
|
|
|
|
Dijkstra's essay](http://www.acm.org/classics/oct95/) *Go To Statement
|
|
|
|
Considered Harmful*, Communications of the ACM, Vol. 11, No. 3, March
|
|
|
|
1968, pp. 147-148. One of the oft-cited bolsters of the argument to
|
|
|
|
eschew the `goto` was that it appeared to be very difficult to describe
|
|
|
|
`goto` in terms of C. A. R. Hoare's system of axiomatic semantics.
|
|
|
|
Hoare's system involved specifying a predicate that gives the initial
|
|
|
|
assumptions (pre-conditions) about a program. For each statement type
|
|
|
|
there is a rule of inference that describes how to derive a
|
|
|
|
post-condition from the statement's pre-condition. For example, let's
|
|
|
|
think about the statment `while(B)S`, where `B` is an arbitrary Boolean
|
|
|
|
(with no side effects\!) and `S` is any statement (that doesn't transfer
|
|
|
|
control out of the loop\!). Hoare writes `P{S}Q` to mean that if
|
|
|
|
proposition `P` is true before `S` is executed, then you can infer that
|
|
|
|
`Q` is true afterwards. So, Hoare's rule for the `while` statement is:
|
|
|
|
|
|
|
|
All this says is that If executing `S` doesn't falsify `P` whenever `B`
|
|
|
|
is also initially true, then executing the loop `while(B) S` preserves
|
|
|
|
`P` and guarantees that `B` becomes false. (`P` is usually called the
|
|
|
|
*loop invariant*.) As far as Hoare's semantics is concerned, this is the
|
|
|
|
meaning of a while loop.
|
|
|
|
|
|
|
|
As I said, people were fond of pointing out that `goto` was
|
|
|
|
\`\`obviously'' hard to formulate in terms of Hoare's semantics. So
|
|
|
|
imagine everyone's surprise when Hoare discovered the following simple
|
|
|
|
rule:
|
|
|
|
|
|
|
|
First, for each label `L` in the program, we attach a predicate `PL`
|
|
|
|
that is to be true whenever we reach `L`. Then, the `goto` rule is
|
|
|
|
|
|
|
|
All this says is that if some proposition is true before you execute a
|
|
|
|
`goto` then it's also true after you get there and furthermore, you
|
|
|
|
can't fall through the `goto`, because `false` is true if you do.
|
|
|
|
|
|
|
|
We also need a rule that says what happens at a label. This is just
|
|
|
|
|
|
|
|
That is, a statement still does the same thing if you put a label on it.
|
|
|
|
|
|
|
|
For me, the Hoare semantics is a good way of thinking about `goto`: a
|
|
|
|
`goto` is a way of getting to somewhere that you can do the same things
|
|
|
|
as where you are now. The main practical consequences of this view are
|
|
|
|
that I think very carefully about whether the points at the label and at
|
|
|
|
the `goto` are really semantically equivalent, and that I try to
|
|
|
|
carefully choose labels to say what the corresponding predicate is. For
|
|
|
|
example, in the loop in the previous section, the label `MightBeSlash`
|
|
|
|
suggests that that we've read a character into `c`, but have not yet
|
|
|
|
been checked to see whether or not it's a slash. (Of course, if you look
|
|
|
|
at the code, you see that another possibility is that it might be an
|
|
|
|
EOF, but I took the view that `MightBeSlashOrEOF` was an awfully long
|
|
|
|
name for a label that's never mentioned more than 6 lines away from its
|
|
|
|
definition. Maybe I should have called it `ExamineNewCharacter` or
|
|
|
|
something like that.)
|
|
|
|
|
|
|
|
The best reference on programming in a disciplined way using `goto` is
|
|
|
|
still D. E. Knuth's paper Structured Programming with go to Statements,
|
|
|
|
ACM Computing Surveys, Vol. 6, No. 4, December 1974, pp. 261-302. I
|
|
|
|
think that everything I have to say about `goto` is in his paper.
|