hn-classics/_stories/1988/4360306.md

128 lines
7.4 KiB
Markdown
Raw Permalink Normal View History

---
created_at: '2012-08-09T09:38:52.000Z'
title: Tom Duff on Duff's Device (1988)
url: http://www.lysator.liu.se/c/duffs-device.html
author: mmphosis
points: 48
story_text: ''
comment_text:
num_comments: 20
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1344505132
_tags:
- story
- author_mmphosis
- story_4360306
objectID: '4360306'
2018-06-08 12:05:27 +00:00
year: 1988
---
2018-02-23 18:19:40 +00:00
[Source](http://www.lysator.liu.se/c/duffs-device.html "Permalink to Tom Duff on Duff's Device")
# Tom Duff on Duff's Device
Subject: Re: Explanation, please!
Summary: Original citation
From: [td@alice.UUCP (Tom Duff)][1]
Organization: AT&T Bell Laboratories, Murray Hill NJ
Date: 29 Aug 88 20:33:51 GMT
Message-ID: <8144@alice.UUCP>
I normally do not read comp.lang.c, but Jim McKie told me that `[`Duff's device][2]'' had come up in comp.lang.c again. ` `I have lost the version that was sent to netnews in May 1984, but I have reproduced below the note in which I originally proposed the device. ` `(If anybody has a copy of the netnews version, I would gratefully receive a copy at research!td or td@research.att.com.)
To clear up a few points:
1. The point of the device is to express general loop unrolling directly in C. ` `People who have posted saying `just use memcpy' have missed the point, as have those who have criticized it using various machine-dependent memcpy implementations as support. ` `In fact, the example in the message is not implementable as memcpy, nor is any computer likely to have an memcpy-like idiom that implements it.
2. Somebody claimed that while the device was named for me, I probably didn't invent it. ` `I almost certainly did invent it. ` `I had definitely not seen or heard of it when I came upon it, and nobody has ever even claimed prior knowledge, let alone provided dates and times. ` `Note the headers on the message below: ` `apparently I invented the device on November 9, 1983, and was proud (or disgusted) enough to send mail to [dmr][3]. ` `Please note that I do not claim to have invented loop unrolling, merely this particular expression of it in C.
3. The device is legal dpANS C. ` `I cannot quote chapter and verse, but [Larry Rosler][4], who was chairman of the language subcommittee (I think), has assured me that X3J11 considered it carefully and decided that it was legal. Somewhere I have a note from [dmr][3] certifying that all the compilers that he believes in accept it. ` `Of course, the device is also legal C++, since Bjarne uses it in his book.
4. Somebody invoked (or more properly, banished) the `false god of efficiency.' ` `Careful reading of my original note will put this slur to rest. ` `The alternative to genuflecting before the god of code-bumming is finding a better algorithm. ` `It should be clear that none such was available. ` `If your code is too slow, you must make it faster. ` `If no better algorithm is available, you must trim cycles.
5. The same person claimed that the device wouldn't exhibit the desired speed-up. ` `The argument was flawed in two regards: ` `first, it didn't address the performance of the device, but rather the performance of one of its few uses (implementing memcpy) for which many machines have a high-performance idiom. ` `Second, the poster made his claims in the absence of timing data, which renders his assertion suspect. ` `A second poster tried the test, but botched the implementation, proving only that with diligence it is possible to make anything run slowly.
6. Even [Henry Spencer][5], who hit every other nail square on the end with the flat round thing stuck to it, made a mistake (albeit a trivial one). ` `Here is Henry replying to bill@proxftl.UUCP (T. William Wells):
>>... Dollars to doughnuts this
>>was written on a RISC machine.
>Nope.  Bell Labs Research uses VAXen and 68Ks, mostly.
I was at Lucasfilm when I invented the device.
7. Transformations like this can only be justified by measuring the resulting code. ` `Be careful when you use this thing that you don't unwind the loop so much that you overflow your machine's instruction cache. ` `Don't try to be smarter than an over-clever C compiler that recognizes loops that implement block move or block clear and compiles them into machine idioms.
Here then, is the original document describing Duff's device:
From research!ucbvax!dagobah!td ` `Sun Nov 13 07:35:46 1983
Received: by ucbvax.ARPA (4.16/4.13) ` `id AA18997; Sun, 13 Nov 83 07:35:46 pst
Received: by dagobah.LFL (4.6/4.6b) ` `id AA01034; Thu, 10 Nov 83 17:57:56 PST
Date: Thu, 10 Nov 83 17:57:56 PST
From: [ucbvax!dagobah!td (Tom Duff)][6]
Message-Id: <8311110157.AA01034@dagobah.LFL>
To: [ucbvax!decvax!hcr!rrg][7], [ucbvax!ihnp4!hcr!rrg][8], [ucbvax!research!dmr][3], [ucbvax!research!rob][9]
Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:
send(to, from, count)
register short *to, *from;
register count;
{
do
*to = *from++;
while(--count>0);
}
(Obviously, this fails if the count is zero.)
The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq,
I think.) ` `As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%. ` ` The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs. ` `When you do that, you wind up with a leftover partial loop. ` `I usually handle this in C with a switch that indexes a list of copies of the original loop body. ` `Of course, if I were writing assembly language code, I'd just jump into the middle of the unwound loop to deal with the leftovers. ` `Thinking about this yesterday, the following implementation occurred to me:
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do{ *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}while(--n>0);
}
}
Disgusting, no? ` `But it compiles and runs just fine. ` `I feel a combination of pride and revulsion at this discovery. ` `If no one's thought of it before, I think I'll name it after myself.
It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. ` `(Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)
Many people (even [bwk][10]?) have said that the worst feature of C is that switches don't break automatically before each case label. ` `This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.
yrs trly
[Tom][1]
[1]: http://www.lysator.liu.se/td/index.html
[2]: http://www.lysator.liu.se#duffs-device
[3]: http://www.cs.bell-labs.com/who/dmr/index.html
[4]: http://www.lysator.liu.se/lr/index.html
[5]: http://www.lysator.liu.se/henry/
[6]: http://www.lysator.liu.se/td/
[7]: http://www.lysator.liu.se/rrg/index.html
[8]: http://www.lysator.liu.se/rrg/
[9]: http://www.lysator.liu.se/rob/index.html
[10]: http://www.lysator.liu.se/bwk/index.html