170 lines
7.4 KiB
Markdown
170 lines
7.4 KiB
Markdown
[Source](http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html "Permalink to Linus Torvalds - Re: Faster compilation speed")
|
||
|
||
# Linus Torvalds - Re: Faster compilation speed
|
||
|
||
This is the mail archive of the `gcc@gcc.gnu.org` mailing list for the [GCC project][1].
|
||
|
||
* * *
|
||
|
||
| ----- |
|
||
| Index Nav: | [[Date Index][2]] [[Subject Index][3]] [[Author Index][4]] [[Thread Index][5]] |
|
||
| Message Nav: | [[Date Prev][6]] [[Date Next][7]] | [[Thread Prev][6]] [[Thread Next][7]] |
|
||
| Other format: | [[Raw text][8]] | |
|
||
|
||
# Re: Faster compilation speed
|
||
|
||
* _From_: Linus Torvalds <torvalds at transmeta dot com>
|
||
* _To_: kevin at atkinson dot dhs dot org, gcc at gcc dot gnu dot org
|
||
* _Cc_:
|
||
* _Date_: Fri, 9 Aug 2002 20:28:16 -0700
|
||
* _Subject_: Re: Faster compilation speed
|
||
* _Newsgroups_: linux.egcs
|
||
* _Organization_:
|
||
* _References_: <[200208100156.g7A1uwn01415@penguin.transmeta.com][9]>
|
||
* * *
|
||
|
||
|
||
In article <[Pine.LNX.4.44.0208092227500.2273-100000@kevin-pc.atkinson.dhs.org][10]> you write:
|
||
>On Fri, 9 Aug 2002, Linus Torvalds wrote:
|
||
>
|
||
>> And that, in turn, is probably impossible to fix as long as gcc uses
|
||
>> garbage collection for most of its internal memory management. There
|
||
>> just aren't all that many worse ways to f*ck up your cache behaviour
|
||
>> than by using lots of allocations and lazy GC to manage your memory.
|
||
>
|
||
>Excuse the interruption, but from what I read a good generational garbage
|
||
>collector can be just as fast as manually managing memory?
|
||
|
||
All the papers I've seen on it are total jokes. But maybe I've looked
|
||
at the wrong ones.
|
||
|
||
One fundamental fact on modern hardware is that data cache locality is
|
||
good, and not being in the cache sucks. This is not likely to change.
|
||
In particular, this means that if you allocate stuff, you want to re-use
|
||
the stuff you just freed _as_soon_as_possible_ - preferably before the
|
||
previously dirty data has ever even been evicted from the cache, so that
|
||
you can re-use the thing to avoid reading it in, but also to avoid
|
||
writing out stale data.
|
||
|
||
This implies that any lazy de-allocation is bad. When a piece of memory
|
||
is free, you want to de-allocate it _immediately_, so that the next
|
||
allocation gets to re-use it and gets the cache footprint "for free".
|
||
|
||
Generational garabage collectors tend to never re-use hot objects, and
|
||
often do the copying between generations making things even worse on the
|
||
cache. Compaction helps subsequent use somewhat, but is in itself
|
||
inherently costly, and the indirection (or fixup) implied by it can
|
||
limit other optimization.
|
||
|
||
Sure, by being lazy you can sometimes win in icache footprint (and in
|
||
instruction count - a lot of the "GC is fast" papers seem to rely on the
|
||
fact that you can do other optimizations if you're lazy), but you lose
|
||
big in dirty dcache footprint. And since dcache is much more expensive
|
||
than instructions, you're better off doing explicit memory management
|
||
with refcounting (optionally helped by the programming language, of
|
||
course. You can make exact refcounting be your "GC" with some language
|
||
support).
|
||
|
||
However, there's another, more fundamental issue. It's the _mindset_.
|
||
The GC mindset tends to go hand-in-hand with pointer chasing, while
|
||
people who use explicit allocators tend to be happier with doing things
|
||
like "realloc()" and trying to use arrays and indexes instead of linked
|
||
lists and just generally trying to avoid allocating lots of small
|
||
things. Which tends to be better on the cache.
|
||
|
||
Yes, I generalize. Don't we all?
|
||
|
||
For example, if you have an _explicit_ refcounting system, then it is
|
||
quite natural to have operations like "copy-on-write", where if you
|
||
decide to change a tree node you do something like
|
||
|
||
copy_on_write(node_t **np)
|
||
{
|
||
|
||
note_t *node = *np;
|
||
if (node->count > 1)
|
||
newnode = copy_alloc(node);
|
||
*np = newnode;
|
||
node->count--;
|
||
node = newnode;
|
||
}
|
||
return node;
|
||
}
|
||
|
||
and then before you change a tree node you do
|
||
|
||
node = copy_on_write(&tree->node);
|
||
.. we now know we are the exclusive owners of "node" ..
|
||
|
||
which tends to be very efficient - it allows sharing, even if sharing is
|
||
often not the common case (and doesn't do any extra allocations for the
|
||
common case of an access that was already exclusively owned).
|
||
|
||
(If you want to be thread-safe you need to be more careful yet, and have
|
||
thread-safe "get_node()/put_node()" actions etc. Most applications
|
||
don't need to be that careful, but you'll see a _lot_ of this inside an
|
||
operating system).
|
||
|
||
In contrast, in a GC system where you do _not_ have access to the
|
||
explicit refcounting, you tend to always copy the node, just because you
|
||
don't know if the original node might be shared through another tree or
|
||
not. Even if sharing ends up not being the most common case. So you do
|
||
a lot of extra work, and you end up with even more cache pressure.
|
||
|
||
Are the GC systems that do refcounting internally _and_ expose the
|
||
information upwards to the user? I bet there are. But the fact is, the
|
||
rest of them (99.9%) give those few well-done GC's a bad name.
|
||
|
||
"So what about circular data structures? Refcounting doesn't work for
|
||
them". Right. Don't do them. Or handle them very very carefully (ie
|
||
there can be a "head" that gets special handling and keeps the others
|
||
alive). Compilers certainly almost always end up working with DAG's, not
|
||
cyclic structures. Make it a rule.
|
||
|
||
Does it take more effort? Yes. The advantage of GC is that it is
|
||
automatic. But CG apologists should just admit that it causes bad
|
||
problems and often _encourages_ people to write code that performs
|
||
badly.
|
||
|
||
I really think it's the mindset that is the biggest problem. A GC
|
||
system with explicitly visible reference counts (and immediate freeing)
|
||
with language support to make it easier to get the refcounts right
|
||
(things like automatically incrementing the refcounts when passing the
|
||
object off to others) wouldn't necessarily be painful to use, and would
|
||
clearly offer all the advantages of just doing it all by hand.
|
||
|
||
That's not the world we live in, though.
|
||
|
||
Linus
|
||
|
||
|
||
|
||
* * *
|
||
* **Follow-Ups**:
|
||
* [**Re: Faster compilation speed][7]**
|
||
* _From:_ Daniel Berlin
|
||
* [**Re: Faster compilation speed][11]**
|
||
* _From:_ Robert Lipe
|
||
* **References**:
|
||
* [**Re: Faster compilation speed][9]**
|
||
* _From:_ Linus Torvalds
|
||
* [**Re: Faster compilation speed][10]**
|
||
* _From:_ Kevin Atkinson
|
||
|
||
| ----- |
|
||
| Index Nav: | [[Date Index][2]] [[Subject Index][3]] [[Author Index][4]] [[Thread Index][5]] |
|
||
| Message Nav: | [[Date Prev][6]] [[Date Next][7]] | [[Thread Prev][6]] [[Thread Next][7]] |
|
||
|
||
[1]: http://gcc.gnu.org/
|
||
[2]: http://gcc.gnu.org/index.html#00552
|
||
[3]: http://gcc.gnu.org/subjects.html#00552
|
||
[4]: http://gcc.gnu.org/authors.html#00552
|
||
[5]: http://gcc.gnu.org/threads.html#00552
|
||
[6]: http://gcc.gnu.org/msg00551.html
|
||
[7]: http://gcc.gnu.org/msg00553.html
|
||
[8]: http://gcc.gnu.org/cgi-bin/get-raw-msg?listname=gcc&date=2002-08&msgid=200208100328.g7A3SGS01429@penguin.transmeta.com
|
||
[9]: http://gcc.gnu.org/msg00544.html
|
||
[10]: http://gcc.gnu.org/msg00548.html
|
||
[11]: http://gcc.gnu.org/msg00567.html
|
||
|