
[Source](http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html "Permalink to Linus Torvalds - Re: Faster compilation speed")

# Linus Torvalds - Re: Faster compilation speed

This is the mail archive of the `gcc@gcc.gnu.org` mailing list for the [GCC project][1]. 

* * *


# Re: Faster compilation speed

* _From_: Linus Torvalds <torvalds at transmeta dot com>
* _To_: kevin at atkinson dot dhs dot org, gcc at gcc dot gnu dot org
* _Date_: Fri, 9 Aug 2002 20:28:16 -0700
* _Subject_: Re: Faster compilation speed
* _Newsgroups_: linux.egcs
* _References_: <[200208100156.g7A1uwn01415@penguin.transmeta.com][9]>
* * *
    
    
    In article <[Pine.LNX.4.44.0208092227500.2273-100000@kevin-pc.atkinson.dhs.org][10]> you write:
    >On Fri, 9 Aug 2002, Linus Torvalds wrote:
    >
    >> And that, in turn, is probably impossible to fix as long as gcc uses
    >> garbage collection for most of its internal memory management.  There
    >> just aren't all that many worse ways to f*ck up your cache behaviour
    >> than by using lots of allocations and lazy GC to manage your memory. 
    >
    >Excuse the interruption, but from what I read a good generational garbage 
    >collector can be just as fast as manually managing memory?
    
    All the papers I've seen on it are total jokes.  But maybe I've looked
    at the wrong ones. 
    
    One fundamental fact on modern hardware is that data cache locality is
    good, and not being in the cache sucks.  This is not likely to change. 
    In particular, this means that if you allocate stuff, you want to re-use
    the stuff you just freed _as_soon_as_possible_ - preferably before the
    previously dirty data has ever even been evicted from the cache, so that
    you can re-use the thing to avoid reading it in, but also to avoid
    writing out stale data. 
    
    This implies that any lazy de-allocation is bad. When a piece of memory
    is free, you want to de-allocate it _immediately_, so that the next
    allocation gets to re-use it and gets the cache footprint "for free".
    
    Generational garbage collectors tend to never re-use hot objects, and
    often do the copying between generations making things even worse on the
    cache.  Compaction helps subsequent use somewhat, but is in itself
    inherently costly, and the indirection (or fixup) implied by it can
    limit other optimization. 
    
    Sure, by being lazy you can sometimes win in icache footprint (and in
    instruction count - a lot of the "GC is fast" papers seem to rely on the
    fact that you can do other optimizations if you're lazy), but you lose
    big in dirty dcache footprint.  And since dcache is much more expensive
    than instructions, you're better off doing explicit memory management
    with refcounting (optionally helped by the programming language, of
    course.  You can make exact refcounting be your "GC" with some language
    support). 
    
    However, there's another, more fundamental issue.  It's the _mindset_. 
    The GC mindset tends to go hand-in-hand with pointer chasing, while
    people who use explicit allocators tend to be happier with doing things
    like "realloc()" and trying to use arrays and indexes instead of linked
    lists and just generally trying to avoid allocating lots of small
    things.  Which tends to be better on the cache. 
    
    Yes, I generalize. Don't we all?
    
    For example, if you have an _explicit_ refcounting system, then it is
    quite natural to have operations like "copy-on-write", where if you
    decide to change a tree node you do something like
    
    	node_t *copy_on_write(node_t **np)
    	{
    		node_t *node = *np;

    		/* Shared node: take a private copy and drop our reference */
    		if (node->count > 1) {
    			node_t *newnode = copy_alloc(node);
    			*np = newnode;
    			node->count--;
    			node = newnode;
    		}
    		return node;
    	}
    
    and then before you change a tree node you do
    
    	node = copy_on_write(&tree->node);
    	.. we now know we are the exclusive owners of "node" ..
    
    which tends to be very efficient - it allows sharing, even if sharing is
    often not the common case (and doesn't do any extra allocations for the
    common case of an access that was already exclusively owned).
    
    (If you want to be thread-safe you need to be more careful yet, and have
    thread-safe "get_node()/put_node()" actions etc.  Most applications
    don't need to be that careful, but you'll see a _lot_ of this inside an
    operating system). 
    
    In contrast, in a GC system where you do _not_ have access to the
    explicit refcounting, you tend to always copy the node, just because you
    don't know if the original node might be shared through another tree or
    not.  Even if sharing ends up not being the most common case.  So you do
    a lot of extra work, and you end up with even more cache pressure. 
    
    Are there GC systems that do refcounting internally _and_ expose the
    information upwards to the user? I bet there are. But the fact is, the
    rest of them (99.9%) give those few well-done GC's a bad name.
    
    "So what about circular data structures? Refcounting doesn't work for
    them".  Right.  Don't do them.  Or handle them very very carefully (ie
    there can be a "head" that gets special handling and keeps the others
    alive). Compilers certainly almost always end up working with DAG's, not
    cyclic structures. Make it a rule.
    
    Does it take more effort? Yes.  The advantage of GC is that it is
    automatic.  But GC apologists should just admit that it causes bad
    problems and often _encourages_ people to write code that performs
    badly. 
    
    I really think it's the mindset that is the biggest problem.  A GC
    system with explicitly visible reference counts (and immediate freeing)
    with language support to make it easier to get the refcounts right
    (things like automatically incrementing the refcounts when passing the
    object off to others) wouldn't necessarily be painful to use, and would
    clearly offer all the advantages of just doing it all by hand. 
    
    That's not the world we live in, though.
    
    		Linus
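
The point in the message above about re-using freed memory "as soon as possible" can be illustrated with a LIFO free list. This is only a minimal sketch in C, not code from the mail: the names `struct obj`, `pool_alloc`, `pool_free` and `pool_demo` are hypothetical, and the payload size is arbitrary. It shows that the most recently freed object is the first one handed back, which is the block most likely to still be in the data cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size object; the payload size is arbitrary. */
struct obj {
	struct obj *next_free;	/* only meaningful while on the free list */
	char payload[56];
};

static struct obj *free_list;	/* most recently freed object first */

struct obj *pool_alloc(void)
{
	struct obj *o = free_list;

	if (o) {
		/* Re-use the most recently freed object (LIFO). */
		free_list = o->next_free;
		return o;
	}
	return malloc(sizeof(*o));
}

void pool_free(struct obj *o)
{
	/* Immediate de-allocation: push onto the head of the list. */
	o->next_free = free_list;
	free_list = o;
}

/* Returns 1 if freed objects come back in last-freed-first order. */
int pool_demo(void)
{
	struct obj *a = pool_alloc();
	struct obj *b = pool_alloc();
	struct obj *first, *second;

	assert(a && b);
	pool_free(a);
	pool_free(b);

	first = pool_alloc();	/* b was freed last, so it comes back first */
	second = pool_alloc();
	assert(first == b);
	assert(second == a);

	free(a);
	free(b);
	return 1;
}
```

A lazy collector, by contrast, would typically leave both blocks unreachable but unreclaimed until the next collection, so the following allocations would touch fresh memory instead.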
    
    
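For readers who want to try the `copy_on_write()` idea from the message, here is a self-contained sketch in C. The function body follows the snippet in the mail; everything around it is an assumption made for illustration: the `node_t` layout (a `count` and a single `value` field), the `node_new` and `copy_alloc` helpers, and the `cow_demo` driver. It is not thread-safe, which matches the caveat in the mail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted node; only the fields needed for the demo. */
typedef struct node {
	int count;	/* number of owners */
	int value;
} node_t;

node_t *node_new(int value)
{
	node_t *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->count = 1;
	n->value = value;
	return n;
}

/* Copy the contents; the copy starts out with a single owner. */
node_t *copy_alloc(const node_t *node)
{
	return node_new(node->value);
}

/* Same logic as the snippet in the mail. */
node_t *copy_on_write(node_t **np)
{
	node_t *node = *np;

	if (node->count > 1) {
		node_t *newnode = copy_alloc(node);
		*np = newnode;
		node->count--;
		node = newnode;
	}
	return node;
}

/* Returns 1 if a shared node is copied and an exclusive one is not. */
int cow_demo(void)
{
	node_t *shared = node_new(1);
	node_t *mine = shared;
	node_t *n;

	shared->count = 2;		/* two owners: "shared" and "mine" */

	n = copy_on_write(&mine);	/* shared, so this must copy */
	n->value = 2;
	assert(mine != shared);
	assert(shared->value == 1 && shared->count == 1);
	assert(mine->value == 2 && mine->count == 1);

	n = copy_on_write(&mine);	/* exclusive now: no new allocation */
	assert(n == mine);

	free(mine);
	free(shared);
	return 1;
}
```

The second call is the common case the mail describes: the caller already owns the node exclusively, so the only cost is one comparison against the reference count.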

* * *
* **Follow-Ups**: 
    * [**Re: Faster compilation speed**][7]
        * _From:_ Daniel Berlin
    * [**Re: Faster compilation speed**][11]
        * _From:_ Robert Lipe
* **References**: 
    * [**Re: Faster compilation speed**][9]
        * _From:_ Linus Torvalds
    * [**Re: Faster compilation speed**][10]
        * _From:_ Kevin Atkinson


[1]: http://gcc.gnu.org/
[2]: http://gcc.gnu.org/index.html#00552
[3]: http://gcc.gnu.org/subjects.html#00552
[4]: http://gcc.gnu.org/authors.html#00552
[5]: http://gcc.gnu.org/threads.html#00552
[6]: http://gcc.gnu.org/msg00551.html
[7]: http://gcc.gnu.org/msg00553.html
[8]: http://gcc.gnu.org/cgi-bin/get-raw-msg?listname=gcc&date=2002-08&msgid=200208100328.g7A3SGS01429@penguin.transmeta.com
[9]: http://gcc.gnu.org/msg00544.html
[10]: http://gcc.gnu.org/msg00548.html
[11]: http://gcc.gnu.org/msg00567.html