hn-classics/_stories/2010/5506749.md

6.3 KiB

created_at title url author points story_text comment_text num_comments story_id story_title story_url parent_id created_at_i _tags objectID year
2013-04-07T10:39:18.000Z Will It Optimize? See how well you can anticipate gcc's optimizer (2010) http://ridiculousfish.com/blog/posts/will-it-optimize.html alexbowe 94 32 1365331158
story
author_alexbowe
story_5506749
5506749 2010

Source

Will It Optimize?

hand fishtank fishtank fishtank fishtank

recent posts | all posts | about fish

Will It Optimize?

July 23rd, 2010

See how well you know (or can anticipate) gcc's optimizer. For each question, the left box contains some code, while the right box contains code that purports to do the same thing, but that illustrates a particular optimization. Will gcc apply that optimization? Put another way, will the code on the left be as fast as the code on the right, when compiled with an optimizing gcc?

I used a pretty ancient gcc 4.2.1 for these tests. If newer versions have different behavior, please leave a comment.

Beware: not all proposed optimizations are actually valid!

1. Recursion elimination

Can GCC replace recursive functions with a loop?

int factorial(int x) {
   if (x > 1) return x * factorial(x-1);
   else return 1;
}


int factorial(int x) {
   int result = 1;
   while (x > 1) result *= x--;
   return result;
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


2. Loop-invariant strlen()

Will GCC hoist out strlen()?

unsigned sum(const unsigned char *s) {
   unsigned result = 0;
   for (size_t i=0; i < strlen(s); i++) {
      result += s[i];
   }
   return result;
}


unsigned sum(const unsigned char *s) {
   unsigned result = 0;
   size_t length = strlen(s);
   for (size_t i=0; i < length; i++) {
      result += s[i];
   }
   return result;
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


3. Multiplication by 2 to addition - integer

Will GCC transform an integer multiplication by 2 to addition?

int double_it(int x) {
   return x * 2;
}


int double_it(int x) {
   return x + x;
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


4. Multiplication by 2 to addition - floating point

Will GCC transform a floating point multiplication by 2 to addition?

float double_it(float x) {
   return x * 2.0f;
}


float double_it(float x) {
   return x + x;
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


5. Division by 2 to right shift

Will GCC transform an integer division by 2 to a right shift?

int halve_it(int x) {
   return x / 2;
}


int halve_it(int x) {
   return x >> 1;
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


6. If-else chains to switch statements

Will GCC apply the same optimizations to if-else chains as it does to switch statements?

void function(int x) {
   if (x == 0) f0();
   else if (x == 1) f1();
   else if (x == 2) f2();
   else if (x == 3) f3();
   else if (x == 4) f4();
   else if (x == 5) f5();
}


void function(int x) {
   switch (x) {
      case 0: f0(); break;
      case 1: f1(); break;
      case 2: f2(); break;
      case 3: f3(); break;
      case 4: f4(); break;
      case 5: f5(); break;
   }
}

I think GCC does this I think GCC cannot do this I say this optimization is incorrect!


Summing up

It is tempting to think of compiler optimizations as reducing the constant in your program's big-O complexity, and nothing else. They aren't supposed to be able to make your program asymptotically faster, or affect its output.

However, as we saw, they really can reduce the asymptotic complexity in space (question 1) and time (question 2). They can also affect calculated results (discussion of question 4) and maybe even whether your program goes into an infinite loop (see here).

On the flip side, several "obvious" optimizations are subtly incorrect and so will not be performed by the compiler, especially when they involve floating point. If your floating point code is demonstrably a bottleneck and you don't need exact precision or care about special FP values, you may be able to realize a speedup by doing some optimizations manually. However, untying the compiler's hands through options like -ffast-math is probably a better idea, and then only for the affected files, since these flags have a global impact.

And lastly, this isn't meant to be a prescriptive post, but we all know why micro-optimizing is usually a mistake: it wastes your time, it's easy to screw up (see question 5), and it typically produces no measurable speedup.

Code smart, and be safe out there!