---
created_at: '2016-12-23T07:41:36.000Z'
title: 'Re: What is acceptable for -ffast-math? (2001)'
url: https://gcc.gnu.org/ml/gcc/2001-07/msg02150.html
author: willvarfar
points: 86
story_text:
comment_text:
num_comments: 37
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1482478896
_tags:
- story
- author_willvarfar
- story_13243489
objectID: '13243489'
---

[Source](https://gcc.gnu.org/ml/gcc/2001-07/msg02150.html "Permalink to Linus Torvalds - Re: What is acceptable for -ffast-math? (Was: associative law in combine)")

# Linus Torvalds - Re: What is acceptable for -ffast-math? (Was: associative law in combine)

This is the mail archive of the `gcc@gcc.gnu.org` mailing list for the [GCC project][1].

* * *

| ----- |
| Index Nav: | [[Date Index][2]] [[Subject Index][3]] [[Author Index][4]] [[Thread Index][5]] |
| Message Nav: | [[Date Prev][6]] [[Date Next][7]] | [[Thread Prev][6]] [[Thread Next][8]] |

# Re: What is acceptable for -ffast-math? (Was: associative law in combine)

* _To_: <dewar at gnat dot com>
* _Subject_: Re: What is acceptable for -ffast-math? (Was: associative law in combine)
* _From_: Linus Torvalds <torvalds at transmeta dot com>
* _Date_: Tue, 31 Jul 2001 15:50:28 -0700 (PDT)
* _cc_: <gdr at codesourcery dot com>, <fjh at cs dot mu dot oz dot au>, <gcc at gcc dot gnu dot org>, <moshier at moshier dot ne dot mediaone dot net>, <tprince at computer dot org>

* * *

On Tue, 31 Jul 2001 dewar@gnat.com wrote:
>
> Well it sure would be nice to hear from some of these mythical numerical
> programmers (I don't care if they are writing games or nuclear reactor codes)
> who would be happier; so far we haven't heard this! And in my experience,
> even quite inexperienced floating-point numerical programmers are very
> disturbed when optimization changes the results of their programs.

I used -ffast-math myself, when I worked on the quake3 port to Linux (it's
been five years, how time flies).

It didn't make much difference at that point, because the x86 had
hand-written assembly, and gcc for the alpha didn't do much (anything?)
with -ffast-math.

But I tried _everything_. The main FP work I did on that thing on the
alpha improved the framerate by about 50% on alpha - FP was _that_
critical for it. Most of it was by looking at gcc output and trying to
re-organize the C code to make it better (because gcc didn't do much on
its own).

And yes, it was exactly things like multiplying by reciprocals.

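The reciprocal trick is easy to demonstrate; a minimal sketch in Python (CPython floats are IEEE 754 doubles, so the rounding behaviour matches what the compiler transformation trades away; the divisor 10 is just an illustrative constant):

```python
# Dividing by a constant vs. multiplying by its precomputed reciprocal:
# the multiply is cheaper on most FPUs, but the reciprocal is itself
# rounded, so the two results can differ by one ulp.
recip = 1.0 / 10.0    # rounded once: not exactly 1/10

exact = 3.0 / 10.0    # one correctly rounded division
fast = 3.0 * recip    # two roundings: the reciprocal, then the product

print(exact)          # 0.3
print(fast)           # 0.30000000000000004
print(exact == fast)  # False: the "optimization" changed the answer
```
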
> > Your arguments about "numerical computation" are just silly, as you don't
> > seem to realize that there are tons of problems where your theoretical
> > issues are nothing more than noise.
>
> If you think the arguments are silly, then I really fear you lack the full
> context for this discussion, a discussion that has, as you should know, raged
> for well over thirty years.

Most of the kind of "numerical" work that you seem to be talking about has
probably rather little to do with FP performance. Most of the traditional
heavy FP code tends to be _much_ more about cache layout and good memory
access patterns.

I'm personally acquainted with trying to make a game engine go fast, where
the memory effects are fewer, and the FP itself is the bottleneck.

> Sure -ffast-math is precisely intended to allow transformations that would
> not otherwise be allowed (let's not call them optimizations, that's just
> too controversial a word in the context of this argument).

Why not call them optimizations? They are. The only thing we change is the
boundary of valid ranges.

> The question is what is the boundary of allowable transformations. No one
> agrees that there should be no boundaries (even you don't like the change
> results to zero, though note that abandoning denormals has exactly this
> effect, and might be considered acceptable).

Oh, round-to-zero is definitely acceptable in the world of "who cares
about IEEE, we want fast math, and we'll use fixed arithmetic if the FP
code is too slow".

In fact, it is _so_ acceptable that CPU designers design for it. Look at
MMX2, and wonder why they have a RTZ mode? Because it makes the _hardware_
go faster.

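The denormal/flush-to-zero trade-off under discussion can be sketched from user space; a Python illustration (CPython floats are IEEE doubles; Python exposes no FTZ switch, so the hypothetical `ftz` helper simulates what the hardware mode does):

```python
import sys

tiny = sys.float_info.min  # smallest *normal* double, ~2.2e-308
sub = tiny / 4.0           # subnormal: nonzero, but below the normal range

print(sub > 0.0)                # True - IEEE keeps gradual underflow
print(sub + sub == tiny / 2.0)  # True - subnormal arithmetic still works

def ftz(x: float) -> float:
    """Simulate a flush-to-zero FPU mode: subnormal results become 0."""
    return x if x == 0.0 or abs(x) >= sys.float_info.min else 0.0

print(ftz(sub))  # 0.0 - the cheap answer a fast-math RTZ mode gives
```
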
That should tell you something. Big companies that have billion-dollar
fabs spend time optimizing their chips that take several years to design
for _games_. Not for IEEE traditional Fortran-kind math.

But apparently some gcc developers don't think that is even a worthy
market, because you just want to do fluid dynamics.

> So, what is the boundary, can one for instance forget about denormals and
> flush to zero to save a bit of time, can one truncate instead of round,
> can one ignore negative zeroes, or infinity semantics, can one ignore
> intermediate overflow (note: nearly all the discussed transformations are
> implicitly saying yes to this last question).

Do them all by default with -ffast-math.

Then, you can have specific flags for people who want just _one_
optimization. I doubt you'll find many users who do that, but maybe I'm
wrong. Giving people the choice is always a good idea.

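Later GCC releases did grow exactly such per-transformation switches; a sketch of how those modern flags are invoked (these are current GCC options, not the 2001 compiler's, and `kernel.c` is a hypothetical file name):

```shell
# Modern GCC splits -ffast-math into individually selectable pieces:
gcc -O2 -freciprocal-math kernel.c    # allow x/c -> x*(1/c) only
gcc -O2 -ffinite-math-only kernel.c   # assume no NaNs or infinities
gcc -O2 -fassociative-math -fno-signed-zeros -fno-trapping-math kernel.c
                                      # allow reassociation (the extra two
                                      # flags are required for it to apply)
gcc -O2 -ffast-math kernel.c          # the whole bundle at once
```
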
> I have not seen anyone writing from the point of view of serious numerical
> coding saying [ .. ]

There you go again. What the hell do you call "serious numerical coding"?

Take a look at the computer game market today. It's a lot more serious
than most mathematicians puttering around in their labs, let me tell you.
That's a BIG industry.

Also note that _nobody_ in your kind of "serious numerical coding"
community would ever worry about "-ffast-math" in the first place. Why the
hell would they, when 99% of the time it doesn't make any difference at
all. The people you apparently consider serious are a lot more interested
in fast communication (so that they can solve the thing in parallel) and
incredible memory bandwidth.

I doubt you'll find many of your "serious numerical coding" people who
would even _notice_ the raw FP throughput. Look at SpecFP - CPUs are fast
enough; it spends most of its time waiting on memory.

> Should -ffast-math allow full precision operation? I would think so,
> since it definitely improves performance, and reduces surprises.

Ehh.. gcc right now allows full precision operation BY DEFAULT. No
-ffast-math required.

Same goes for negative zero as far as I remember - on some HP-PA stuff at
least. Simply because it was too painful to get "right" on their earlier
hardware.

In short, what you seem to argue that -ffast-math should mean are all
things gcc _already_ does, with no -ffast-math needed.

> By the way, I said I would be shocked to find a Fortran compiler that did
> associative redistribution in the absence of parens. I am somewhat surprised
> that no one stepped forward with a counter-example, but I suspect in fact that
> there may not be any shocking Fortran implementations around.

I would suspect that very few people have Fortran compilers around and
bother to check it.

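The reassociation at issue in the thread's subject line takes three lines to exhibit; a minimal sketch in Python (IEEE doubles):

```python
# FP addition is not associative: regrouping changes where rounding happens.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)  # False - exactly the regrouping a compiler needs
                      # -ffast-math (or Fortran's freedom outside parens)
                      # to be allowed to perform
```
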
> It is an old argument, the one that says that fpt is approximate, so why bother
> to be persnickety about it. Seymour Cray always took this viewpoint, and it
> did not bother him that 81.0/3.0 did not give exactly 27.0 on the CDC 6000
> class machines.

..and he was universally respected for making the fastest machines around.

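On IEEE hardware this particular example cannot recur: division must be correctly rounded, and 81, 3, and 27 are all exactly representable. A quick check in Python (CPython floats are IEEE doubles):

```python
# On the CDC 6000 class, 81.0/3.0 missed 27.0. Under IEEE 754 division
# is correctly rounded, and since 81/3 = 27 is exactly representable,
# the quotient comes out exact.
print(81.0 / 3.0 == 27.0)  # True

# The "it's approximate anyway" argument only bites when the true result
# is NOT representable - then every arithmetic, IEEE or not, must round:
print(1.0 / 3.0)           # 0.3333333333333333
```
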
What you forget to mention is that these days it's so _cheap_ to get
IEEE, that from a hardware standpoint pretty much everybody includes it
anyway.

But they then often throw it away because it ends up having expensive
run-time issues (alpha with exception handling and proper denormals, Intel
with special RTZ modes etc).

Why? Because in many areas Seymour Cray is _still_ right. The thing that
killed off non-IEEE was not that he was wrong, but the fact that _some_
people do need IEEE "exact FP". Not everybody. Not even the majority. But
because some people do need it, you need to support it. Which is why
everybody does, today.

But do you see the difference between

  "We have to support it because a portion of the user base has to have
  it, and if we don't have it we won't be able to sell to any of that
  user base"

and

  "Everybody must use it, because anything else is wrong"

Eh?

Do you see that Seymour's approach didn't fail because he was always wrong?
It failed because he was _sometimes_ wrong.

And you know what? He was right enough of the time to have built up an
empire for a while. That's something not everybody can say about
themselves. And that is something that you should respect.

Linus

* * *

* **References**:
  * [**Re: What is acceptable for -ffast-math? (Was: associative law in combine)**][9]
    * _From:_ dewar

[1]: https://gcc.gnu.org/
[2]: https://gcc.gnu.org/index.html#02150
[3]: https://gcc.gnu.org/subjects.html#02150
[4]: https://gcc.gnu.org/authors.html#02150
[5]: https://gcc.gnu.org/threads.html#02150
[6]: https://gcc.gnu.org/msg02149.html
[7]: https://gcc.gnu.org/msg02151.html
[8]: https://gcc.gnu.org/msg02107.html
[9]: https://gcc.gnu.org/msg02106.html