[Source](https://gcc.gnu.org/ml/gcc/2001-07/msg02150.html "Permalink to Linus Torvalds - Re: What is acceptable for -ffast-math? (Was: associative law incombine)")

# Linus Torvalds - Re: What is acceptable for -ffast-math? (Was: associative law in combine)

This is the mail archive of the `gcc@gcc.gnu.org` mailing list for the [GCC project][1].

* * *

# Re: What is acceptable for -ffast-math? (Was: associative law in combine)

* _To_: <dewar at gnat dot com>
* _Subject_: Re: What is acceptable for -ffast-math? (Was: associative law in combine)
* _From_: Linus Torvalds <torvalds at transmeta dot com>
* _Date_: Tue, 31 Jul 2001 15:50:28 -0700 (PDT)
* _cc_: <gdr at codesourcery dot com>, <fjh at cs dot mu dot oz dot au>, <gcc at gcc dot gnu dot org>, <moshier at moshier dot ne dot mediaone dot net>, <tprince at computer dot org>

* * *

On Tue, 31 Jul 2001 dewar@gnat.com wrote:

>
> Well it sure would be nice to hear from some of these mythical numerical
> programmers (I don't care if they are writing games or nuclear reactor codes)
> who would be happier, so far we haven't heard this! And in my experience,
> even quite inexperienced floating-point numerical programmers are very
> disturbed when optimization changes the results of their programs.

I used -ffast-math myself, when I worked on the quake3 port to Linux (it's
been five years, how time flies).

It didn't make much difference at that point, because the x86 had
hand-written assembly, and gcc for the alpha didn't do much (anything?)
with -ffast-math.

But I tried _everything_. The main FP work I did on that thing on the
alpha improved the framerate by about 50% on alpha - FP was _that_
critical for it. Most of it was by looking at gcc output and trying to
re-organize the C code to make it be better (because gcc didn't do much on
its own).

And yes, it was exactly things like multiplying by reciprocals.

> > Your arguments about "numerical computation" are just silly, as you don't
> > seem to realize that there are tons of problems where your theoretical
> > issues are nothing more than noise.
>
> If you think the arguments are silly, then I really fear you lack the full
> context for this discussion, a discussion that has, as you should know raged
> for well over thirty years.

Most of the kind of "numerical" work that you seem to be talking about has
probably rather little to do with FP performance. Most of the traditional
heavy FP code tends to be _much_ more about cache layout and good memory
access patterns.

I'm personally acquainted with trying to make a game engine go fast, where
the memory effects are fewer, and the FP itself is the bottleneck.

> Sure -ffast-math is precisely intended to allow transformations that would
> not otherwise be allowed (let's not call them optimizations, that's just
> too controversial a word in the context of this argument).

Why not call them optimizations? They are. The only thing we change is the
boundary of valid ranges.

> The question is what is the boundary of allowable transformations. No one
> agrees that there should be no boundaries (even you don't like the change
> results to zero, though note that abandoning denormals has exactly this
> effect, and might be considered acceptable).

Oh, round-to-zero is definitely acceptable in the world of "who cares
about IEEE, we want fast math, and we'll use fixed arithmetic if the FP
code is too slow".

In fact, it is _so_ acceptable that CPU designers design for it. Look at
MMX2, and wonder why they have a RTZ mode? Because it makes the _hardware_
go faster.

That should tell you something. Big companies that have billion-dollar
fabs spend time optimizing their chips that take several years to design
for _games_. Not for IEEE traditional Fortran-kind math.

But apparently some gcc developers don't think that is even a worthy
market, because you just want to do fluid dynamics.

> So, what is the boundary, can one for instance forget about denormals and
> flush to zero to save a bit of time, can one truncate instead of round,
> can one ignore negative zeroes, or infinity semantics, can one ignore
> intermediate overflow (note: nearly all the discussed transformations are
> implicitly saying yes to this last question).

Do them all by default with -ffast-math.

Then, you can have specific flags for people who want just _one_
optimization. I doubt you'll find many users who do that, but maybe I'm
wrong. Giving people the choice is always a good idea.

> I have not seen anyone writing from the point of view of serious numerical
> coding saying [ .. ]

There you go again. What the hell do you call "serious numerical coding"?

Take a look at the computer game market today. It's a lot more serious
than most mathematicians puttering around in their labs, let me tell you.
That's a BIG industry.

Also note that _nobody_ in your kind of "serious numerical coding"
community would ever worry about "-ffast-math" in the first place. Why the
hell would they, when 99% of the time it doesn't make any difference at
all? The people you apparently consider serious are a lot more interested
in fast communication (so that they can solve the thing in parallel) and
incredible memory bandwidth.

I doubt you'll find many of your "serious numerical coding" people who
would even _notice_ the raw FP throughput. Look at SpecFP - CPUs are fast
enough, it spends most of its time waiting on memory.

> Should -ffast-math allow full precision operation? I would think so,
> since it definitely improves performance, and reduces surprises.

Ehh.. gcc right now allows full precision operation BY DEFAULT. No
-ffast-math required.

Same goes for negative zero as far as I remember - on some HP-PA stuff at
least. Simply because it was too painful to get "right" on their earlier
hardware.

In short, what you seem to argue that -ffast-math should mean are all
things gcc _already_ does, with no

> By the way, I said I would be shocked to find a Fortran compiler that did
> associative redistribution in the absence of parens. I am somewhat surprised
> that no one stepped forward with a counter-example, but I suspect in fact that
> there may not be any shocking Fortran implementations around.

I would suspect that very few people have Fortran compilers around and
bother to check it.

> It is an old argument, the one that says that fpt is approximate, so why bother
> to be persnickety about it. Seymour Cray always took this viewpoint, and it
> did not bother him that 81.0/3.0 did not give exactly 27.0 on the CDC 6000
> class machines.

..and he was universally respected for making the fastest machines around.

What you forget to mention is that these days it's so _cheap_ to get
IEEE, that from a hardware standpoint pretty much everybody includes it
anyway.

But they then often throw it away because it ends up having expensive
run-time issues (alpha with exception handling and proper denormals, Intel
with special RTZ modes etc).

Why? Because in many areas Seymour Cray is _still_ right. The thing that
killed off non-IEEE was not that he was wrong, but the fact that _some_
people do need IEEE "exact FP". Not everybody. Not even the majority. But
because some people do need it, you need to support it. Which is why
everybody does, today.

But do you see the difference between

"We have to support it because a portion of the user base has to have
it, and if we don't have it we won't be able to sell to any of that
user base"

and

"Everybody must use it, because anything else is wrong"

Eh?

Do you see that Seymour's approach didn't fail because he was always wrong?
It failed because he was _sometimes_ wrong.

And you know what? He was right enough of the time to have built up an
empire for a while. That's something not everybody can say about
themselves. And that is something that you should respect.

Linus

* * *

* **References**:
  * **[Re: What is acceptable for -ffast-math? (Was: associative law in combine)][9]**
    * _From:_ dewar

[1]: https://gcc.gnu.org/
[2]: https://gcc.gnu.org/ml/gcc/2001-07/index.html#02150
[3]: https://gcc.gnu.org/ml/gcc/2001-07/subjects.html#02150
[4]: https://gcc.gnu.org/ml/gcc/2001-07/authors.html#02150
[5]: https://gcc.gnu.org/ml/gcc/2001-07/threads.html#02150
[6]: https://gcc.gnu.org/ml/gcc/2001-07/msg02149.html
[7]: https://gcc.gnu.org/ml/gcc/2001-07/msg02151.html
[8]: https://gcc.gnu.org/ml/gcc/2001-07/msg02107.html
[9]: https://gcc.gnu.org/ml/gcc/2001-07/msg02106.html