Re: OON: Re: oon-digest V1 #22

From: Kent Budge (kgbudge@valinor.sandia.gov)
Date: Thu Jul 30 1998 - 08:54:39 EST


Andrew Lumsdaine wrote:
>
> Steve Stevenson wrote:
>
> >> The major reason (for numerics) is that the bulk of general-purpose C++
> >> compilers do not perform aggressive floating point optimization. There
> >> are exceptions (PGI, KAI) but one still needs pragmas to warn off
> >> aliasing, etc.
>
> Unfortunately, "aggressive" floating point optimization is not
> particularly possible. Consider the Dragon book --- which admonishes
> to *leave the floating point stuff alone.* Please read Goldberg's
> "What Every Computer Scientist Should Know About Floating Point
> Arithmetic". Folks, none of the "rules" of arithmetic you know hold in
> FP except stuff about the units and commutation. You'll mess up some
> programmer's hard-derived program for the sake of a wrong
>
> I don't think the first message (posted by Roldan Pozo, I believe?)
> was talking about the type of aggressive optimizations that are of
> concern here, and I don't think anyone is considering those sorts of
> dangerous practices as part of their optimization arsenal.

Certainly not. The problem with C++ number crunching isn't the crunching
itself, but the way the numbers are gathered up for crunching. In other
words, it's not an issue of how the FPU is being used, but of how the
cache and registers are being used.
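
To make the cache point concrete, here is a minimal sketch, assuming a dense matrix stored row by row in one contiguous array. The Matrix type and both function names are hypothetical, invented for this illustration. Both functions do identical floating-point work; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense n-by-n matrix, stored row by row in one array.
struct Matrix {
    std::size_t n;
    std::vector<double> data;

    explicit Matrix(std::size_t size) : n(size), data(size * size, 0.0) {}

    double& at(std::size_t i, std::size_t j) {
        assert(i < n && j < n);
        return data[i * n + j];
    }
    double at(std::size_t i, std::size_t j) const {
        assert(i < n && j < n);
        return data[i * n + j];
    }
};

// Inner loop walks consecutive addresses, so every element of a
// fetched cache line is used before the line is evicted.
double sum_row_order(const Matrix& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < m.n; ++i)
        for (std::size_t j = 0; j < m.n; ++j)
            s += m.at(i, j);
    return s;
}

// Inner loop strides by n doubles, so for large n each access
// tends to land on a different cache line.
double sum_column_order(const Matrix& m) {
    double s = 0.0;
    for (std::size_t j = 0; j < m.n; ++j)
        for (std::size_t i = 0; i < m.n; ++i)
            s += m.at(i, j);
    return s;
}
```

How large the gap between the two is depends on the matrix size and the machine's cache, so it is something to measure rather than assume.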

>
> In response to the very original question -- which was what is it
> about C++ that precludes high performance -- the answer is not very
> much, per se. (The rest of this post is in answer to that original
> question.)
>
> C++ provides lots of nice abstraction mechanisms and it is easy for
> them to get in the way and obscure performance issues and this is what
> people usually bring up when they say C++ is not high performance. On
> the other hand, there are ways that one can use C++ abstractions to
> enable high performance. In fact, given that C++ is more expressive
> than Fortran, it is possible (and Dan Quinlan has reported some
> preliminary work in this regard) for C++ to outperform Fortran by
> large factors (by a factor of four or eight, say) -- because the extra
> semantic content in C++ expressions can enable more sophisticated
> cache-aware loop transformations, for instance.

Much as I prefer C++ to FORTRAN, I'm surprised to hear that a C++
compiler is able to deduce more about a loop, and hence optimize it
better, than FORTRAN. I would expect that any large speedup of C++
relative to FORTRAN, for a given numerical problem, would be
attributable to the ability of the programmer to successfully code a
much more sophisticated algorithm in this much more expressive
language. Is this what you meant?

>
> (There is one abstraction in C++ and C that does per se preclude
> high performance, namely, pointers. However, as Roldan pointed out,
> most compilers provide ways around this problem through the restrict
> keyword, or noalias pragmas and the like.)

If it matters at all. I've had a hard time detecting any performance
hits due to aliasing on my RISC workstation. Cache management is much
more important.
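
For readers who have not run into the aliasing issue, a minimal sketch of what is at stake follows. The function names are hypothetical, invented for this illustration. Standard C++ has no restrict keyword, so the sketch uses the portable workaround of copying the value into a local variable instead of any compiler-specific extension.

```cpp
#include <cassert>
#include <cstddef>

// The compiler must assume a store to out[i] may change *factor,
// so *factor has to be treated as reloaded on every iteration.
void scale_may_alias(double* out, const double* in,
                     const double* factor, std::size_t n) {
    assert(out != nullptr && in != nullptr && factor != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * *factor;
}

// Reading *factor once into a local tells the compiler the value
// cannot change inside the loop, so it can stay in a register.
void scale_local_copy(double* out, const double* in,
                      const double* factor, std::size_t n) {
    assert(out != nullptr && in != nullptr && factor != nullptr);
    const double f = *factor;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * f;
}
```

The two functions agree whenever the arguments do not overlap. If factor points into out, they give different answers, which is exactly why the compiler cannot turn the first into the second on its own. Whether the extra loads cost anything measurable is a separate question, and it depends on the machine.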

>
> The basic route to high performance is to make use of the
> architectural mechanisms in your microprocessor that are there for
> high performance -- cache and pipelining in particular. The ways to
> take advantage of cache and pipelining typically manifest themselves
> in code as loop blocking, unrolling, register blocking, etc. But
> notice, these issues are all language independent. That is, the
> blocking, unrolling, and so forth that one does in Fortran to get high
> performance can also be done in C++, and with the same results on
> performance. The advantages of doing the code in C++ should be
> obvious (at least to this audience) -- one gets all the performance of
> Fortran, but all the software engineering and code reuse advantages of
> C++.
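
As a rough sketch of the loop blocking mentioned above, assuming square row-major matrices held in std::vector<double>: the names multiply_naive and multiply_blocked and the block parameter are hypothetical, and this is not taken from MTL or any other library. The blocked version performs the same multiply-adds in tiles small enough to stay in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain triple loop: c = a * b for n-by-n row-major matrices.
Vec multiply_naive(const Vec& a, const Vec& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Vec c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Same arithmetic, done one block-by-block tile at a time so the
// tiles of a, b and c being worked on can be reused from cache.
Vec multiply_blocked(const Vec& a, const Vec& b, std::size_t n,
                     std::size_t block) {
    assert(block > 0 && a.size() == n * n && b.size() == n * n);
    Vec c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t jj = 0; jj < n; jj += block) {
                // Clamp tile edges so n need not be a multiple of block.
                const std::size_t i_end = std::min(ii + block, n);
                const std::size_t k_end = std::min(kk + block, n);
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k)
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
    return c;
}
```

The right block size depends on the cache being targeted and has to be found by experiment. The same transformation could be written in Fortran, which is the language-independence point made above.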

There *is* one optimization difficulty that is fairly C++-specific. If
one is using a value class, e.g., complex<double>, to build
expressions, the compiler usually pushes intermediate results back into
memory rather than holding them in a register as would be the case for
built-in types. This can result in a *big* performance hit. KAI has
solved this problem in their compiler and gets FORTRAN-like performance
for computations on complex numbers that use the complex<double>
class.

The reason why this is C++-specific is that complex<double> is, of
course, just another struct so far as the back end is concerned. And
most of the back ends were designed for C. And there's never a reason
to declare a struct in C unless you _want_ its elements to occupy a
contiguous region of memory. And registers aren't part of a contiguous
region of memory. To put it another way, a struct is never an
intermediate result in a C program -- only in C++.
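
A minimal sketch of the kind of code in question, with hypothetical function names invented for this illustration: the first loop builds its result from complex<double> temporaries, and the second spells out the same arithmetic on plain doubles, which is roughly what one hopes the compiler will do with the first.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// y = a*x + y written with the value class. The product a * x[i]
// is a temporary Complex object, i.e. a struct-valued intermediate.
void axpy_complex(Complex a, const std::vector<Complex>& x,
                  std::vector<Complex>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

// The same update with every intermediate held in a scalar double,
// so nothing struct-shaped exists between the loads and the store.
void axpy_scalars(Complex a, const std::vector<Complex>& x,
                  std::vector<Complex>& y) {
    assert(x.size() == y.size());
    const double ar = a.real(), ai = a.imag();
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xr = x[i].real(), xi = x[i].imag();
        const double yr = y[i].real(), yi = y[i].imag();
        y[i] = Complex(ar * xr - ai * xi + yr,
                       ar * xi + ai * xr + yi);
    }
}
```

Both functions compute the same values for finite inputs. Whether a particular compiler generates equally good code for the two is something to check by timing them or reading the generated assembly.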

>
> We have implemented a package (MTL) that demonstrates exactly these
> points, and it is (finally almost) ready for release. For an early
> look, see
>
> http://www.lsc.nd.edu/research/mtl/
>
> Note that we are just in the process of releasing this package -- it
> is in its alpha version right now -- normal caveats apply.
>
> Regards,
> Andrew Lumsdaine
>
> ------------------------------------------------------------------------
> Andrew Lumsdaine
> Associate Professor email: lums@lsc.nd.edu
> Dept. Comp. Sci. & Engr. phone: (219) 631-8716
> 353 Fitzpatrick Hall fax: (219) 631-9260
> University of Notre Dame www: http://www.cse.nd.edu/~lums/
> Notre Dame, IN 46556
> ------------------------------------------------------------------------
>

Kent G. Budge
Sandia National Laboratories


