BZDEV: hao-he benchmark (complex numbers)

From: tveldhui (tveldhui@extreme.indiana.edu)
Date: Fri Feb 12 1999 - 08:40:45 EST


At Peter.Bienstman's suggestion I finally took a look at the hao-he
benchmark. It times the expression "B += sqr(A)" where A, B are
complex<double> arrays.

Here are the initial timings:

In-cache:
Time using array notation b += a*a: 2.78
Time using array notation b += sqr(a): 2.24
Time using low-level version: 3.8
Time using really low-level version: 0.47

Out-of-cache:
Time using array notation b += a*a: 4.17
Time using array notation b += sqr(a): 3.54
Time using low-level version: 5.65
Time using really low-level version: 1.98

So the Array version is about 4.7 times slower in-cache, and
1.8 times slower out of cache. Usually you can get away with
terrible code out-of-cache and still get decent performance,
so this 1.8 figure is a clear sign something is wrong.

Next I generated assembler code (g++ -S). Here is a snippet
of the inner loop of the "really low-level" version (corresponding
to one unrolling of the loop):

.LL59667:
        add %o5, -1, %o5
        fmuld %f2, %f2, %f8
        faddd %f2, %f2, %f10
        ldd [%o4+8], %f4
        ldd [%o3], %f12
        add %o4, 16, %o4
        ldd [%o3+8], %f6
        fmuld %f4, %f4, %f2
        fmuld %f10, %f4, %f4
        fsubd %f8, %f2, %f2
        faddd %f12, %f2, %f2
        faddd %f6, %f4, %f4
        std %f2, [%o3]
        std %f4, [%o3+8]
        add %o3, 16, %o3
        ldd [%o4], %f2
  .
  . etc.

Here is some of the inner loop code from the "b += sqr(a)" version:

        add %o4, %o0, %o5
        ldd [%o5], %o2
        mov %o5, %o4
        std %o2, [%fp-336]
        ldd [%fp-336], %f2
        ldd [%o4+8], %o0
        faddd %f2, %f2, %f8
        std %o0, [%fp-328]
        ldd [%fp-328], %f4
        mov %l0, %o0
        fmuld %f4, %f4, %f6
        add %fp, -320, %o1
        fmuld %f2, %f2, %f2
        fmuld %f8, %f4, %f8
        fsubd %f2, %f6, %f2
        std %f2, [%fp-320]
        call __doapl__H1Zd_Pt7complex1ZX01RCt7complex1ZX01_Rt7complex1ZX01, 0
        std %f8, [%fp-312]
        ld [%fp-352], %o4
  .
  .

Yikes! There's the culprit. There is a function call in the inner
loop. I couldn't find a demangler utility for g++ (anyone know if
there is one)? but it looks like some global function associated
with complex. Hunting through the library headers I find:

template <class _FLT>
inline complex<_FLT>&
__doapl (complex<_FLT>* ths, const complex<_FLT>& r)
{
  ths->re += r.re;
  ths->im += r.im;
  return *ths;
}

class complex<double>
{
public:
  complex& operator+= (const complex& r) { return __doapl (this, r); }
  ...
};

For some reason egcs is not inlining this call. A quick fix
is to change the complex header to:

  complex& operator+= (const complex& r)
  {
    re += r.re;
    im += r.im;
    return *this;
  }

Okay. Then I find the call to __doapl__... disappears, but now there
is a call to

        add %i0, 8, %o0
        call sqr__H1ZQ25blitzt5Array2Zt7complex1Zdi1__5blitzRCt6ETBase1ZX01_Q
25blitzt13_bz_ArrayExpr1ZQ25blitzt20_bz_ArrayExprUnaryOp2ZQ35blitzt6asExpr1ZX016
T_exprZQ25blitzt7_bz_sqr1ZQ45blitzt6asExpr1ZX016T_expr9T_numtype, 0
        nop
        unimp 20
        ld [%fp-148], %o0
        ld [%fp-144], %o1
        st %o0, [%fp-172]
        st %o1, [%fp-168]
        st %o0, [%fp-356]
        st %o1, [%fp-352]
        ld [%i1+20], %o0
        st %i1, [%fp-304]
        ld [%i1+28], %o1
        call .umul, 0
        mov 0, %l2
        ld [%fp-304], %o3
        sll %o0, 4, %o0
        ld [%o3+28], %o1
        ld [%fp-356], %o4
        st %o1, [%fp-292]
        ld [%o4+28], %o2
        ld [%i1+4], %o1
        st %o2, [%l3+12]
...

which is the blitz sqr routine. Also there are scattered calls to .umul.
It's like playing "whack the gopher".

I tried various things but couldn't get everything inlined.

Possible fixes:
  1. Figure out a way to tell egcs to inline everything
     (anyone know an option for this?)
  2. Use libstdc++, which has a different implementation of
     complex. I tried building this under solaris but ran
     into trouble -- maybe someone else will have better luck?
     It is available from ftp://egcs.cygnus.com/pub/libstdc++

Cheers,
Todd

--------------------- blitz-dev list --------------------------------
* To subscribe/unsubscribe: mail to majordomo@oonumerics.org, with
"subscribe blitz-dev" or "unsubscribe blitz-dev" in the body of the message
* Blitz++ web page: http://oonumerics.org/blitz/



This archive was generated by hypermail 2b29 : Wed Feb 20 2002 - 04:30:09 EST