Blitz Devel :

From: Julian Cummings (cummings_at_[hidden])
Date: 2002-12-19 18:17:53


Hi Farid,

These benchmarks vary a lot depending upon the compiler version
and optimization flags and the hardware being used. Looking at your
results, I would characterize the F77 implementation as being about
4 times faster rather than an order of magnitude. Nevertheless, that
is still a significant difference. The goal in these exercises is typically
to be within a factor of 2 of the Fortran performance.

I ran the daxpy benchmark on my 860 MHz Pentium 3 box after building it
with gcc 3.0. (I omitted the F90 benchmark, since gcc doesn't support F90.)
My results show the Vector<T> implementation performing within
a factor of 2 of the F77 implementation. In comparing my results (see
below) to yours, I find that I am getting almost a factor of two better
numbers for the Vector<T> and Array<T,1> implementations, and a
somewhat lower peak performance from the F77 implementation.
Which processor and compiler version are you using? You may have
to fool around with the compiler optimization flags a bit to improve this.
(Note: I noticed a comment at the top of daxpy.cpp that the restrict
keyword is being disabled because of a problem in KCC 3.2c. This is
obsolete and can probably be removed now. That might help you.)

Regards, Julian C.

% This matlab file generated automatically by class Benchmark
% of the Blitz++ class library.

parm = [ 1.000000000000e+00 3.000000000000e+00 5.000000000000e+00 ...
1.000000000000e+01 1.700000000000e+01 3.100000000000e+01 5.600000000000e+01 ...
1.000000000000e+02 1.770000000000e+02 3.160000000000e+02 5.620000000000e+02 ...
1.000000000000e+03 1.778000000000e+03 3.162000000000e+03 5.623000000000e+03 ...
1.000000000000e+04 1.778200000000e+04 3.162200000000e+04 5.623400000000e+04 ];

Mf = [ 5.128205128205e+01 1.490312965723e+01 1.176470588235e+02 3.367003367003e+01 ;
1.156069317919e+02 3.921568470588e+01 2.150537548387e+02 7.843136941176e+01 ;
1.538461538462e+02 5.952380952381e+01 2.666666666667e+02 8.583690987124e+01 ;
1.739130434783e+02 8.620689655172e+01 3.125000000000e+02 1.069518716578e+02 ;
2.083333000000e+02 1.142856960000e+02 3.703703111111e+02 1.282051076923e+02 ;
2.352940847059e+02 1.449275159420e+02 3.571428071429e+02 1.526717343511e+02 ;
2.564102153846e+02 1.694914983051e+02 3.921568000000e+02 1.639344000000e+02 ;
2.666666666667e+02 1.851851851852e+02 4.081632653061e+02 1.724137931034e+02 ;
2.777769166667e+02 1.999993800000e+02 4.255305957447e+02 1.724132586207e+02 ;
2.777762888889e+02 2.061844618557e+02 4.255296340426e+02 1.801792144144e+02 ;
2.816870647887e+02 2.083310583333e+02 4.347778608696e+02 1.834842348624e+02 ;
2.816901408451e+02 2.083333333333e+02 4.444444444444e+02 1.834862385321e+02 ;
2.857093600000e+02 1.980163881188e+02 3.921501019608e+02 1.785683500000e+02 ;
2.857002514286e+02 1.960688000000e+02 3.921376000000e+02 1.785626571429e+02 ;
2.857126628571e+02 2.061843958763e+02 3.921546352941e+02 1.801791567568e+02 ;
2.857142857143e+02 9.009009009009e+01 1.333333333333e+02 1.754385964912e+02 ;
6.428964887460e+01 5.061792607595e+01 9.705864466019e+01 4.660624895105e+01 ;
4.586640550459e+01 4.166198500000e+01 6.134280000000e+01 4.097900163934e+01 ;
4.483589775785e+01 4.106121232033e+01 5.746209885057e+01 4.064392357724e+01 ] ;

semilogx(parm,Mf), title('DAXPY Benchmark'),
    xlabel('Vector length'), ylabel('Mflops/s')
legend('Vector<T>', 'Array<T,1>', 'Fortran 77', 'Fortran BLAS')

Farid Moussaoui wrote:

> Hi,
>
> I just downloaded the Blitz library and checked the daxpy performance
> on my Linux PC. I noticed that the Fortran implementation is faster
> than the Vector<T> implementation by an order of magnitude!
>
> Here is the generated MATLAB file:
>
> % This matlab file generated automatically by class Benchmark
> % of the Blitz++ class library.
>
> parm = [ 1.000000000000e+00 3.000000000000e+00 5.000000000000e+00
> 1.000000000000e+01 1.700000000000e+01 3.100000000000e+01 5.600000000000e+01
> 1.000000000000e+02 1.770000000000e+02 3.160000000000e+02 5.620000000000e+02
> 1.000000000000e+03 1.778000000000e+03 3.162000000000e+03 5.623000000000e+03
> 1.000000000000e+04 1.778200000000e+04 3.162200000000e+04 5.623400000000e+04 ];
>
> Mf = [ 4.683840749415e+01 8.576329331046e+00 1.724137931034e+02
> 3.875968992248e+01 1.652892561983e+02 ;
> 8.368200502092e+01 2.257336252822e+01 2.439024292683e+02 9.615384230769e+01
> 2.597402493506e+02 ;
> 9.661835748792e+01 3.091190108192e+01 3.846153846154e+02 1.428571428571e+02
> 3.773584905660e+02 ;
> 1.036269430052e+02 4.535147392290e+01 4.000000000000e+02 2.197802197802e+02
> 4.166666666667e+02 ;
> 1.162790511628e+02 5.571029749304e+01 4.545453818182e+02 2.985074149254e+02
> 4.545453818182e+02 ;
> 1.162790534884e+02 6.535946797386e+01 4.255318553191e+02 3.174602730159e+02
> 4.347825478261e+02 ;
> 1.324503099338e+02 7.326006153846e+01 4.651162046512e+02 4.255318468085e+02
> 4.878048000000e+02 ;
> 1.315789473684e+02 7.692307692308e+01 5.000000000000e+02 4.651162790698e+02
> 5.128205128205e+02 ;
> 1.351347162162e+02 8.097140890688e+01 5.128189230769e+02 5.128189230769e+02
> 5.128189230769e+02 ;
> 1.265816000000e+02 8.403316302521e+01 4.999973200000e+02 5.263129684211e+02
> 5.405376432432e+02 ;
> 1.324488847682e+02 8.474483728814e+01 5.405346378378e+02 5.263100421053e+02
> 5.405346378378e+02 ;
> 1.324503311258e+02 8.474576271186e+01 5.555555555556e+02 5.405405405405e+02
> 5.555555555556e+02 ;
> 1.342258738255e+02 8.474430169492e+01 4.999913800000e+02 4.545376181818e+02
> 4.877964682927e+02 ;
> 1.298637506494e+02 8.657583376623e+01 4.255110127660e+02 4.650934325581e+02
> 4.999754400000e+02 ;
> 1.388881000000e+02 8.438770632911e+01 4.878021073171e+02 4.761877714286e+02
> 4.999971600000e+02 ;
> 1.219512195122e+02 8.403361344538e+01 2.272727272727e+02 4.878048780488e+02
> 2.173913043478e+02 ;
> 4.281387751606e+01 4.628259444444e+01 9.171596697248e+01 8.925928928571e+01
> 1.004727678392e+02 ;
> 3.194529201278e+01 3.366625050505e+01 5.763041152738e+01 5.779697341040e+01
> 5.375740000000e+01 ;
> 3.119627207488e+01 3.230502487884e+01 4.949705544554e+01 5.024324221106e+01
> 4.889195696822e+01 ] ;
>
> semilogx(parm,Mf), title('DAXPY Benchmark'),
> xlabel('Vector length'), ylabel('Mflops/s')
> legend('Vector<T>', 'Array<T,1>', 'Fortran 77', 'Fortran BLAS', 'Fortran 90')
>
> ========================================================================
> Farid Moussaoui Tel : +41 21 693 3533
> Laboratory of Computational Engineering Fax : +41 21 693 3646
> Ecole Polytechnique Federale de Lausanne
> CH-1015 Lausanne http://lmnwww.epfl.ch
> Switzerland
> "Think. Then discretise." - Rokhlin

--
Dr. Julian C. Cummings                       E-mail: cummings_at_[hidden]
California Institute of Technology           Phone:  626-395-2543
1200 E. California Blvd., Mail Code 158-79   Fax:    626-584-5917
Pasadena, CA 91125