[ HOME | What | Lists | Download | Docs | Support | Benchmarks | Contribute | Platforms | Examples | Legal | Tools | Papers/talks | Thanks ]


Benchmarks for the Origin 2000/SGI C++

These benchmarks were measured on convex.engr.sgi.com, an 8-CPU (R10000) Origin 2000 with a 32 KB L1 data cache, a 4 MB unified L2 cache, and 512 MB of main memory. The clock speed was 195 MHz. Results are for a single CPU only. Version 7.3 of the SGI compilers was used, with the flags -Ofast -64.



Summary

Median performance

Platform      Compiler      Out-of-cache   In-cache
Origin 2000   SGI C++ 7.3   88.1%          97.1%

Mean performance: 89.6% at peak in-cache, 88.1% out-of-cache.

This histogram shows peak performance in the L1 cache region for all loops. The horizontal axis is performance relative to Fortran 77: a value of 1.0 indicates performance equal to Fortran, values greater than 1 are faster, and values less than 1 are slower.



Histogram of results for all loops, performance for R-infinity (out of cache):





Detailed loop results




Both compilers do the strength reduction y/u => y*u' with u' = 1/u.
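As an illustration of the transformation described above, here is a minimal sketch in C++. The function names and the use of std::vector are assumptions made for this example; they are not taken from the benchmark source. The first function divides every element by u, and the second computes u' = 1/u once and multiplies inside the loop, which is the form the compilers are said to produce.

```cpp
#include <cstddef>
#include <vector>

// Direct form: one floating-point divide per element.
// (Hypothetical helper, written only for this illustration.)
std::vector<double> divide_each(const std::vector<double>& y, double u) {
    std::vector<double> x(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        x[i] = y[i] / u;
    return x;
}

// Strength-reduced form: the reciprocal u' = 1/u is computed once,
// outside the loop, so the loop body contains only a multiply.
std::vector<double> multiply_by_reciprocal(const std::vector<double>& y, double u) {
    const double u_inv = 1.0 / u;  // u' = 1/u
    std::vector<double> x(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        x[i] = y[i] * u_inv;
    return x;
}
```

In general the two forms can differ in the last bit of the result, because 1/u is rounded before the multiply. That is why compilers usually apply this rewrite only when aggressive floating-point optimization is enabled.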



Note how the Fortran versions start at 50 Mflops for very small vectors.











Why is R-infinity so much worse than the Fortran versions?









The lack of loop fusion really hurts the C++ versions.
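To make the cost of missing loop fusion concrete, here is a small C++ sketch. It is not one of the benchmark loops: the SumDiff struct, the function names, and the choice of a sum and a difference are all assumptions made for illustration. The unfused version makes two passes over the operands, as happens when each array statement becomes its own loop. The fused version does the same work in one pass, so each operand is loaded once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this illustration.
struct SumDiff {
    std::vector<double> sum;
    std::vector<double> diff;
};

// Unfused: two separate loops, so a and b are each traversed twice.
// Assumes b has at least as many elements as a.
SumDiff compute_unfused(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    SumDiff r{std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i)
        r.sum[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i)
        r.diff[i] = a[i] - b[i];
    return r;
}

// Fused: one loop computes both results, so a[i] and b[i] are
// loaded once per iteration and reused.
SumDiff compute_fused(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    SumDiff r{std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        r.sum[i] = a[i] + b[i];
        r.diff[i] = a[i] - b[i];
    }
    return r;
}
```

Both functions return identical results. The difference is in memory traffic, which matters most once the vectors no longer fit in cache.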



















loop36: The Fortran 77 compiler turns the loop into a call to vexp, presumably a hand-coded vector exp routine.


blitz-support@oonumerics.org
Tue Jul 22 08:46:24 EST 2003