[ HOME | What | Lists | Download | Docs | Support | Benchmarks | Contribute | Platforms | Examples | Legal | Tools | Papers/talks | Thanks ] |
| Benchmarks for the Origin 2000/SGI C++ |
These benchmarks were measured on convex.engr.sgi.com, an 8-CPU (R10000) Origin 2000 with a 32 KB L1 data cache, a 4 MB unified L2 cache, and 512 MB of main memory. The clock speed was 195 MHz. Results are for a single CPU only. Version 7.3 of the SGI compilers was used, with the flags -Ofast -64.
| Summary |
| Platform | Compiler | Out-of-cache | In-cache |
|---|---|---|---|
| Origin 2000 | SGI C++ 7.3 | 88.1% | 97.1% |
Mean performance: 89.6% peak in-cache, 88.1% out-of-cache.
This histogram shows peak performance in the L1 cache region for all loops. The horizontal axis is performance relative to Fortran 77: the value 1.0 indicates equal performance to Fortran, > 1 is faster and < 1 is slower.

Histogram of results for all loops, performance for R-infinity (out of cache):

| Detailed loop results |


Both compilers do the strength reduction y/u => y*u' with u' = 1/u.
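
To make the transformation concrete, here is a minimal sketch of what the strength reduction amounts to at the source level. The function names and the use of `std::vector` are assumptions chosen for illustration; they are not taken from the benchmark code, and the compilers apply the rewrite internally rather than requiring it to be written by hand.

```cpp
#include <cstddef>
#include <vector>

// Straightforward form: one floating-point divide per element.
void divide_each(std::vector<double>& x, const std::vector<double>& y, double u)
{
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = y[i] / u;
}

// Strength-reduced form: the divide is hoisted out of the loop as a
// single reciprocal u' = 1/u, and each element costs only a multiply.
void multiply_by_reciprocal(std::vector<double>& x, const std::vector<double>& y, double u)
{
    const double u_inv = 1.0 / u;   // computed once, outside the loop
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = y[i] * u_inv;
}
```

The two forms are not always bit-for-bit identical, since y*(1/u) can round differently from y/u. That is why compilers typically perform this rewrite only under aggressive optimization settings.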

Note how the Fortran versions start at 50 Mflops for very small vectors.





Why is R-infinity so much worse than for the Fortran versions?




The lack of loop fusion really hurts the C++ versions.
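
As a rough illustration of why fusion matters, consider the sketch below. The kernel is hypothetical (it is not one of the benchmarked loops), and the function names are made up for this example. It assumes two element-wise statements over the same operands, evaluated either as two separate loops or as one fused loop.

```cpp
#include <cstddef>
#include <vector>

// Unfused: each statement is its own loop, so a and b are streamed
// through the cache twice and the loop overhead is paid twice.
void sum_and_diff_unfused(std::vector<double>& x, std::vector<double>& y,
                          const std::vector<double>& a, const std::vector<double>& b)
{
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        x[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a[i] - b[i];
}

// Fused: a single pass loads a[i] and b[i] once and reuses them for
// both results.
void sum_and_diff_fused(std::vector<double>& x, std::vector<double>& y,
                        const std::vector<double>& a, const std::vector<double>& b)
{
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = a[i] + b[i];
        y[i] = a[i] - b[i];
    }
}
```

Both functions compute the same results. The difference is memory traffic, which is most visible once the vectors no longer fit in cache.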









loop36: The Fortran 77 compiler turns the loop into a call to vexp, presumably a hand-coded vector exp routine.
blitz-support@oonumerics.org Tue Jul 22 08:46:24 EST 2003