This benchmark solves the 2-D acoustic wave equation using a finite difference approximation. The grid size is 650x650.
Snapshots from the simulation: 1. A pressure pulse emerges from a rectangular channel; 2. The pulse meets a bar of dense material; 3. The pulse travels faster through the dense bar.
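As a rough illustration of the kind of finite difference update involved, here is a minimal plain C++ sketch of one time step. It assumes the standard second-order leapfrog scheme with a five-point stencil, where `c` holds the squared Courant number for each cell; the names `Grid` and `step` are hypothetical and are not taken from the benchmark source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n grid of single-precision values (an assumed layout).
struct Grid {
    std::size_t n;
    std::vector<float> data;
    explicit Grid(std::size_t size, float init = 0.0f)
        : n(size), data(size * size, init) {}
    float& at(std::size_t i, std::size_t j) { return data[i * n + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// One leapfrog time step on the interior points:
//   next = (2 - 4c) * cur + c * (sum of the four neighbours) - prev
// Boundary cells of `next` are left untouched.
void step(const Grid& prev, const Grid& cur, const Grid& c, Grid& next) {
    assert(prev.n == cur.n && cur.n == c.n && c.n == next.n);
    const std::size_t n = cur.n;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            const float neighbours = cur.at(i - 1, j) + cur.at(i + 1, j) +
                                     cur.at(i, j - 1) + cur.at(i, j + 1);
            next.at(i, j) = (2.0f - 4.0f * c.at(i, j)) * cur.at(i, j) +
                            c.at(i, j) * neighbours - prev.at(i, j);
        }
    }
}
```

Each call reads three grids (`prev`, `cur`, `c`) and writes one (`next`), which is why the whole working set streams through the CPU on every time step.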
| Version | Run time (480 time steps) | Estimated Mflops/s | Source statements |
|---|---|---|---|
| Fortran 90 | 117.1 s | 15.5 | 12 |
| Fortran 77 | 110.6 s | 16.4 | 21 |
| Blitz++ | 119.3 s | 15.2 | 9 |
| C++ math library without expression templates | 431.2 s | 2.1 | 9 |
| Fortran 90 (tuned) | 59.9 s | 30.3 | 19 |
| Fortran 77 (tuned) | 56.1 s | 32.4 | 39 |
| Blitz++ (tuned) | 64.8 s | 28.0 | 8 |
| C++ math library without expression templates (tuned) | 373.9 s | 2.4 | 11 |
The Mflops rates are low because each iteration requires 6.5 MB of data to be shipped through the CPU, which is far larger than the cache size. It's theoretically possible to get better performance through iteration-space tiling, but this is beyond the scope of Blitz++ for now. Note that a conventional C++ array class library that used pairwise evaluation of expressions would require 39 MB of data to go through the CPU each iteration; it would run roughly 6 times slower than the Fortran and Blitz++ versions!
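To make the cost of pairwise evaluation concrete, the following sketch compares it with a single fused loop for a small expression, `a*x + b*y`. The counter `g_array_passes` and the function names are hypothetical bookkeeping added for illustration. The sketch does not reproduce the 39 MB figure above; it only shows why temporaries multiply the memory traffic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Counts full-length array reads and writes (illustrative bookkeeping only).
static std::size_t g_array_passes = 0;

// Pairwise evaluation: every operator fills a fresh temporary array.
Vec add(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    g_array_passes += 3;  // read a, read b, write out
    return out;
}

Vec mul(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i];
    g_array_passes += 3;  // read a, read b, write out
    return out;
}

// a*x + b*y built from pairwise operators: two temporaries plus the result.
Vec axpby_pairwise(const Vec& a, const Vec& x, const Vec& b, const Vec& y) {
    return add(mul(a, x), mul(b, y));
}

// The same expression as one loop: each input is read once, no temporaries.
Vec axpby_fused(const Vec& a, const Vec& x, const Vec& b, const Vec& y) {
    assert(a.size() == x.size() && x.size() == b.size() && b.size() == y.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * x[i] + b[i] * y[i];
    g_array_passes += 5;  // read four inputs, write one output
    return out;
}
```

For this expression the pairwise version makes 9 array passes against 5 for the fused loop, and the gap widens as expressions get longer.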
The tuned versions do two important optimizations:
`cycleArrays(..)` method). In Fortran, it's not possible to do this. You can get a similar effect by putting the stencilling code in a subroutine and doing three iterations at a time (see the source code for details).

The tuned Blitz++ and Fortran 77 versions do exactly the same optimizations. The tuned Fortran 90 version does as much optimization as possible while staying at the same level of abstraction as Blitz++. If you compare the tuned and untuned versions, you can see that the Blitz++ tuned version is a lot cleaner. The tuned Fortran versions lose their readability somewhat -- cycling arrays in Fortran isn't pretty.
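The idea behind cycling arrays can be sketched in plain C++ with `std::vector`. This is an analogy under the assumption that the goal is to rotate the three time levels without copying elements; it is not the Blitz++ implementation, and the names `Field` and `cycle` are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Field = std::vector<float>;

// Rotate buffer ownership so that
//   p1 <- old p2, p2 <- old p3, p3 <- old p1 (reused as scratch space).
// std::swap on vectors exchanges internal pointers, so no elements are copied.
void cycle(Field& p1, Field& p2, Field& p3) {
    std::swap(p1, p2);  // p1 = old p2, p2 = old p1
    std::swap(p2, p3);  // p2 = old p3, p3 = old p1
}
```

After each time step, a call such as `cycle(P1, P2, P3)` makes the newest field the current one and hands the oldest buffer back for reuse, in constant time regardless of grid size.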
XL Fortran is a state-of-the-art compiler. It's able to do many optimizations which KAI C++ doesn't, such as high-order loop transformations (interchanging, fusion, etc.), relaxation of IEEE arithmetic rules, profile-directed optimizations, inter-procedural optimization, and optimizing for specific cache structures. Blitz++ is able to compete with it by doing similar optimizations at the language level: it uses template techniques to generate optimized code.
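The template technique referred to here is commonly known as expression templates. The sketch below is a deliberately minimal illustration of that technique, not Blitz++'s actual implementation; the types `Expr`, `Sum`, and `Vec` are hypothetical. Operators build a lightweight expression object at compile time, and the assignment evaluates it in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as an indexable expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy element-wise sum; stores references and computes nothing until indexed.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec : Expr<Vec> {
    std::vector<float> data;
    explicit Vec(std::size_t n, float v = 0.0f) : data(n, v) {}
    float operator[](std::size_t i) const { return data[i]; }
    float& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning any expression runs one fused loop with no temporary arrays.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Builds the expression tree; the sum itself is deferred to assignment.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With these definitions, `out = a + b + c;` compiles down to one loop computing `a[i] + b[i] + c[i]` per element. Because the expression nodes hold references, the expression should be assigned within the same statement rather than stored for later use.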
| Version | Source statements |
|---|---|
| Fortran 90 | 27 |
| Fortran 77 | 33 |
| Blitz++ | 20 |