Blitz++ Benchmarks: Acoustic

This benchmark solves the 2-D acoustic wave equation using a finite difference approximation. The grid size is 650x650.
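To make the computation concrete, here is a minimal sketch of one time step of a standard second-order finite difference scheme for the 2-D wave equation. It is plain C++ rather than the benchmark's actual source; the Grid type, the step function, and the interpretation of c as the squared, scaled wave speed at each grid point are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major n x n grid of single-precision values.
struct Grid {
    std::size_t n;
    std::vector<float> v;
    explicit Grid(std::size_t n_, float init = 0.0f) : n(n_), v(n_ * n_, init) {}
    float& at(std::size_t i, std::size_t j) { return v[i * n + j]; }
    float at(std::size_t i, std::size_t j) const { return v[i * n + j]; }
};

// One leapfrog time step over the interior points (boundaries are left untouched):
//   next = (2 - 4c) * cur + c * (sum of the four neighbours of cur) - prev
// c is assumed to hold (velocity * dt / dx)^2 at each grid point.
void step(const Grid& prev, const Grid& cur, const Grid& c, Grid& next) {
    assert(prev.n == cur.n && cur.n == c.n && c.n == next.n);
    const std::size_t n = cur.n;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            next.at(i, j) = (2.0f - 4.0f * c.at(i, j)) * cur.at(i, j)
                          + c.at(i, j) * (cur.at(i - 1, j) + cur.at(i + 1, j)
                                        + cur.at(i, j - 1) + cur.at(i, j + 1))
                          - prev.at(i, j);
        }
    }
}
```

Each step reads three arrays (prev, cur, c) and writes one (next), which is why the amount of data moved per iteration matters so much for a grid of this size.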


Snapshots from the simulation: 1. A pressure pulse emerges from a rectangular channel; 2. The pulse meets a bar of dense material; 3. The pulse travels faster through the dense bar.

Timing results

These timings were taken on a 100 MHz IBM RS/6000 Model 43P.

Version                                                  Run time            Estimated   Source
                                                         (480 time steps)    Mflops/s    statements
Fortran 90                                               117.1 s             15.5        12
Fortran 77                                               110.6 s             16.4        21
Blitz++                                                  119.3 s             15.2        9
C++ math library without expression templates            431.2 s             2.1         9
Fortran 90 (tuned)                                       59.9 s              30.3        19
Fortran 77 (tuned)                                       56.1 s              32.4        39
Blitz++ (tuned)                                          64.8 s              28.0        8
C++ math library without expression templates (tuned)    373.9 s             2.4         11

Compilers used: XL Fortran 77 at -O3 -qhot, XL Fortran 90 at -O3 -qhot, KAI C++ at +K3 -O3.

The Mflops rates are low because each iteration requires 6.5 MB of data to be shipped through the CPU; this is far larger than the cache size. It's theoretically possible to get better performance through an iteration-space tiling, but this is beyond the scope of Blitz++ for now. Note that a conventional C++ array class library that used pairwise evaluation of expressions would require 39 MB of data to go through the CPU each iteration; it would run roughly 6 times slower than the Fortran and Blitz++ versions!
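The data volumes quoted above can be sanity-checked with a little arithmetic. The sketch below assumes 4-byte (single-precision) elements and 1 megabyte = 2^20 bytes; neither is stated in the text, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions (not stated in the text): 4-byte elements, 1 megabyte = 2^20 bytes.
constexpr std::size_t kGrid = 650;       // grid is kGrid x kGrid
constexpr std::size_t kElemBytes = 4;    // assumed single precision
constexpr double kMegabyte = 1024.0 * 1024.0;

// Megabytes moved by one full pass over `arrays` grid-sized arrays.
double traffic_mb(std::size_t arrays) {
    assert(arrays > 0);
    return static_cast<double>(arrays * kGrid * kGrid * kElemBytes) / kMegabyte;
}

// Number of grid-sized array passes that correspond to `mb` megabytes.
double passes_for(double mb) {
    return mb / traffic_mb(1);
}
```

Under these assumptions one array is about 1.6 megabytes, so four array passes per iteration (for example three reads and one write) come to about 6.4 megabytes, in line with the 6.5 figure. The 39 megabyte figure would correspond to roughly 24 array passes, which is plausible when every intermediate result is written to and read back from a temporary array.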

Optimizations

The tuned versions do two important optimizations:

The tuned Blitz++ and Fortran 77 versions do exactly the same optimizations. The tuned Fortran 90 version does as much optimization as possible while staying at the same level of abstraction as Blitz++. If you compare the tuned and untuned versions, you can see that the Blitz++ tuned version is a lot cleaner. The tuned Fortran versions lose their readability somewhat -- cycling arrays in Fortran isn't pretty.
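The idea behind cycling arrays can be sketched in plain C++: instead of copying the pressure fields forward at the end of each time step, the three buffers are rotated so that only their internal pointers move. The Field alias and both function names below are hypothetical; this is an illustration, not the benchmark's code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Field = std::vector<float>;  // hypothetical flat storage for one grid

// Copying version: two full array copies at the end of every time step.
void advance_by_copy(Field& p1, Field& p2, const Field& p3) {
    assert(p1.size() == p2.size() && p2.size() == p3.size());
    p1 = p2;  // oldest <- current
    p2 = p3;  // current <- newest
}

// Cycling version: rotate the buffers in constant time, with no element copies.
// Afterwards p1 holds the old p2, p2 holds the old p3, and p3 holds the old p1,
// which is free to be overwritten by the next time step.
void advance_by_cycle(Field& p1, Field& p2, Field& p3) {
    assert(p1.size() == p2.size() && p2.size() == p3.size());
    std::swap(p1, p2);
    std::swap(p2, p3);
}
```

Swapping two std::vector objects exchanges their storage without touching the elements, so the cycling version costs the same regardless of grid size.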

XL Fortran is a state-of-the-art compiler. It's able to do many optimizations which KAI C++ doesn't, such as high-order loop transformations (interchanging, fusion, etc.), relaxation of IEEE arithmetic rules, profile-directed optimizations, inter-procedural optimization, and optimizing for specific cache structures. Blitz++ is able to compete with it by doing similar optimizations at the language level: it uses template techniques to generate optimized code.
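As an illustration of the kind of template technique involved, here is a deliberately small expression-template sketch. It is not Blitz++'s implementation: the Expr, Sum, and Vec types are hypothetical, only addition is supported, and the arrays are one-dimensional. The point is that a + b + c builds a lightweight expression object, and the assignment then evaluates it in a single loop with no temporary arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base class that marks a type as an array expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for the element-wise sum of two expressions; nothing is
// computed until an element is requested through operator[].
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical 1-D array type (not Blitz++'s Array class).
struct Vec : Expr<Vec> {
    std::vector<float> data;
    explicit Vec(std::size_t n, float init = 0.0f) : data(n, init) {}
    float operator[](std::size_t i) const { return data[i]; }
    float& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression evaluates the whole tree in one fused loop.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// operator+ only records its operands; it does no arithmetic itself.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With these definitions, a statement such as d = a + b + c; compiles down to one loop computing a[i] + b[i] + c[i] per element. The expression nodes hold references to their operands, so they should be consumed within the same statement, as the assignment above does.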

Initial conditions setup

This code is executed only once and contributes very little to the run time of the benchmark. The source illustrates that F90 is much more expressive than F77, but that Blitz++ has a slight edge over F90.

Version       Source statements
Fortran 90    27
Fortran 77    33
Blitz++       20



Feedback is welcome.


See also the 3D Version of this benchmark.