Blitz++ Benchmarks: Acoustic3D
This benchmark solves the 3-D acoustic wave equation using a finite
difference approximation. The grid size is 112x112x112.
An MPEG animation of the simulation is available [65 KB].
The benchmark problem models the propagation of pressure waves through
a cube. Each half of the cube is made of a different material, and
each half has a hollow cavity (the cavity on the left doesn't show up
very well). This image shows the pressure distribution in a two-dimensional
slice over time. The ratio of time steps to frames is 25:1 (in the MPEG,
it's 5:1).
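For readers unfamiliar with the scheme, here is a minimal sketch of one explicit time step in plain C++. It assumes the usual second-order leapfrog update with a seven-point stencil. The `Grid3D` type, the function `acousticStep`, and the per-point coefficient array `c` are illustrative names; they are not taken from the benchmark source or from the Blitz++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense 3-D grid stored contiguously (not a Blitz++ type).
struct Grid3D {
    std::size_t n;             // points per side
    std::vector<float> data;   // n*n*n values, k varies fastest

    explicit Grid3D(std::size_t size, float init = 0.0f)
        : n(size), data(size * size * size, init) {}

    float& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * n + j) * n + k];
    }
    float operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * n + j) * n + k];
    }
};

// One leapfrog time step on the interior points (assumed scheme):
//   next = 2*cur - prev + c * (sum of the 6 neighbours - 6*cur)
// c holds (velocity * dt / dx)^2 per grid point, so the two materials
// can have different wave speeds. Boundary points are left untouched.
void acousticStep(const Grid3D& prev, const Grid3D& cur,
                  const Grid3D& c, Grid3D& next) {
    assert(prev.n == cur.n && cur.n == c.n && c.n == next.n);
    const std::size_t n = cur.n;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            for (std::size_t k = 1; k + 1 < n; ++k) {
                const float laplacian =
                    cur(i - 1, j, k) + cur(i + 1, j, k) +
                    cur(i, j - 1, k) + cur(i, j + 1, k) +
                    cur(i, j, k - 1) + cur(i, j, k + 1) -
                    6.0f * cur(i, j, k);
                next(i, j, k) = 2.0f * cur(i, j, k) - prev(i, j, k) +
                                c(i, j, k) * laplacian;
            }
        }
    }
}
```

Each step reads two earlier time levels and the coefficient array and writes a third time level, so every time step sweeps through several full-size arrays.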
Timing results
These timings were taken on a 100 MHz IBM RS/6000 Model 43P.
| Version | Run time (210 time steps) | Estimated Mflops/s | Source statements |
|---|---|---|---|
| Fortran 90 | 293.4 s | 10.0 | 12 |
| Fortran 77 | 168.2 s | 17.4 | 18 |
| Blitz++ | 200.8 s | 14.6 | 9 |
| Fortran 90 (Tuned) | 140.4 s | 20.9 | 21 |
| Fortran 77 (Tuned) | 119.7 s | 24.5 | 27 |
| Blitz++ (Tuned) | 123.1 s | 23.8 | 8 |
Compilers used: XL Fortran 77 at -O3, XL Fortran 90 at -O3, KAI C++ at +K3 -O3.
The Mflops rates are quite low because each iteration requires
21.4 MB of data to be moved through the CPU, which is far larger than
the cache size. It's theoretically possible to get better performance
through an iteration-space tiling, but this is beyond the scope of
Blitz++ for now.
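To make the cache argument concrete, the following sketch shows what an iteration-space tiling looks like and where a figure of this size can come from. It is an illustration only: `sweepBytes` and `forEachTiled` are hypothetical helpers, and the assumption of four single-precision arrays (three pressure time levels plus one coefficient array) is mine. That assumption happens to reproduce roughly 21.4 MB for a 112x112x112 grid, but the page does not state how the number was derived.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes touched by one untiled sweep, ASSUMING `arrays` single-precision
// arrays of n*n*n points each (an assumption, not stated in the text).
constexpr std::size_t sweepBytes(std::size_t n, std::size_t arrays) {
    return arrays * n * n * n * sizeof(float);
}

// Visit every point of an n*n*n grid one tile at a time, calling
// visit(i, j, k). All points of a tile are processed before moving on,
// so the data a tile needs stays small enough to remain in cache.
template <typename Visit>
void forEachTiled(std::size_t n, std::size_t tile, Visit visit) {
    assert(tile > 0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = 0; jj < n; jj += tile) {
            for (std::size_t kk = 0; kk < n; kk += tile) {
                // Clamp the tile so partial tiles at the edges are handled.
                const std::size_t iEnd = std::min(ii + tile, n);
                const std::size_t jEnd = std::min(jj + tile, n);
                const std::size_t kEnd = std::min(kk + tile, n);
                for (std::size_t i = ii; i < iEnd; ++i)
                    for (std::size_t j = jj; j < jEnd; ++j)
                        for (std::size_t k = kk; k < kEnd; ++k)
                            visit(i, j, k);
            }
        }
    }
}
```

With these assumptions `sweepBytes(112, 4)` is 22,478,848 bytes, about 21.4 MB when a megabyte is taken as 2^20 bytes. Because the stencil writes to a separate output array, the points can be visited in any order, which is what makes a tiled traversal legal here.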
Optimizations
This description assumes you're familiar with the optimizations in the
two-dimensional version.
- Interlacing: It turns out that for this problem interlacing
the arrays worsens performance, so this optimization wasn't used.
- Avoiding array copies: A lot of time in the untuned versions
is wasted copying arrays to advance a time step. In the tuned
versions, copying is replaced by either cycling the arrays (Blitz++)
or by putting the stencil operation in a subroutine and manually
cycling the array arguments (Fortran). A side effect of this
optimization is that the number of time steps has to be divisible
by three.
- Cache reuse: The tuned Fortran 77 version uses 8x8x8 tiles to
achieve a higher L1 cache hit rate. Instead of tiling, the Blitz++
version uses a traversal based on the Hilbert space-filling curve;
the library automatically detects the 3D stencil expression and
generates the Hilbert-based traversal. Both approaches raise the L1
cache hit rate, but the 8x8x8 tiles appear to work better in this
situation.
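The difference between copying and cycling, and the reason the step count must be a multiple of three, can be sketched in plain C++ as follows. The names `Field`, `advanceByCopy`, `advanceByCycling`, and `runUnrolled` are hypothetical and are not the Blitz++ or Fortran code used in the benchmark; the sketch only mirrors the idea described above.

```cpp
#include <cassert>
#include <vector>

using Field = std::vector<float>;  // stand-in for one pressure array

// Untuned pattern: copy whole arrays to advance one time step.
void advanceByCopy(Field& p1, Field& p2, const Field& p3) {
    p1 = p2;  // "current" becomes "previous" (full copy)
    p2 = p3;  // freshly computed "next" becomes "current" (full copy)
}

// Tuned pattern: rotate which buffer plays which role. No element is
// copied; the storage of the oldest level is reused for the next result.
void advanceByCycling(Field*& p1, Field*& p2, Field*& p3) {
    Field* oldest = p1;
    p1 = p2;
    p2 = p3;
    p3 = oldest;
}

// Manual cycling in the style described for the Fortran versions: the
// roles are rotated by permuting the arguments of three consecutive
// calls, so one loop iteration advances three time steps. This is why
// the total number of steps has to be divisible by three.
template <typename Step>
void runUnrolled(int steps, Field& a, Field& b, Field& c, Step step) {
    assert(steps % 3 == 0);
    for (int s = 0; s < steps; s += 3) {
        step(a, b, c);  // a = previous, b = current, c = next
        step(b, c, a);  // roles shifted by one
        step(c, a, b);  // after this call the roles are back where they started
    }
}
```

After three rotations every buffer is back in its original role, which matches the 210 time steps used in the timing runs (210 = 3 x 70).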
Initial conditions setup
This code is executed only once and contributes very little to the
run time of the benchmark. The source illustrates that F90 is much
more expressive than F77, and that Blitz++ has a slight edge over F90.
| Version | Source statements |
|---|---|
| Fortran 90 | 28 |
| Fortran 77 | 37 |
| Blitz++ | 21 |
Feedback is welcome.