Intel has just released their VTune 4.0 C++ compiler into beta testing.
Using this compiler I was able to compile Blitz essentially without error.
This compiler is used as a plug-in replacement for the Visual C++ 6.0
compiler and requires that environment (the Intel C++ compiler provides no
header files or libraries of its own).
The results here are very preliminary and DO NOT necessarily reflect what
the final Intel 4.0 C++ compiler will be able to achieve. I am posting
these results merely to show that Intel is making great progress.
Test machine: 200 MHz PPro with 256 KB unified L2 / 8 KB L1 d-cache,
60 ns EDO memory
Release build, no specialized optimizations selected
Theoretical peak rate: 200 MFLOPS
I noticed another post describing the peak rate as 400 MFLOPS for an
equivalent configuration. This is not accurate.
The PPro does have separate add/mul units, but it only provides one issue
port to serve both units, so at most one floating point operation can be
issued per cycle. Furthermore, muls cannot be issued on consecutive cycles,
so the peak rate can only be achieved if addition represents at least 50%
of the mix.
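The peak-rate argument above can be sketched as a small calculation. This is only a simplified issue model built from the two constraints just stated (one FP issue per cycle, no back-to-back muls); the function name and the `mul_fraction` parameter are my own, and it ignores latency, loads/stores and everything else a real PPro pipeline does.

```cpp
#include <algorithm>
#include <cassert>

// Simplified issue model (an assumption for illustration, not a simulator):
//   - at most one FP operation issues per cycle (single issue port)
//   - a multiply can issue at most every second cycle
// mul_fraction is the share of multiplies in the FP mix, in [0, 1].
double peak_mflops(double clock_mhz, double mul_fraction) {
    assert(mul_fraction >= 0.0 && mul_fraction <= 1.0);
    // Average cycles per FP operation: never below 1 because of the single
    // port, and never below 2 * mul_fraction because each multiply
    // occupies two issue cycles.
    const double cycles_per_flop = std::max(1.0, 2.0 * mul_fraction);
    return clock_mhz / cycles_per_flop;
}
```

Under this model a 200 MHz part gives `peak_mflops(200.0, 0.5)` = 200 and `peak_mflops(200.0, 1.0)` = 100; no mix reaches 400.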
Looking to the future, the Intel Katmai processor, due in 2Q99, will provide
4-way SIMD for the single precision floating point format and will be
introduced at around 500 MHz. The peak rate for a Katmai processor will be
on the order of 2 GFLOPS (4 lanes x 500 MHz, assuming a single dispatch
path).
It will be interesting to see what is required to adapt Blitz to make use of
the KNI (Katmai New Instructions) facilities.
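For reference, the Katmai estimate is just clock rate times SIMD width times issues per cycle. A minimal sketch of that arithmetic follows; the function and parameter names are hypothetical, and the 500 MHz clock and single dispatch path are the assumptions already stated above, not measured figures.

```cpp
#include <cassert>

// Peak rate in MFLOPS = clock (MHz) * SIMD lanes * SIMD issues per cycle.
// Purely arithmetic; it says nothing about what real code will sustain.
double simd_peak_mflops(double clock_mhz, int lanes, int issues_per_cycle) {
    assert(lanes > 0 && issues_per_cycle > 0);
    return clock_mhz * lanes * issues_per_cycle;
}
```

With the numbers above, `simd_peak_mflops(500.0, 4, 1)` gives 2000 MFLOPS, i.e. 2 GFLOPS single precision.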
In-cache:
MFLOPS    Description
61.035    for, indirection, unit stride
61.035    for, indirection, unit stride, no +=
82.748    for, indirection, unit stride, backwards loops
90.503    for, unroll=4, unit stride, constants loaded into temps
93.842    for, unroll=4, unit stride, constants loaded into temps,
          4 read then 4 write
92.142    for, unroll=4, unit stride, constants loaded into temps,
          no +=
88.817    for, unroll=4, unit stride, constants loaded into temps,
          CSE for index offsets
92.142    for, unroll=4, unit stride, constants loaded into temps,
          backwards
90.503    for, unroll=8, unit stride, constants loaded into temps
72.869    for, indirection, unit stride, constants into temps
59.512    for, indirection, non-unit stride
72.939    for, indirection, non-unit stride, constants loaded into temps
85.627    while, pointer increment, unit stride
95.726    while, pointer increment, unit stride,
          constants loaded into temps
82.748    while, pointer increment, non-unit stride
108.53    while, pointer increment, unroll=4, non-unit stride,
          constants loaded into temps
90.396    for, unroll=4, unit stride, constants loaded into temps,
          prefetching
39.388    interlaced, for, indirection, unit stride
43.202    for, unroll=4, unit stride, interlaced,
          constants loaded into temps
Out of cache:
MFLOPS    Description
8.2874    for, indirection, unit stride
8.2982    for, indirection, unit stride, no +=
8.7884    for, indirection, unit stride, backwards loops
8.8714    for, unroll=4, unit stride, constants loaded into temps
8.8586    for, unroll=4, unit stride, constants loaded into temps,
          4 read then 4 write
8.8648    for, unroll=4, unit stride, constants loaded into temps,
          no +=
8.8714    for, unroll=4, unit stride, constants loaded into temps,
          CSE for index offsets
8.8586    for, unroll=4, unit stride, constants loaded into temps,
          backwards
9.1854    for, unroll=8, unit stride, constants loaded into temps
8.4711    for, indirection, unit stride, constants into temps
8.2817    for, indirection, non-unit stride
8.4478    for, indirection, non-unit stride, constants loaded into temps
8.7441    while, pointer increment, unit stride
8.7381    while, pointer increment, unit stride,
          constants loaded into temps
8.7257    while, pointer increment, non-unit stride
9.0891    while, pointer increment, unroll=4, non-unit stride,
          constants loaded into temps
8.7819    for, unroll=4, unit stride, constants loaded into temps,
          prefetching
7.2967    interlaced, for, indirection, unit stride
7.2967    for, unroll=4, unit stride, interlaced,
          constants loaded into temps
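For readers unfamiliar with the labels in the tables, here is a rough sketch of what three of the loop variants might look like. This is my reading of the terse descriptions, not the actual benchmark source: the kernel (a DAXPY-style y += a*x), the `Problem` struct and all function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical problem descriptor. Reaching the operands through it on
// every access is what "indirection" is taken to mean here.
struct Problem {
    double a;          // scalar constant
    const double* x;   // input vector
    double* y;         // vector updated in place
    std::size_t n;     // element count
};

// "for, indirection, unit stride": every operand is read through p.
void daxpy_indirect(Problem* p) {
    assert(p != nullptr);
    for (std::size_t i = 0; i < p->n; ++i)
        p->y[i] += p->a * p->x[i];
}

// "while, pointer increment, unit stride, constants loaded into temps":
// the constant and base pointers are copied to locals before the loop.
void daxpy_while_ptr(Problem* p) {
    assert(p != nullptr);
    const double a = p->a;
    const double* x = p->x;
    double* y = p->y;
    const double* const end = x + p->n;
    while (x != end) {
        *y += a * *x;
        ++x;
        ++y;
    }
}

// "for, unroll=4, unit stride, constants loaded into temps":
// four updates per iteration, plus a remainder loop for n % 4 elements.
void daxpy_unroll4(Problem* p) {
    assert(p != nullptr);
    const double a = p->a;
    const double* x = p->x;
    double* y = p->y;
    const std::size_t n = p->n;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

All three compute the same result; the tables measure only how well the compiler handles each way of writing the loop.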