From: Geoffrey Furnish <furnish@actel.com>
> While it is true that notational convenience and the elimination of
> superfluous temporaries are key benefits of expression templates, it
> seems to me that perhaps there is an additional factor that
> significantly impacts performance in cases like this. I think a naive
> implementation of the double loop that ranges over the m*v product
> term will thrash the cache if m is of any significant size. Perhaps
> someone could correct me if I've got this wrong, but I am under the
> impression that there exist techniques for clever reorderings of these
> computational kernels that produce 3x to 10x performance improvements
> on scalar cache-based architectures, compared to the naive
> implementation.
> Do the evaluation kernels that are under the hood of your ET machinery
> for matrix math take advantage of such opportunities to produce this
> sort of blazing throughput?
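
To make the question concrete, here is a minimal sketch of the kind of reordering being asked about. It is not taken from MET or any other library. The function names, the row-major std::vector storage, and the `tile` parameter are all assumptions made for illustration. The tiled version walks a block of columns at a time, so the matching slice of v can stay in cache while every row is visited. Whether this helps, and by how much, depends on the matrix size and the machine.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive product y = m * v for a row-major matrix (hypothetical layout):
// one full dot product per row.
std::vector<double> matvec_naive(const std::vector<double>& m,
                                 const std::vector<double>& v,
                                 std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            y[i] += m[i * cols + j] * v[j];
    return y;
}

// Tiled variant: the column range is split into blocks of `tile` columns
// (an assumed tuning parameter). Each block of v is reused across all rows
// before the next block is touched.
std::vector<double> matvec_tiled(const std::vector<double>& m,
                                 const std::vector<double>& v,
                                 std::size_t rows, std::size_t cols,
                                 std::size_t tile = 64) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t jj = 0; jj < cols; jj += tile) {
        const std::size_t jend = std::min(jj + tile, cols);
        for (std::size_t i = 0; i < rows; ++i) {
            double acc = 0.0;  // partial dot product over this column block
            for (std::size_t j = jj; j < jend; ++j)
                acc += m[i * cols + j] * v[j];
            y[i] += acc;
        }
    }
    return y;
}
```

Both functions compute the same result. Only the order in which the elements of m and v are visited differs.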
I think that to take advantage of loop reordering, you would have to
give up the notational convenience of ET. MET cannot reorder the loops
for multiplication, because an expression template for matrices cannot
be used with the saxpy form of the product; it must be used with the
dot-product form. (Right?)
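
The two loop orders mentioned here can be sketched as follows. This is an illustration only, not MET code. The function names and the row-major std::vector storage are assumptions. In the dot-product form each y[i] is complete after one pass of the inner loop, which fits an expression template that evaluates its result one element at a time. In the saxpy form no element of y is final until the outer loop has finished.

```cpp
#include <cstddef>
#include <vector>

// Dot-product form: y[i] = sum over j of m(i,j) * v[j].
// Each output element is fully computed before the next one is started.
std::vector<double> matvec_dot(const std::vector<double>& m,
                               const std::vector<double>& v,
                               std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            acc += m[i * cols + j] * v[j];
        y[i] = acc;
    }
    return y;
}

// Saxpy form: y += v[j] * (column j of m), one column at a time.
// Every y[i] is only a partial sum until the outer loop ends.
// With the row-major layout assumed here the inner loop is strided;
// with column-major (Fortran-style) storage it would be unit-stride.
std::vector<double> matvec_saxpy(const std::vector<double>& m,
                                 const std::vector<double>& v,
                                 std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t j = 0; j < cols; ++j) {
        const double vj = v[j];
        for (std::size_t i = 0; i < rows; ++i)
            y[i] += m[i * cols + j] * vj;
    }
    return y;
}
```
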
Frankly, MET with g++ cannot rival Fortran so far. I suppose the
complexity of the closure objects generated by expression templates
prevents optimization, and a smarter compiler is necessary to deal
with matrix math. I have heard that KAI C++ is good, but I have had no
chance to improve MET for that compiler.
Masakatsu Ito