Masakatsu ITO writes:
> I had always wanted to simplify the fortran codes for linear algebraic
> expressions. Consider u = M*v + w, where u,v and w are vectors and
> M is matrix. Fortran requires multiple loops or subroutine calls with
> complicated argument list. But it was said until recently,
> C++ abstraction was the enemy of performance. A naive implementation
> of overloaded operators introduces temporary and copies of objects.
>
> [...]
>
> Expression template is a way to eliminate that overhead but keep
> the advantage of OOP. Using MET, you can simply write u = m*v + w
> in your source without any temporaries or copies.

While notational convenience and the elimination of superfluous
temporaries are key benefits of expression templates, it seems to me
that an additional factor significantly affects performance in cases
like this. I think a naive implementation of the double loop over the
m*v product term will thrash the cache if m is of any significant
size. Someone please correct me if I've got this wrong, but I am under
the impression that there are techniques for reordering these
computational kernels that yield 3x to 10x performance improvements
over the naive implementation on scalar cache-based architectures.
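
To make the distinction concrete, here is a minimal sketch of the two
kinds of kernel I have in mind, assuming a dense row-major matrix held
in a std::vector<double>. The function names and the block size of 256
are my own illustrative choices, not taken from MET or any other
library, and how much the reordering helps (if at all) depends on the
storage layout, the problem size, and the machine.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive kernel for u = M*v + w. M is rows x cols, stored row-major.
// Every row sweeps the whole of v, so a v larger than the cache is
// re-fetched from memory once per row.
std::vector<double> matvec_naive(const std::vector<double>& M,
                                 const std::vector<double>& v,
                                 const std::vector<double>& w,
                                 std::size_t rows, std::size_t cols) {
    std::vector<double> u(w);  // start from w, then accumulate M*v
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            sum += M[i * cols + j] * v[j];
        u[i] += sum;
    }
    return u;
}

// Column-blocked kernel: same arithmetic, reordered so that one slice
// of v (at most `block` elements, an assumed tuning parameter) is
// reused across all rows before moving on to the next slice.
std::vector<double> matvec_blocked(const std::vector<double>& M,
                                   const std::vector<double>& v,
                                   const std::vector<double>& w,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t block = 256) {
    std::vector<double> u(w);
    for (std::size_t jb = 0; jb < cols; jb += block) {
        const std::size_t jend = std::min(jb + block, cols);
        for (std::size_t i = 0; i < rows; ++i) {
            double sum = 0.0;
            for (std::size_t j = jb; j < jend; ++j)
                sum += M[i * cols + j] * v[j];
            u[i] += sum;  // partial dot product for this column block
        }
    }
    return u;
}
```

Both functions compute the same u up to floating-point summation
order; only the order in which M and v are touched differs.
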
Do the evaluation kernels that are under the hood of your ET machinery
for matrix math take advantage of such opportunities to produce this
sort of blazing throughput?

--
Geoffrey Furnish            Actel Corporation         furnish@actel.com
Senior Staff Engineer       955 East Arques Ave       voice: 408-522-7528
Placement & Routing         Sunnyvale, CA 94086-4533  fax:   408-522-8041