Re: OONSTD: Subscripting Operators

Roscoe A Bartlett (roscoe@andrew.cmu.edu)
Fri, 10 Jul 1998 12:28:46 -0400 (EDT)

On Thu, 9 Jul 1998, E. Robert Tisdale wrote:

>
> If you really think that multiplications are a problem, you could write:
>
> doubleHandle& h = (doubleHandle&)A.handle();
> Offset o_i = A.offset();
> for (Offset i = 0; i < m; i++) {
> Offset o_j = o_i;
> for (Offset j = 0; j < n; j++) {
> h.put(o_j, (double)(10*i + j));
> o_j += A.stride1();
> }
> o_i += A.stride2();
> }
>
> and just bump the offset along without any multiplication at all.
>
> Of course, if you know that the underlying representation for the 1D array
> is an ordinary 1D C array, you should be able to obtain a pointer
> to the beginning of that array:
>
> doubleHandle& h = (doubleHandle&)A.handle();
> double* p_i = (double*)h + A.offset();
> for (Offset i = 0; i < m; i++) {
> double* p_j = p_i;
> for (Offset j = 0; j < n; j++) {
> *p_j = (double)(10*i + j);
> p_j += A.stride1();
> }
> p_i += A.stride2();
> }
>

I have a few comments about the above code. First, when you show this to
a Fortran programmer (like my advisor) they wonder why anyone would want
to program in C++. If this is what it takes to efficiently iterate
through a dense matrix you will never be able to convince these
programmers to switch to C++ for their numerical applications. Secondly,
I have done tests that show that iterating through a array of data
with pointer arrithmetic using "p++" is up to twise as fast as using
"p += stride" even if "stride == 1".

For example, in my dense linear algebra C++ library called LinAlgPack I
have a STL compatable iterator called stride_iter that encapsulates +=
within opeator++(). Then I have a class called VectorSlice that
encapsulates a view to an array of data where the elements are seperated
by "stride" possitions (like in the BLAS). This makes it possible to allow
uniform access to:

(1) continous regions of vectors:

Vector v(1.0,10);
VectorSlice vs(v(2,5));
std::fill(vs.begin(),vs.end(),2.0);

cout << vs;

// prints:
// 10
// 1 2 2 2 2 1 1 1 1 1

(2) columns, rows and diagonals of matrices:

GenMatrix gm(1,0,5,5);
GenMatrixSlice gms( gm(2,5,2,5) );
VectorSlice
row_i( gms.row(1) ),
col_j( gms.col(1) ),
diag_k( gms.diag(0) );

row_i = 2.0; // same as std::fill(row_i.begin(),row_i.end(),2.0);
std::copy(row_i.begin(),row_i.begin(),col_j.begin());
std::copy(col_j.begin(),col_j.begin(),diag_k.begin());

cout << gm;

// prints:
// 5 5
// 1 1 1 1 1
// 1 2 2 2 2
// 1 2 2 1 1
// 1 2 1 2 1
// 1 2 1 1 2

Using these classes you could implement Mr. Tisdale's example as:

int m = 5;
int n = 5;
GenMatrix A(m,n);
for(int j = 1; j <= n; ++j) {
VectorSlice::iterator itr = A.col(j).begin();
for(int i = 1; i <= m; ++i)
*itr++ = 10 * i + j;
}

In the above code, GenMatrix is stored by column so that the inner loop
works with adjacent memory. The constructors for the VectorSlice objects
created by A.col(j) are optimized away (at least in MS VC++ 5.0) so this
code is at least as efficient as Mr. Tisdale's example and it makes no
assumptions about whether the underlying data is stored in a continous
array or any other kind of data structure you can imagine. After all,
this is what the STL showed use we could do, right? However, as I stated
before, there is a heavy price to pay for using += instead of ++ at least
on the system that I am using. This type of design however works fine
when linear algebra operations are implemented with the BLAS. Most of the
dense computational work done in my application is with level 1, 2 and 3
BLAS and there is little penalty over working with raw data in Fortran.

One last crack at the original example. Since the matrix is stored by
column you can despence with the stride_iter and += pointer arrithmetic
and just go with:

int m = 5;
int n = 5;
GenMatrix A(m,n);
for(int j = 1; j <= n; ++j) {
GenMatrix::value_type *itr = A.col_ptr(j);
for(int i = 1; i <= m; ++i)
*itr++ = 10 * i + j;
}

Now the above code uses ++ for pointer arithmetic and will run as fast as
any C or C++ implementation you can think of and it only requires that the
columns of the matrix be 1-D arrays but these columns need not be stored
sequentially (even though they are in my GenMatrix class).

Roscoe Bartlett