Re: BZDEV: Re: Initial thoughts about tensors

From: Walter Landry (landry@spacenet.tn.cornell.edu)
Date: Fri Jul 03 1998 - 16:37:19 EST


Hello,

I have some interest in the development of tensors for Blitz. I just
completed my own Tensor manipulation package before I chanced upon the
Blitz site. Oh well. I was interested in making it as efficient as
possible, so my implementation differs somewhat from Boris'. I am
particularly interested in incorporating Blitz into it (or it into
Blitz) since I now use pairwise evaluation with some deferred
evaluation using proxy return types. Needless to say, it is incredibly
slow and uses lots of temporaries. In any case, since, like Boris,
I've actually gone through all of the difficulty of implementing a
Tensor package, I thought I would give my take on the questions that
Todd V. posed.

        1. What is a reasonable upper bound on rank for a tensor?

I don't think that there should be a hard-wired maximum. Blitz should
provide everything needed up to a certain rank (say 4 or 6), and not
make it too hard to expand it. In my application, I made everything
arrays of arrays, so that a rank 2 tensor was an array of rank 1
tensors (It is actually a little more complicated than that, since I
have to be able to index subarrays, but you get the idea).

        2. Many kinds of tensors

        There many kinds of tensors; each kind needs different
        representation, indexing scheme, and algebraic semantics.
        Some examples:

        Dense-- your vanilla asymmetric, dense tensor.
        Fully symmetric: e.g. A_ijk = A_jki = A_jik. Indexing such
        tensors is tricky and involves a lot of combinatorics.
        Partly symmetric: the elements are invariant under some permutation
        group defined on the indices.
        Identity
        Kronecker delta
        Permutation tensor

        Then there are lots of application-specific tensors which have
        various oddities (e.g. antisymmetries).

        How to handle all these different kinds of tensors? As a single
        class with an "engine" template parameter? As a family of classes?

        It should be easy for users to create their own tensor
        classes, because it's hard to predict all possible varieties
        people might need.

I'm a bit unclear on what you mean by an "engine", but in my
application, I created a new class for every type of symmetry. I
really don't see a way around it. You really have to redefine all of
the arithmetic operators to take advantage of the symmetries. It also
removes redundant or identically zero entries. In my application, I
defined my tensors on 3D grids. Therefore, removing extra operations
between tensors was key.

        3. Should tensor indices be associated with a particular tensor?

        In Boris's library, a tensor object remembers its tensor indices.
        For example,

        tensor I_ijkl = I2("ij")*I2("kl")

        The tensor I_ijkl knows that its indices are i,j,k and l. This
        is not the case in blitz. Is this important?

I think that the Blitz way is the better way. I made a syntax like

  tensor4 I(i,j,k,l) = I2(i,j)*I2(k,l)

because I might change my mind and want to write it

  tensor4 I(j,i,k,l) = I2(i,j)*I2(k,l)

In Boris' case, it is not completely clear which indices are related
to which "slot".

        4. How to handle contractions

        Boris's library does implicit contractions, e.g.
        tensor r = A("ik")*B("jk")

        results in r_ij = sum over k of A_ik*B_jk

        In Blitz contractions are done explicitly, e.g.
        Array<float,2> r = sum(A(i,k) * B(j,k), k);

        If you're doing lots of tensor notation, Boris's is more compact.
        The Blitz style is arguably more flexible, since you can substitute
        other operations for "sum".

I think that if you don't do implicit contractions, then there is
almost no point in doing tensors. Without implicit contractions, they
just become n-dimensional arrays. Presumably, we want to able to use
these tensors in arbitrarily complicated expressions. If we add
things like sum(...), then the expression won't look like the formula
on the paper anymore. We should be aiming toward making the
conversion from paper to computer as trouble-free as possible, without
sacrificing speed. That seems like the whole point of Blitz.

With all of that said, my tensor package didn't do contractions
completely implicitly. I would overload operators beside * to do it.
For example, if I wanted to contract two rank 2 tensors, I would write
it as

  T(i,j) = K(i,k)%R(k,j);

If I wanted to form a scalar out of them, then I would write

  S = K(i,j)|R(i,j);

If I wanted to contract an index internally, I would write

  Contraction_Index q('q);
  S = T(q,q);

The Contraction_Index tells the compiler that T(q,q) returns a scalar,
and not a rank 2 tensor. I also had special operators to contract two
rank 2 tensors to give a symmetric rank 2 tensor, operators to
symmetrize a product of vectors, etc. This is a bit of a messy hack,
though. I did it because I wanted to retain static type checking of
tensor types, and I wanted to minimize the number of operations done.
When I write complicated expressions, it doesn't look too horrible,
but I may be fooling myself. As an example, I have

  Copy_Index ii('i'), jj('j');
  Index i('i'), j('j');

  K_new(ii,jj)<Lapse*(R(i,j) + Trace_K*K(i,j) - (2*K(i,k)&K_mix(k,j))
                      - 0.5*g(i,j)*(density - (g_up(k,l)|S(k,l))) - S(i,j));

The Copy_Index tells the computer to return the original tensor with
the indices set to 'i' and 'j'. Index tells the computer to return a
*copy* with the indices set to 'i' and 'j'. The < operator tells the
computer to perform a copy using reference counting, and not an actual
copy over the 3D grid. A big disadvantage is that I have used up
symbols like &. There might be some people who would want to use & as
the original bit-wise AND. Also, the
Contraction_Index/Copy_Index/Index thing is a bit of an extra
complication, although with expression templates, I can probably get
rid of Copy_Index and Contraction_Index.

On a related subject, it seems like the TinyVec routines are the ideal
way to implement Tensors. Since Tensors typically have only a few
members, it would make sense to completely unroll the loops. With the
expression

  K_new(ii,jj) < Lapse*(K(i,k)&K_mix(k,j));

I would want it unrolled at least into

  for(i=1;i<=3;i++)
    {
      for(j=1;j<=3;j++)
        {
          K_new(i,j) = Lapse*(K(i,1)*K_mix(1,j) + K(i,2)*K_mix(2,j)
                + K(i,3)*K_mix(3,j));
        }
    }

I use tensors defined on large 3D arrays, so I would want them further
expanded into a 3D loop, but that seems to be already in place in Blitz.

It does seem like there are times when you don't want things
completely unrolled. For example, the complete contraction in

  K_new(ii,jj) < Lapse*(g(i,j)*(g(k,l)|S(k,l)));

should turn into

  temp=g(1,1)*S(1,1) + g(1,2)*S(1,2) + g(1,3)*S(1,3) + g(2,1)*S(2,1) +
         g(2,2)*S(2,2) + g(2,3)*S(2,3) + g(3,1)*S(3,1) + g(3,2)*S(3,2) +
         g(3,3)*S(3,3);

  for(i=1;i<=3;i++)
    {
      for(j=1;j<=3;j++)
        {
          K_new(i,j) = Lapse*g(i,j)*temp;
        }
    }

It is not all that clear to me how to formulate rules as to when to
use temporaries and when to use just unroll. It also depends on
whether we want to optimize memory of speed. It seems to be worth it
to figure it out.

        5. Expression templates?

        Steve Karmesin suggested that expression templates aren't necessary,
        and that you can just do pairwise evaluation with construct-in-place
        via a proxy return type. For simple expressions like

        A + B + C

        this could be equally or more efficient than expression templates,
        particularly on not-so-good compilers. The big problem with
        expression templates is that you generate lots of overhead
        during parsing; not-so-good compilers won't get rid of this,
        so there might be a performance loss if expression templates
        were used.

Using proxy return types won't be able to efficiently handle
complicated expressions. There are simply too many permutations to
even consider hardcoding every type of expression. It is much better
to use a set of rules that can handle anything.

I also don't think that we should worry too much about bad compilers
screwing up the efficiency, since there do exist widely available
compilers which can do the job. There will always exist compilers
that do a bad job of optimizing, but it shouldn't be the programmer's
job to optimize.

Walter Landry
landry@spacenet.tn.cornell.edu

--------------------- blitz-dev list --------------------------------
* To subscribe/unsubscribe: mail to majordomo@oonumerics.org, with
"subscribe blitz-dev" or "unsubscribe blitz-dev" in the body of the message
* Blitz++ web page: http://oonumerics.org/blitz/



This archive was generated by hypermail 2b29 : Wed Feb 20 2002 - 04:30:05 EST