
Blitz Support :

From: Andreas R. (andreasreifschneider_at_[hidden])
Date: 2004-06-27 15:45:02


Hello Julian.

Thanks a lot for your answer.

> I have been looking into the performance issues with these example
> programs that you sent.
> I see performance numbers similar to yours on my P4 box. The C code
> myprogram1 completes in 3.9 sec., whereas the blitz version using
> stencil declarations in myprogram3 takes about 5.9 sec. to run. You
> can improve somewhat the performance of myprogram3 by replacing the
> three constant-value matrices being used in the stencil with actual
> scalars. It is hard to do this using the blitz macros for declaring
> stencil objects, but can be done if you write out fully the definition
> of your stencil object and modify it to store scalar parameters.

I thought about the constants this week, but my solution just used global
variables for them (which I could also put into a namespace), and that
doesn't look nice.

> struct spots2D_stencil {
>     // Parameters are taken by const reference so that the reference
>     // members below bind to the caller's variables, not to local copies.
>     spots2D_stencil(const double& sin, const double& d_ain,
>                     const double& d_bin)
>         : s(sin), d_a(d_ain), d_b(d_bin) {}
>
>     template <typename T1, typename T2, typename T3, typename T4,
>               typename T5, typename T6, typename T7, typename T8,
>               typename T9, typename T10, typename T11>
>     inline void apply(T1& beta, T2& a, T3& b, T4& a_out, T5& b_out,
>                       T6, T7, T8, T9, T10, T11) const
>     {
>         a_out = a + s * (16 - a*b) + d_a * Laplacian2D(a);
>         b_out = b + s * (a*b - b - beta) + d_b * Laplacian2D(b);
>         // Clamp negative concentrations to zero.
>         if (a_out < 0.0) a_out = 0.0;
>         if (b_out < 0.0) b_out = 0.0;
>     }
>
>     template <int N>
>     inline void getExtent(TinyVector<int,N>& minb,
>                           TinyVector<int,N>& maxb) const
>     {
>         minb = shape(-1,-1);
>         maxb = shape(+1,+1);
>     }
>
>     enum { hasExtent = 1 };
>
>     // Each declarator needs its own '&' to be a reference.
>     const double &s, &d_a, &d_b;
> };

>
> Notice that I have added three members that are const double
> references to store the three constant scalar parameters, and a
> constructor that initializes these references. With this definition
> of spots2D_stencil, you must pass the scalar constants when you
> construct the stencil object:
>
> applyStencil(spots2D_stencil(s,d_a,d_b),matrix_beta,
> matrix_a,matrix_b,matrix_a_out,matrix_b_out);
>
> But now we only have five arrays in our stencil, so it is a bit more
> efficient. My tests showed a run time of 4.9 sec. with this improvement.

Wow, this is of course a much better approach (it already performs
slightly better than std::vector<std::vector<double> > while having a
much nicer notation). Using const double members instead of const double
references makes it a little faster still (user 0m3.660s vs. 0m3.718s).
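
To illustrate the by-value variant I mean, here is a minimal sketch that does not depend on Blitz at all. The name SpotsParams and the lap_a / lap_b arguments (standing in for the values that Laplacian2D(a) and Laplacian2D(b) would produce) are my own assumptions for the example and are not part of the library. Only the update formula is taken from the stencil above.

```cpp
// Hypothetical stand-in for the stencil's parameter storage; not Blitz code.
struct SpotsParams {
    SpotsParams(double s_in, double d_a_in, double d_b_in)
        : s(s_in), d_a(d_a_in), d_b(d_b_in) {}

    // Same update as the stencil body, written for plain doubles.
    // lap_a and lap_b are assumed to be precomputed Laplacian values.
    void apply(double beta, double a, double b,
               double lap_a, double lap_b,
               double& a_out, double& b_out) const
    {
        a_out = a + s * (16 - a * b) + d_a * lap_a;
        b_out = b + s * (a * b - b - beta) + d_b * lap_b;
        // Clamp negative results to zero, as in the stencil.
        if (a_out < 0.0) a_out = 0.0;
        if (b_out < 0.0) b_out = 0.0;
    }

    // Stored by value: the object owns its copies, so it is safe to
    // construct it from literals or temporaries.
    const double s, d_a, d_b;
};
```

Because the members are copies, something like SpotsParams(0.5, 0.25, 0.125) is fine even though the arguments are temporaries, and the compiler does not have to go through a reference on every access. Whether that explains the small timing difference I measured is only my guess.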

> A further improvement would be to add the unit stride optimization to
> the implementations of applyStencil(). If the innermost loop has a
> stride of 1 for all the Array operands in the stencil, then we can use
> advanceUnitStride() to increment all the Array iterators instead of
> advance(). Some quick tests that I did indicate that this will reduce
> the run time of myprogram3 to about 4.3 sec. Still not quite as fast
> as the C code, but fairly close. There will always be some amount of
> overhead associated with determining the bounds of the stencil loops
> in each dimension and the ordering of the loops, among other things.
> But I will try to put in the unit stride optimization for the
> applyStencil() function, since it is clearly quite beneficial.

This would not have occurred to me, since it requires a much deeper
understanding of the internals of blitz.
But I'm looking forward to it :-)
By the way, the CVS version I tested a week ago was already a little
faster than version 0.7.
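
As far as I understand the unit stride idea from your description, it can be sketched in plain C++ roughly as follows. This is only an illustration under my own assumptions: the function names scale_add, scale_add_strided and scale_add_unit are hypothetical, the loop body is a toy update rather than the real stencil, and none of this is the actual applyStencil() code. The real implementation would use the iterators' advance() and advanceUnitStride() that you mention.

```cpp
#include <cstddef>

// General path: each pointer is advanced by a stride known only at run time.
inline void scale_add_strided(const double* in, double* out, std::size_t n,
                              std::ptrdiff_t in_stride,
                              std::ptrdiff_t out_stride, double s)
{
    for (std::size_t i = 0; i < n; ++i) {
        *out = *in + s * (*in);
        in += in_stride;
        out += out_stride;
    }
}

// Unit-stride path: plain indexing over contiguous memory, which gives
// the compiler a much simpler loop to optimize.
inline void scale_add_unit(const double* in, double* out, std::size_t n,
                           double s)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] + s * in[i];
}

// Check the strides once, outside the loop, and pick the cheaper path
// when every operand is contiguous in the innermost dimension.
inline void scale_add(const double* in, double* out, std::size_t n,
                      std::ptrdiff_t in_stride, std::ptrdiff_t out_stride,
                      double s)
{
    if (in_stride == 1 && out_stride == 1)
        scale_add_unit(in, out, n, s);
    else
        scale_add_strided(in, out, n, in_stride, out_stride, s);
}
```

The point of the sketch is that the stride test happens once per innermost loop rather than once per element, so the common contiguous case pays nothing for the generality.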

Regards, Andreas R.