Blitz Support :

From: Xianglong Yuan (yuanx_at_[hidden])
Date: 2004-08-19 04:03:55


Hi Xavier,

Xavier wrote:

> plain array : 240000
> plain array 2 : 70000 --> my version
> blitz array : 780000
> blitz array 2 : 70000 --> my version

I believe these numbers are misleading, because 'plain array 2'
repeats the first run and so benefits from running second. Simply
swapping the positions of the two loops makes 'plain array' run
faster than 'plain array 2'. See the results below.

before swapping positions of the two loops:

> plain array : 140000
> plain array 2 : 110000

after swapping:

> plain array 2 : 140000
> plain array : 110000

(gcc 3.3.1 with -O3 -funroll-loops -fstrict-aliasing)
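
One way to reduce this kind of ordering effect is to run each loop once
untimed before measuring it, and to keep the best of several timed runs.
The following is only a minimal sketch of that idea, not the code from
blitz_test.cpp: the names sum_scaled and time_after_warmup are
hypothetical, and it assumes a C++11 compiler with std::chrono, which is
newer than the gcc 3.3.1 used for the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel standing in for one of the loops being compared.
double sum_scaled(const std::vector<double>& a, double scale) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s += a[i] * scale;
    }
    return s;
}

// Calls f once untimed (warm-up), then times it `repeats` times and
// returns the fastest run in microseconds. F must return a double.
template <typename F>
long long time_after_warmup(F f, int repeats) {
    assert(repeats > 0);
    volatile double sink = f();  // warm-up call, result kept so it is not optimized away
    long long best = -1;
    for (int r = 0; r < repeats; ++r) {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        sink = f();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        if (best < 0 || us < best) {
            best = us;
        }
    }
    (void)sink;
    return best;
}
```

With a harness like this, each candidate loop is passed in as a callable,
for example time_after_warmup([&] { return sum_scaled(a, 2.0); }, 5), so
neither loop is measured cold while the other is measured warm. Timing
both orders, as done above, remains a useful cross-check.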

I don't know why the difference is so small in my case. Maybe it
is related to cache and/or memory. A difference of more than 3 times
in your case is very impressive. What kind of hardware do you use?
Mine is a P4 3.06 GHz (512K cache?) with 1 GB of memory.

In my blitz_test.cpp, there are some multiplications in the inner
loops that could be moved out, but I don't think we need to worry
about them. Modern compilers are pretty smart at optimizing loops.
I have tried before, with other code, to move all such calculations
out of the inner loops, but the improvement was trivial. You may get
a significant improvement from doing this under the -O0 flag, but
you gain little under -O3: the compiler can do all these common
optimizations itself, and does them better. Optimizing code at a
higher level, nonetheless, is indeed very important.
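
To illustrate the kind of inner-loop multiplication meant here, the
sketch below shows the same row-major sum written twice: once with the
index multiplication inside the inner loop, and once with it moved out
by hand. The function names are hypothetical and this is not the code
from blitz_test.cpp. Whether a given compiler performs this hoisting on
its own depends on the compiler and flags, so it is worth measuring
rather than assuming.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2-D sum with the multiplication i * cols in the inner loop.
double sum_naive(const std::vector<double>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            s += a[i * cols + j];
        }
    }
    return s;
}

// Same sum with the row offset computed once per row, outside the inner loop.
double sum_hoisted(const std::vector<double>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        const double* row = a.data() + i * cols;  // hoisted multiplication
        for (std::size_t j = 0; j < cols; ++j) {
            s += row[j];
        }
    }
    return s;
}
```

Both functions add the elements in the same order, so they return the
same value; only the amount of index arithmetic written in the inner
loop differs.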

Xianglong