Blitz Support :

From: Julian Cummings (cummings_at_[hidden])
Date: 2004-04-14 21:42:05


Hi Lee,

I played around a bit with the example code you submitted. The main
thing you are doing wrong with respect to the blitz part of your test is
that you are using square bracket indexing notation instead of the
standard blitz indexing notation with parentheses. Thus,

// Variable Array test using Blitz++: Array<int,2>
int test_six(int M, int N, Array<int,2> thisarray)
{
  // NOTE: Switching iteration order or subscript order for C++ optimization

  for (int i=0; i < M; i++)
    {
      for (int j=0; j < N; j++)
        {
          thisarray(i,j)=(i*N)+j;
        }
    }

  return 0;  // declared int, so return a value rather than flowing off the end
}
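
As a rough illustration of what parenthesized element access amounts to, here is a standard-library-only sketch. Grid2D and fill_row_major are hypothetical names made up for this example; this is not the Blitz++ implementation, only a stand-in that assumes row-major storage (the Blitz++ default) and shows why the loop order above walks contiguous memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D array class; NOT the Blitz++ implementation.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    // Element access with parentheses: one multiply-add into flat storage.
    int& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    int operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;  // row-major: the last index varies fastest
};

// Same fill pattern as test_six: the inner loop (j) touches adjacent elements.
void fill_row_major(Grid2D& g) {
    for (std::size_t i = 0; i < g.rows(); ++i)
        for (std::size_t j = 0; j < g.cols(); ++j)
            g(i, j) = static_cast<int>(i * g.cols() + j);
}
```

Calling fill_row_major on a Grid2D of M rows and N columns stores i*N+j at position (i,j), matching the assignment in test_six.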

Just changing from square brackets to parentheses got me a factor of 40
increase in speed!
In blitz, square bracket notation is actually used for accessing separate
components of an array with multicomponent elements, which is completely
different from element access.
So your original test was not timing the operation you intended. With this
change, I got these results:

TEST THREE - C++ Call of F90 SUBROUTINE CALL
TEST THREE - 10000 invocations of function test_three
elapsed time in seconds: 51.400000

TEST FOUR - C++ Call of COMPLETE C++
TEST FOUR - 10000 invocations of function test_four
elapsed time in seconds: 51.120000

TEST SIX - Blitz++/C++ Call of COMPLETE C++ (variable array)
TEST SIX - 10000 invocations of function test_six
elapsed time in seconds: 143.550000

Note that I have run the same number of iterations for the blitz loop
here as for the other loops (10000). Now the blitz array loop speed is
at least in the same ballpark as the F90 and straight C++ loops. I got
similar results when I tried compiling the exp9.cc code with the icc
compiler instead of the g++ compiler. I'll look at this some more and
see if there are ways to get any further improvements in the blitz
performance. Ideally, the performance should be about the same as the
raw C++ or F90 for this case.
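
For anyone who wants to repeat this kind of comparison without TNT, the following is a minimal sketch of a timing helper. It swaps in std::chrono wall-clock timing in place of the TNT Stopwatch used in exp9.cc, so absolute numbers will not be comparable with the ones above. time_invocations and fill_flat are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: call `body` `steps` times and return the elapsed
// wall-clock time in seconds. Every kernel should be timed with the same
// `steps` value so the results can be compared directly.
template <typename Body>
double time_invocations(int steps, Body body) {
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < steps; ++k) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in kernel: row-major fill of an M x N array kept in flat storage.
void fill_flat(std::vector<int>& a, int M, int N) {
    assert(a.size() >= static_cast<std::size_t>(M) * static_cast<std::size_t>(N));
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            a[static_cast<std::size_t>(i) * N + j] = i * N + j;
        }
    }
}
```

A call such as time_invocations(10000, [&] { fill_flat(a, M, N); }) would then play the role of one of the timed test loops. Note that an optimizing compiler may remove repeated stores whose results are never read, so it is worth reading back at least one element after the timed loop.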

Regards, Julian C.

Atkinson wrote:

>Todd Veldhuizen:
>
>Todd,
>
>I recently ran the code attached using both TNT and Blitz-0.6. I was
>surprised to see such a slow throughput as compared to TNT.
>
>
>Is there a more optimized way to compile to achieve a more equivalent
>performance?
>
>
>The compile line is in exp9.cc.
>
>
>Thanks in Advance for your help!
>
>
>Lee Atkinson, Sr. SW ENG, CTR
>Targacept, Inc
>336-480-2232
>
>
>------------------------------------------------------------------------
>
>! FILENAME: arraytest.f90
>!
>! ifort -c arraytest.f90
>!
>SUBROUTINE init_2d_array(M, N, thisarray)
> IMPLICIT NONE
> INTEGER*4 :: M,N
> INTEGER*4, TARGET :: thisarray(M,N)
>
>
> ! local integers
> INTEGER :: i, j
>
> do i=1,M
> do j=1,N
> thisarray(j,i)=(i*N)+j
> end do
> end do
>
>END SUBROUTINE init_2d_array
>
>
>
>SUBROUTINE cycle_2d_array(iters, size, thisarray)
> IMPLICIT NONE
> INTEGER*4 :: iters
> INTEGER*4 :: size
> INTEGER*4, TARGET :: thisarray(size,size)
>
>
> ! local integers
> INTEGER :: i, j, k
>
> do k=1,iters
> do i=1,10
> do j=1,10
> thisarray(j,i)=19
> end do
> end do
> end do
>
>END SUBROUTINE cycle_2d_array
>
>
>SUBROUTINE init_3d_array(N, M, L, thisarray)
> IMPLICIT NONE
> INTEGER*4 :: N, M, L
> INTEGER*4, TARGET :: thisarray(N,M,L)
>
>
> ! local integers
> INTEGER :: i, j, k
>
> do k=1,L
> do j=1,M
> do i=1,N
> thisarray(i,j,k)=(i - 1) + ((j - 1)*8) + ((k - 1)*32)
> end do
> end do
> end do
>
>END SUBROUTINE init_3d_array
>
>
>SUBROUTINE test_3d()
> IMPLICIT NONE
> INTEGER*4, PARAMETER :: N = 8
> INTEGER*4, PARAMETER :: M = 4
> INTEGER*4, PARAMETER :: L = 2
>
> INTEGER*4 :: thisarray(N,M,L)
>
> ! local integers
> INTEGER :: i, j, k
>
>
> call init_3d_array(N, M, L, thisarray)
>
> do k=1,L
> do j=1,M
> do i=1,N
> write(*,*) i, j, k, thisarray(i,j,k)
> end do
> end do
> end do
>
>END SUBROUTINE test_3d
>
>
>------------------------------------------------------------------------
>
>// Exp9
>// --------------------------------------------------------------------------------
>// TARGACEPT
>// Lee Atkinson
>// March 25, 2004
>// --------------------------------------------------------------------------------
>
>// --------------------------------------------------------------------------------
>// EXP8:
>// This is a test program which
>//
>//
>// C++ with F90 Test
>// -----------------
>//
>// 1st:
>// create a C++ main that call a
>// C++ function as though converted from fortran
>// and have the C++ function call a fortran function
>//
>// The above will illustrate the conversion from F90
>// to C++ using non-converted F90 libs.
>//
>// Results should determine method and details of conversion
>//
>//
>//
>// 2nd:
>// For C++/ C++-F90/ F90:
>//
>// call a 2d array assignment, 1000 times
>// call a loop on 1000 to 2d array assignment, 1 times
>// need timer, use TNT Stopwatch
>//
>// Perform evaluation on straight calls to F90
>// ------------------
>//
>// --------------------------------------------------------------------------------
>
>
>// --------------------------------------------------------------------------------
>// INCLUDES
>// /usr/include/c++/3.2.2/
>// --------------------------------------------------------------------------------
>
>#include <cstdio>
>#include <iostream>
>#include <complex>
>#include <math.h>
>#include <blitz/array.h>
>#include "tnt.h"
>using namespace TNT;
>using namespace blitz;
>
>
>
>
>
>
>// --------------------------------------------------------------------------------
>
>// g++ -O3 -funroll-loops -frerun-loop-opt -o arytst9 exp9.cc arraytest.o -I/usr/include/c++/3.2.2/ -I/opt/tnt/1.2/src -I/opt/blitz-0.6 -ftemplate-depth-30 -L/opt/intel_fc_80/lib /opt/intel_fc_80/lib/libifcore.so.5 -L/usr/local/blitz-0.6/lib -lblitz -lm
>
>// --------------------------------------------------------------------------------
>
>// --------------------------------------------------------------------------------
>
>extern "C" { void init_2d_array_(int* M, int* N, int* thisarray); }
>
>// --------------------------------------------------------------------------------
>
>// --------------------------------------------------------------------------------
>// MYFUNCTION
>// --------------------------------------------------------------------------------
>int test_one(int M, int N, int thisarray[1000][1000])
>{
>
>
> // HYPOTHETICAL FORTRAN 90
> // do i=1,10
> // do j=1,10
> // thisarray(j,i)=19
> // end do
> // end do
>
> //C++ TRANSLATION
> // Note: Must argment bounds in C/C++ to avoid seg fault
>
> for (int i=0; i < M; i++)
> {
> for (int j=0; j < N; j++)
> {
> //thisarray[j][i]=19;
> thisarray[j][i]=(i*N)+j;
> }
> }
>
>}
>
>int test_two(int M, int N, Fortran_Array2D<int> thisarray)
>{
>
>
> // HYPOTHETICAL FORTRAN 90
> // do i=1,10
> // do j=1,10
> // thisarray(j,i)=19
> // end do
> // end do
>
> //C++ TRANSLATION
>
> for (int i=1; i <= M; i++)
> {
> for (int j=1; j <= N; j++)
> {
> //thisarray(j,i)=19;
> thisarray(j,i)=(i*N)+j;
> }
> }
>
>}
>
>
>// Test Three is an inline Fortran call to init_2d_array. See MAIN
>
>
>
>int test_four(int M, int N, int thisarray[1000][1000])
>{
> // NOTE: Switching iteration order or subscript order for C++ optimization
>
>
> for (int i=0; i < M; i++)
> {
> for (int j=0; j < N; j++)
> {
> //thisarray[i][j]=19;
> thisarray[i][j]=(i*N)+j;
> }
> }
>
>}
>
>
>// Variable Array test using TNT: Array2D<double> A(M,N);
>int test_five(int M, int N, Array2D<int> thisarray)
>{
> // NOTE: Switching iteration order or subscript order for C++ optimization
>
> for (int i=0; i < M; i++)
> {
> for (int j=0; j < N; j++)
> {
> //thisarray[i][j]=19;
> thisarray[i][j]=(i*N)+j;
> }
> }
>
>}
>
>
>// Variable Array test using Blitz++: Array<int,2>
>int test_six(int M, int N, Array<int,2> thisarray)
>{
> // NOTE: Switching iteration order or subscript order for C++ optimization
>
> for (int i=0; i < M; i++)
> {
> for (int j=0; j < N; j++)
> {
> //thisarray[i][j]=19;
> thisarray[i][j]=(i*N)+j;
> }
> }
>
>}
>
>
>
>// --------------------------------------------------------------------------------
>// MAIN
>// --------------------------------------------------------------------------------
>
>
>int main ()
>{
> using namespace std;
>
> int steps=10000;
> int M=1000;
> int N=1000;
> int ii,jj;
> Stopwatch Q;
> double time_elapsed;
> int R[1000][1000];
> int *Rptr;
> int* Mptr;
> int* Nptr;
> Array2D<int> A(M,N); /* create C++ TNT MxN array; all zeros */
> Fortran_Array2D<int> F(M,N); /* create TNT Fortran MxN array; all zeros */
> Array<int,2> B(M,N); /* create C++ Blitz++ MxN array; all zeros */
>
> Mptr=&M;
> Nptr=&N;
> Rptr=&R[0][0];
>
> cout << "Hello TARGACEPT\n";
>
> printf("MAIN: \n\n");
>
>
> printf("TEST ONE - SIMULATED C++ CONVERSION of F90 SUBROUTINE CALL\n");
> printf("TEST ONE - %d invocations of function test_one\n", steps);
> Q.start();
> for (int k=0; k < steps; k++)
> {
> test_one(M,N,R);
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
>
> printf("TEST TWO - SIMULATED C++ CONVERSION of F90 SUBROUTINE CALL using TNT F90 Array\n");
> printf("TEST TWO - %d invocations of function test_two\n", steps);
> Q.start();
> for (int k=0; k < steps; k++)
> {
> test_two(M,N,F);
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
>
> printf("TEST THREE - C++ Call of F90 SUBROUTINE CALL\n");
> printf("TEST THREE - %d invocations of function test_three\n", steps);
> Q.start();
> for (int k=0; k < steps; k++)
> {
> init_2d_array_(Mptr, Nptr, Rptr);
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
>
>
> printf("TEST FOUR - C++ Call of COMPLETE C++\n");
> printf("TEST FOUR - %d invocations of function test_four\n", steps);
> Q.start();
> for (int k=0; k < steps; k++)
> {
> test_four(M, N, R);
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
>
>
> printf("TEST FIVE - C++ Call of COMPLETE C++ (variable array)\n");
> printf("TEST FIVE - %d invocations of function test_five\n", steps);
> Q.start();
> for (int k=0; k < steps; k++)
> {
> test_five(M, N, A );
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
> //printf("I=17 J=23 I*N+J=17,023 A[17][23]=%d\n", A[17][23]);
>
> printf("TEST SIX - Blitz++/C++ Call of COMPLETE C++ (variable array)\n");
> printf("TEST SIX - %d invocations of function test_six\n", 100);
> Q.start();
> for (int k=0; k < 100; k++)
> {
> test_six(M, N, B );
> }
> Q.stop();
> time_elapsed = Q.read();
> printf("elapsed time in clock ticks: %f\n\n", time_elapsed);
>
>
>
>
> cout << "Bye TARGACEPT\n";
>
>}
>
>
>------------------------------------------------------------------------
>

-- 
Dr. Julian C. Cummings                       E-mail: cummings_at_[hidden]
California Institute of Technology           Phone:  626-395-2543
1200 E. California Blvd., Mail Code 158-79   Fax:    626-584-5917
Pasadena, CA 91125