Re: OON: Comparison between Fortran & C++

From: Brad Lucier (lucier@math.purdue.edu)
Date: Sun Mar 26 2000 - 00:02:50 EST


> Brad Lucier writes:
> > It seems to be fairly fast; e.g., my second-generation sparse
> > matrix--vector multiply with a single right hand side takes 17 cycles
> > per flop on my Alpha 21264.
>
> Whether this really is fast depends a lot on the size of the matrix.
> The Alpha is a darn fast chip overall, so even at 17 cycles per flop
> it is pretty quick.
>
> But if memory serves, it ought to be able to do 0.5 cycles per flop
> if the data fits in cache. Or 1 cycle per flop, I forget. Either way,
> that is more than an order of magnitude better, and you would see
> close to the machine speed in either C or Fortran.
>
> -Steve Karmesin

I bit the bullet and wrote the same code in C, with the same memory
access pattern (I dumped the Scheme matrix structure). I timed this loop:

  for (k = 0; k < 100; k++)         /* 100 timed products */
    {
      /* fresh zeroed result vector on each pass, as in the Scheme code */
      vec2 = (double *) calloc (n, sizeof (double));
      for (i = 0; i < n; i++)
        {
          /* column indices and values of row i */
          local_neighbors = neighbors[i];
          local_coefficients = coefficients[i];
          if (neighbor_size[i] == 7)
            /* common case: exactly 7 nonzeros, fully unrolled */
            vec2[i] =
                local_coefficients[0] * vec1[local_neighbors[0]]
              + local_coefficients[1] * vec1[local_neighbors[1]]
              + local_coefficients[2] * vec1[local_neighbors[2]]
              + local_coefficients[3] * vec1[local_neighbors[3]]
              + local_coefficients[4] * vec1[local_neighbors[4]]
              + local_coefficients[5] * vec1[local_neighbors[5]]
              + local_coefficients[6] * vec1[local_neighbors[6]];
          else
            {
              /* general case: any other number of nonzeros */
              sum = 0.0;
              for (j = 0; j < neighbor_size[i]; j++)
                sum += local_coefficients[j] * vec1[local_neighbors[j]];
              vec2[i] = sum;
            }
        }
      free (vec2);
    }

I allocated vec2 on each pass because that is what the Scheme code did
(both could have been rewritten to reuse vec2, but ...). The Scheme
code took 0.500 seconds and the C code took 0.455 seconds. When I
reused vec2, the C loop took 0.416 seconds.

On this example there is no substantive performance difference between
Scheme and C: the Scheme code is about 10% slower than the equivalent C,
and about 20% slower than C that reuses vec2.

Brad Lucier

--------------------- Object Oriented Numerics List --------------------------
