:-) From ferrell@alpheratz.mit.edu Wed Mar 19 16:45 EST 1997
:-) Date: Wed, 19 Mar 1997 16:37:59 -0500
:-) To: bannov@c11177-330dan.ece.ncsu.edu
:-) Cc: oon-list@oonumerics.org
:-) Subject: Re: OON: f90 vrs CC
:-) From: ferrell@cpca.com
:-)
:-)
:-) How do you handle the case where the user writes:
:-) A = A*C?
:-)
:-) This is a perfectly valid invocation, and well defined in F90
:-) (behavior should be as if RHS gets fully evaluated before assignment
:-) is done). Using the function you wrote would give incorrect results
:-) because A would get overwritten before the RHS was fully evaluated.
:-)
:-) I have come across this problem frequently in providing library
:-) routines. I would like to optimize for the most common case: LHS does
:-) not overlap any of the arguments. However, I find it tricky to do
:-) that optimization and still provide correct results in the event user
:-) to writes statements such as A=A*C.
:-)
:-) -robert
:-)
:-) --
:-) ========================================================================
:-) Robert C. Ferrell, PhD E-mail: ferrell@cpca.com
:-) Massachusetts Institute of Technology MIT Office: (617) 253-3961
:-) Cambridge Power Computing Associates, Ltd. CPCA Office:(617) 734-8569
:-) ========================================================================
:-)
I do not write libraries, but
I think the problem is difficult. I have a simple solution for the case
A=A*B and A=B*A, but not for the both cases.
C does not have to store arrays of more than 1 dimension
in sequential memory, fortran does.
I think the fastest solution is to write f77 subroutine which
passes adresses of all A, B, and C to C-function, which analyses
physical adresses and compares with dimensions of arrays, and return
an answer to the question if there is any overlap in the memory,
the next step would be a branching. But it will be procedural, not OO.
Why allocating memory is not acceptable? In general case you have to
do N^3 arithmetic operation, what is much more than N^2 assignments,
it should not deteriorate performance, it takes memory of course,
but thouse who multiply 10^6 x 10^6 matrixes can check yourself
if they overlap, for 10^3 x 10^3 it is not much memory.
I've attached f90 routines `mult' and `mult_for_dummy' which works.
`mult' has side - effects, so strictly it is not standard compatible,
however it will work if no overlap.
`mult_for_dummy' does not have this restriction.
Nik
module example
contains
function mult(B,C)
real, dimension(:,:), intent(in) :: B, C
real, dimension(size(B,1),size(B,1)) :: mult
integer :: N
integer :: K,J
N = size(B,1)
mult=0. ; do J=1,N ; do K=1,N
mult(1:N,J)=mult(1:N,J)+B(1:N,K)*C(K,J) ; enddo ; enddo
end function mult
function mult_for_dummy(B,C)
real, dimension(:,:), intent(in) :: B, C
real, dimension(size(B,1),size(B,1)) :: mult_for_dummy
real, dimension(:,:), allocatable :: tmp
integer :: N
integer :: K,J
N = size(B,1) ; allocate(tmp(1:N,1:N))
tmp=0. ; do J=1,N ; do K=1,N
tmp(1:N,J)=tmp(1:N,J)+B(1:N,K)*C(K,J) ; enddo ; enddo
mult_for_dummy=tmp
deallocate(tmp)
end function mult_for_dummy
end module example
subroutine sample91
use example
implicit NONE
integer, parameter :: N=2
real, dimension(N,N) :: A,B,C
interface operator(.MULTIC.)
module procedure mult
end interface
interface operator(.MULTFD.)
module procedure mult_for_dummy
end interface
B = RESHAPE( (/ 1., 0., 2., 4. /) , (/2,2/) )
C = RESHAPE( (/ 1., 0., 1., 2. /) , (/2,2/) )
write(*,'(/" --- matrix B ---"/)')
write(*,'(4g10.2)')B
write(*,'(/" --- matrix C ---"/)')
write(*,'(4g10.2)')C
A = B .MULTIC. C
write(*,'(/" --- matrix A = B x C ---"/)')
write(*,'(4g10.2)')A
A = B * C
write(*,'(/" --- matrix A = B*C (intrinsic)---"/)')
write(*,'(4g10.2)')A
A = B
A = A .MULTFD. A
write(*,'(/" --- matrix A = B x B---"/)')
write(*,'(4g10.2)')A
write(*,'(/" --- the end ---"/)')
return
end subroutine sample91
program main
call sample91
end
This archive was generated by hypermail 2b29 : Wed Feb 20 2002 - 03:20:05 EST