Let's say that the problem is roughly equivalent in size to 1,000,000 animals in pedigree and 40,000 genotyped individuals.
Using debug options in compilation
optf90 = -g -traceback -check all -fpe0
optf77 = -g -traceback -check all -fpe0
showed that in the pcg.f90 subroutine there was a division by 0 in subroutine
subroutine solve_sparse_hashm_pcg
a subroutine that worked perfectly for many many years. In this line:
alpha=tau/dot_product(p,w)
w=mult(a,p)
where a is a sparse hash matrix as described in the course notes. This mult is carried out in file fspak90.f90 (which can be downloaded here). If a has elements and p has elements, how come that their product is 0? The reason is that a , which is basically a three-column array, contains lots of lines: actually, 2147483648 elements. It turns out that this is beyond the highest integer in fortran, which is 2147483647. In order to store very large matrices, the hash structure uses "bigger" integers(i8) (8 bytes) up to a MUCH larger value. So, when a hash matrix arrives with its integer(i8) to the naif mult subroutine with its "normal" integers, there is a conversion problem - and the result is 0.
Fixing the problem is as simple as changing
integer::j
into
integer(i8)::j
and the program now works perfectly...
No comments:
Post a Comment