Friday, December 22, 2017

Genetic Variances in an observed population

To discuss the genetic variance of a population, we increasingly refer to the genetic variance of the observed population, as opposed to the genetic variance of an idealized population. For instance, the latter considers neither linkage disequilibrium nor deviations from Hardy-Weinberg equilibrium, so the former may be more useful, or more accurate.

The (empirical) genetic variance of a population composed of $n$ individuals is simply the (empirical) variance of its genetic values:

$$var(u)=\frac{1}{n}\sum_i u_i^2-\left(\frac{1}{n}\sum_i u_i\right)^2$$

This definition does not consider any genetic structure among the observed individuals. However, because genetic values are unknown, we need to estimate them, typically in a Mixed Model context.

To my knowledge, there are two ways to do this. The first is to estimate a variance component (say $\sigma^2_u$) that actually refers to an infinite, unrelated population. Then, based on this variance component, we can estimate the expected value of the empirical $var(u)$ over the multivariate distribution with $Var({\bf u})={\bf K}\sigma^2_u$, where ${\bf K}$ is a relationship matrix. This results in "my" estimator:

$$E(var(u))=D_K\,\sigma^2_u = \left(\mathrm{mean}(\mathrm{diag}({\bf K}))-\mathrm{mean}({\bf K})\right)\sigma^2_u$$

which is becoming increasingly used (and not only by us 😅), among other things because it is extremely simple.
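As an illustration, here is a minimal R sketch of this estimator; the relationship matrix K and the variance component sigma2_u below are made-up inputs, not real estimates:

# E(var(u)) = (mean(diag(K)) - mean(K)) * sigma2_u
expected_var_u <- function(K, sigma2_u) {
  D_K <- mean(diag(K)) - mean(K)   # the D_K coefficient
  D_K * sigma2_u
}

# toy relationship matrix for 3 individuals and a hypothetical sigma2_u
K <- matrix(c(1.00, 0.50, 0.25,
              0.50, 1.00, 0.25,
              0.25, 0.25, 1.00), nrow = 3)
expected_var_u(K, sigma2_u = 0.04)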

A more refined approach was developed by Sorensen et al. (in my opinion, Daniel's best paper) and recently revived in the context of genomic prediction by Lehermeier et al. The estimator is Bayesian in nature: it constructs the posterior distribution of $var(u)$ from samples of the genetic values ${\bf u}$, i.e., at each iteration of the Gibbs sampler the sample of $var(u)$ is

$$\widetilde{var(u)}=\frac{1}{n}\sum_i \tilde u_i^2-\left(\frac{1}{n}\sum_i \tilde u_i\right)^2$$

This is well described in BGLR. In fact, "my" method is a simplification of Sorensen et al.'s method.

As a frequent blupf90 suite user, I want to implement this other approach with gibbs2f90. It is actually quite simple, without modifying the code:


  1. write out samples of the genetic values ${\bf u}$ using OPTION solution all in the parameter file
  2. post-process these samples to obtain samples of $\widetilde{var(u)}$ using $\widetilde{var(u)}=\frac{1}{n}\sum_i \tilde u_i^2-\left(\frac{1}{n}\sum_i \tilde u_i\right)^2$
  3. get an estimate (i.e. the posterior mean) from the samples of $\widetilde{var(u)}$ (a minimal R sketch of steps 2 and 3 is shown below)
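The post-processing is only a few lines of R. Here is a minimal sketch, assuming the samples of ${\bf u}$ have already been rearranged into a hypothetical text file u_samples.txt with one Gibbs iteration per row and one individual per column (the raw output of OPTION solution all is not in this layout and needs reshaping first):

# read samples of the genetic values: one saved Gibbs iteration per row,
# one individual per column (hypothetical, pre-reshaped file)
u_samples <- as.matrix(read.table("u_samples.txt"))

# empirical variance (divisor n, as in the formula above) of each sampled vector u
var_u_samples <- apply(u_samples, 1, function(u) mean(u^2) - mean(u)^2)

# estimate (posterior mean) and a 95% credible interval
mean(var_u_samples)
quantile(var_u_samples, c(0.025, 0.975))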

I have re-analysed, with the Sorensen et al. estimator, the mice data set used for "my" estimator, to obtain estimates of the genetic variance using four kinds of relationships: identity by state, VanRaden's G, pedigree relationships and kernel similarities. The estimates (posterior means) of the genetic variance $var(u)$ are:
  • IBS: 0.037
  • G: 0.035
  • Kernel: 0.050
  • pedigree: 0.035
which compare well with my previous estimates of 0.036, 0.033, 0.047 and 0.038. The larger value for the kernel estimator arises because it includes both additive and dominance relationships. When do we expect real differences between the two methods? For small, very informative data sets such as the Lehermeier et al. data, in which case the Sorensen et al. method is better.












Monday, October 9, 2017

Anatomy of a solved bug in blupf90

I had a request from Alban Bouquet to look into a strange Single Step GBLUP (SSGBLUP) problem with multiple-trait pig breeding data. Using the freely available blupf90 and the default iterative solver (PCG), a sequence of NaNs (not a number) was produced, with no solution. The intriguing aspect is that deleting a few records or genotypes at random produced a correct sequence of iterations leading to sensible solutions of the SSGBLUP equations. Also, the SSGBLUP ran perfectly (and more quickly) using blup90iod2.

Let's say that the problem is roughly equivalent in size to 1,000,000 animals in pedigree and 40,000 genotyped individuals.

Using debug options in compilation

optf90 = -g -traceback -check all -fpe0 
optf77 = -g -traceback -check all -fpe0 

showed that there was a division by zero in pcg.f90, in the subroutine

subroutine solve_sparse_hashm_pcg 

a subroutine that had worked perfectly for many, many years. In this line:

alpha=tau/dot_product(p,w)

it turned out that w (a vector) had all its values equal to 0. However, this vector is computed as

w=mult(a,p)

where a is a sparse hash matrix, as described in the course notes. This mult is carried out in file fspak90.f90 (which can be downloaded here). If a has nonzero elements and p has nonzero elements, how can their product be 0? The reason is that a, which is basically a three-column array, contains a huge number of rows: actually, 2147483648 elements. This is beyond the largest default (4-byte) integer in Fortran, which is 2147483647. In order to store very large matrices, the hash structure uses "bigger" integer(i8) (8-byte) indices, which go up to a MUCH larger value. So, when a hash matrix arrives with its integer(i8) indices at the naive mult subroutine, which uses "normal" integers, there is a conversion (overflow) problem, and the result is 0.

Fixing the problem is as simple as changing
integer::j
into
integer(i8)::j

and the program now works perfectly...

   

Wednesday, June 28, 2017

The Breeding Value of Harry Potter

Here is a description of the "wizard" trait in the Harry Potter world:



So, the trait "wizard" is conferred by a recessive locus with alleles W and M. What is the breeding value of Harry Potter?

Let "wizard" be scored as 1 and "Muggle" as 0.
The different genotypes have a value, in a classical "Falconerian" setting, of

WW     WM      MM
a       d      -a
0.5   -0.5   -0.5

so that $a=0.5$ and $d=-0.5$. The substitution effect of W is $\alpha=a+(q-p)d$. The question is: what is the allelic frequency of W?

If the number of wizards is 1% of the population, then $p^2=0.01$ and $\alpha=0.5+(0.9-0.1)(-0.5)=0.1$.

The breeding value of Harry Potter is $2 \alpha= 0.2$.
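Putting the numbers together (a worked restatement of the computation above, counting Harry's two copies of W at the substitution effect $\alpha$ each, without centering on the population mean; $u_{\text{Harry}}$ is just notation for his breeding value):

$$\begin{aligned}
p^2 &= 0.01 \;\Rightarrow\; p = 0.1,\quad q = 1-p = 0.9\\
\alpha &= a + (q-p)\,d = 0.5 + (0.9-0.1)(-0.5) = 0.1\\
u_{\text{Harry}} &= 2\alpha = 0.2
\end{aligned}$$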


The allelic frequency $p$ can actually be worked out from here: "According to Rowling, the average Hogwarts annual intake for Muggle-borns is 25%".

The bulk of these Muggle-borns come from WM x WM couples, which have frequency $4p^2q^2$. These couples produce a wizard WW in one out of every four kids, so the frequency of wizards born to Muggle couples is $p^2q^2$. The wizard couples WW x WW make up $p^4$ of the population, and every one of their kids is a WW wizard too, so they contribute $p^4$. Then we have

$$\frac{p^2 q^2}{p^2 q^2 + p^4} =\frac{1}{4}$$
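Dividing numerator and denominator by $p^2$ and writing $q=1-p$, this reduces to a quadratic in $p$:

$$\frac{q^2}{q^2+p^2}=\frac{1}{4} \;\Rightarrow\; 3q^2=p^2 \;\Rightarrow\; 2p^2-6p+3=0 \;\Rightarrow\; p=\frac{3-\sqrt{3}}{2}\approx 0.63$$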

The only admissible root is $p \approx 0.63$, which would make wizards $p^2 \approx 40\%$ of the population, clearly incompatible with being a wizard being rare. The point is that, if being a wizard is rare, by far most wizards come from "Muggle" couples: if the pure wizards are 1% of the population and $p=0.1$, then wizards from Muggle WM x WM couples are roughly 80-fold more numerous than those from wizard WW x WW couples, so nearly all wizards should be Muggle-born, not 25% of them. Should we conclude that most wizards do not go to Hogwarts?








Friday, April 21, 2017

compiling fortran with lapack and blas and openmp using Absoft compiler for windows

I downloaded a trial version of Absoft Pro Fortran 2017 and installed the compiler and the libraries. Then I compiled using the "Development Command Prompt (64 bit)".

I could not easily find how to link against the LAPACK and BLAS libraries provided by the installer. After studying the examples, the right options are:

Y:\save\progs\parallel_inverse>f95 -openmp -O3 -s -deploy-dlls -m64 -mcmodel=medium pGDsingle.F90 refblas.lib reflapack.lib

The option -s uses the heap (and not the stack), but it is unclear to me whether this handles memory in the most efficient way. Alternatively, I can use something like -stack:500000000 to increase the stack. See here.

The option -openmp enables OpenMP directives.

The option -deploy-dlls puts the required DLLs in the same place as the executable.

The option -m64 tells the compiler to create a 64-bit executable, and -mcmodel=medium makes it use 64-bit addresses (both are needed), so that the memory used can be quite large.

 It works well:

Y:\save\progs\parallel_inverse>pGDsingle.exe
 nanim,nsnpp
5000 50000
   5000  50000
 Z=rnd()
 y=Z(Z'x)
     LAPACK MKL dgemv #threads=    2    2 Elapsed omp_get_time:     0.3580
     LAPACK MKL dgemv #threads=    2    2 Elapsed omp_get_time:     0.5620
 meanG=   12501.1274661639
 GG=ZZ'
    Dgemm MKL #threads=     2    2 Elapsed omp_get_time:  5463.3270
 meanG=   12501.1274661626
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=    2    2 Elapsed omp_get_time:  1045.0340

Thursday, March 9, 2017

Submit parallel mkl openmp jobs (blupf90, yams) in genotoul cluster

The blupf90 programs and my own software include parallelised code using Intel Fortran MKL and OpenMP facilities. These use several threads, usually within a single node (as they share memory). I was going nuts trying to make this work on bioinfo.genotoul.fr until I went to see the support people. One of the problems for me was that there are actually several grid engines, so Google does not always find the right solution.

Short message: use this:


alegarra@genotoul2 ~/work $ cat test_omp.sh

#!/bin/bash

#$ -pe parallel_smp 10
export OMP_NUM_THREADS=10
your_program

alegarra@genotoul2 ~/work $ qsub test_omp.sh

where 10 is the number of threads that you want.

Long explanation:


I have a simple example that just creates a big matrix and inverts it. Let's try a 1000 x 1000 matrix; the program runs smoothly without any particular command or request for parallel resources:

$cat test_omp.sh
#!/bin/bash
echo "1000 1000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub test_omp.sh

$cat test_omp.sh.o8160505
nanim,nsnpp
        1000        1000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=    20   40 Elapsed omp_get_time:     0.1530
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=   40   40 Elapsed omp_get_time:     0.1071

Epilog : job finished at Thu Mar  9 15:45:53 CET 2017

However, this fails if I try a larger 10000 x 10000 matrix:

OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
OMP: Error #178: Function pthread_getattr_np failed:
OMP: System error #12: Cannot allocate memory
forrtl: error (76): Abort trap signal

The reason is that every thread allocates memory, so the total becomes too much.

Then I try again, explicitly requesting 4 threads for the 1000 x 1000 matrix. This is done by requesting the adequate parallel environment:

-pe parallel_smp requests X cores on the same node (multi-threaded, OpenMP), as explained in the docs.
Do not forget to put the SAME number in -pe parallel_smp and in export OMP_NUM_THREADS.

alegarra@genotoul2 ~/work $ cat test_omp.sh
#!/bin/bash
export OMP_NUM_THREADS=4 
echo "1000 1000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub -pe parallel_smp 4 test_omp.sh

and it does work:
alegarra@genotoul2 ~/work $ cat test_omp.sh.o8160541 
 nanim,nsnpp
        1000        1000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=     4    4 Elapsed omp_get_time:     0.0320
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=    4    4 Elapsed omp_get_time:     0.0230
Epilog : job finished at Thu Mar  9 16:01:30 CET 2017

Now, try with 10000 x 10000:


alegarra@genotoul2 ~/work $ cat test_omp.sh.o8160558 

 nanim,nsnpp

       10000       10000
 Z=rnd()
 GG=ZZ
    Dgemm MKL #threads=     4    4 Elapsed omp_get_time:    45.8333
 GG=GG/(tr/n)
 GG=0.95GG+0.05I
 GG^-1
    Inverse LAPACK MKL dpotrf/i #threads=    4    4 Elapsed omp_get_time:    29.2803
Epilog : job finished at Thu Mar  9 16:05:11 CET 2017

and it also works. I can also put the -pe within the script:

alegarra@genotoul2 ~/work $ cat test_omp.sh

#!/bin/bash

#$ -pe parallel_smp 4
export OMP_NUM_THREADS=4
echo "10000 10000" | /save/alegarra/progs/parallel_inverse/a.out
alegarra@genotoul2 ~/work $ qsub test_omp.sh


Now, I try with 10 threads and it also works. Next, I will try using yams.


A couple more tricks. To list the parallel environments available in the cluster and their characteristics:


alegarra@genotoul2 ~/work/mtr_2015/mf/methodR $ qconf -spl
parallel_10
parallel_20
parallel_40
parallel_48
parallel_fill
parallel_fill_amd
parallel_rr
parallel_rr_amd
parallel_smp
parallel_testq
alegarra@genotoul2 ~/work/mtr_2015/mf/methodR $ qconf -sp parallel_smp
pe_name            parallel_smp
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min

accounting_summary TRUE



Tuesday, February 7, 2017

Julia syntax in vim

I wanted Julia syntax highlighting in MacVim. I found this GitHub repository: https://github.com/JuliaEditorSupport/julia-vim and blindly followed the instructions in https://github.com/JuliaEditorSupport/julia-vim#manually , starting from ~/.vim . It does work:


There are some cool tricks; for instance, I can write Unicode characters:
\alpha + Tab = α

Friday, February 3, 2017

tests with Julia

So, after having installed Julia months ago, I finally gave it a try. In both tests below the speedup over R is around five-fold.

looping through the same matrix

R
system.time({val=0; for (i in 1:2000){for (j in 1:2000){val=val+a[i,j]}}; val})
   user  system elapsed 
  1.862   0.015   1.868 
Julia
julia> tic(); val=0; for i in 1:2000; for j in 1:2000; val=val+a[i,j]; end; end; println(val); toc()
5.142821063269366
elapsed time: 0.327145074 seconds
0.327145074

Singular value decomposition of a 2000 x 2000 matrix


> a=matrix(rnorm(4000000),2000)
> system.time(svd(a))
   user  system elapsed 
 27.419   0.218  27.452 
Julia
julia> tic(); a=randn(2000,2000); toc()
elapsed time: 0.673701438 seconds
0.673701438

julia> tic(); svd(a); toc()
elapsed time: 5.236212822 seconds
5.236212822

Wednesday, January 25, 2017

transposing marker data

Quite frequently one may find marker data ordered in this way:

alegarra@genotoul2 ~/save/progs $ cat ex_rachel
 1 a b
 1 b b
 1 c c
 2 b b
 2 c c
 2 d d
 3 a b
 3 b b
 3 c c

i.e., there are three animals (1 to 3) and three markers. Rachel and I would like them formatted with one animal per line and all markers one after another, in this way:


alegarra@genotoul2 ~/save/progs $ cat out
         1 ab bb cc 
         2 bb cc dd 
         3 ab bb cc 

This is conceptually simple if animals are sorted:
  1. Read a line
  2. If the animal is the same as the old one, print the markers after the previous one
  3. If the animal is different, start a new line, print the animal and the markers.
  4. Add special cases for the first and last line
Here is an awk implementation


alegarra@genotoul2 ~/save/progs $ cat SNPcol2line.awk 
#! /bin/awk -f
# this program reads genotypes with one line per (animal, locus) pair
# and writes them as
# individual allele1 allele2 allele1 allele2 ...
#
BEGIN{
idold=0
i=0
}
{
id=$1
# if new animal
if(id!=idold){
if(idold!=0){
# close previous line
printf("\n")
}
# write new id
printf("%10s%1s",id," ")
idold=id
}
printf("%1s%1s%1s",$2,$3," ")
}
END{
# last individual
printf("\n")
}

which works:

alegarra@genotoul2 ~/save/progs $ ./SNPcol2line.awk ex_rachel 
         1 ab bb cc 
         2 bb cc dd 
         3 ab bb cc