Monday, April 25, 2016

Estimation of allelic base frequencies

Tao Xiang is estimating base frequencies in a pig population. A simple but efficient method is Gengler et al. 2007 . This is actually very similar to GCMTBLUP, nut the latter considers one locus and an associated trait. Thus, Tao implement the model g=1m+Za+e  using blupf90 software, where m=2p. The data file, with genotype coded as 0/1/2 looks like

6051 1 2
6930 1 2
5205 1 2
7478 1 0

3208 1 1

Using direct inversion by fspak, the solution is p=0.88 . However, using the default iterative solver of blupf90 we get 0.44 !! Actually, using a more strict convergence criteria we get the exact solution:

  convcrit iter      20*m         p
1       12    5  9.016301 0.4508151
2       13    5  9.016301 0.4508151
3       14    9  9.016345 0.4508172
4       15   11  9.016358 0.4508179
5       16  103 16.779241 0.8389621
6       17  110 17.503890 0.8751945
7       18  117 17.570641 0.8785320
8       19  124 17.574420 0.8787210

9       20  126 17.574496 0.8787248

where the convergence criteria is actually 10^(-convcrit). So, the iterator is stuck at 0.45. The reason for this is that heritability is 1 so that for a given mu, EBVs for genotyped animals almost do not change and viceversa.

This is how the correct estimate of EBV looks like:
There is no pathology in this graph. The first 7000 individuals have genotype, so the breeding value is simply the deviation of the genotype from the populational mean. The three bars are individuals in genotypes AA, Aa and aa. The rest of individuals are ancestors, vaguely recoded in ascending order, so that their genotype is not neatly estimated.

This is how the wrong estimate of EBV looks like:
With biased EBVs for genotyped animals and, for non genotyped animals, EBVs quite different from the previous graph.

Take home message: if you estimate base allele frequencies using Gengler method by BLUP, use either solutions by inversion or set the convergence criteria to strict values.


No comments:

Post a Comment