Friday, December 22, 2017

Genetic Variances in an observed population

To describe the genetic variance of a population, we increasingly resort to the genetic variance of an observed population, as opposed to the genetic variance of an ideal population. For instance, the latter considers neither linkage disequilibrium nor deviations from Hardy-Weinberg equilibrium. Thus the former may be more useful or accurate.

The (empirical) genetic variance of a population composed of $n$ individuals is simply the (empirical) variance of its genetic values:

$latex var(u)=\frac{1}{n}{\sum_i (u_i^2)}-(\frac{1}{n}{\sum_i(u_i)})^2 $latex
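This is just the population (1/n) variance of the vector of genetic values. A minimal sketch in Python, where the vector `u` is made up purely for illustration:

```python
import numpy as np

# Hypothetical genetic values of n = 5 individuals (made-up numbers)
u = np.array([0.8, -0.3, 1.1, 0.2, -0.5])

# Empirical genetic variance: mean of squares minus squared mean
var_u = np.mean(u**2) - np.mean(u)**2

# This matches NumPy's population variance (1/n denominator, ddof=0)
assert np.isclose(var_u, np.var(u))
print(var_u)
```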

This definition does not consider any genetic structure among the observed individuals. However, because genetic values are unknown, we need to estimate them, typically in a Mixed Model context.

To my knowledge, there are two methods to do this. The first is to estimate a variance component (say $latex \sigma^2_u $latex), which actually refers to an infinite, unrelated population. Then, based on this variance component, we can estimate the expected value of the empirical $latex var(u)$latex over the multivariate distribution $latex Var(\bf u)=\bf K \sigma^2_u$latex, where $latex \bf K$latex is a relationship matrix. This results in "my" estimator:

$latex E(var(u))=D_K \sigma^2_u = (mean(diag({\bf K}))-mean({\bf K}))\sigma^2_u $latex
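In code, this estimator is a one-liner. A sketch assuming a relationship matrix `K` and a variance component `sigma2_u` are already available; the toy `K` and value of `sigma2_u` below are invented for illustration:

```python
import numpy as np

def expected_var_u(K, sigma2_u):
    """E(var(u)) = (mean(diag(K)) - mean(K)) * sigma2_u."""
    return (np.mean(np.diag(K)) - np.mean(K)) * sigma2_u

# Toy relationship matrix and variance component, for illustration only
K = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.25],
              [0.25, 0.25, 1.0]])
sigma2_u = 0.04

print(expected_var_u(K, sigma2_u))
```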

which is becoming increasingly used (and not only by us 😅), among other things because it is extremely simple.

A more refined approach was developed by Sorensen et al. (in my opinion, Daniel's best paper) and recently revived in the context of genomic prediction by Lehermeier et al. The estimator is Bayesian by nature, and it constructs the posterior distribution of $latex var(u)$latex from samples of the genetic values $latex \bf u$latex; i.e., at an iteration of the Gibbs sampler, the sample of $latex var(u)$latex is

$latex \widetilde{var(u)}=\frac{1}{n}{\sum_i (\tilde u_i^2)}-(\frac{1}{n}{\sum_i(\tilde u_i)})^2 $latex

This is well described in BGLR. In fact, "my" method is a simplification of Sorensen et al.'s method.

As a frequent blupf90 suite user, I want to implement this other approach with Gibbs2f90. It is actually quite simple and requires no modification of the code:


  1. write samples of the genetic values $latex \bf u$latex to the output, using OPTION solution all in the parameter file
  2. post-process these samples to obtain samples of $latex \widetilde{var(u)}$latex using $latex \widetilde{var(u)}=\frac{1}{n}{\sum_i (\tilde u_i^2)}-(\frac{1}{n}{\sum_i(\tilde u_i)})^2$latex
  3. get an estimate (e.g., the posterior mean) from the samples of $latex \widetilde{var(u)}$latex
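The post-processing (steps 2 and 3) can be sketched as follows. The parsing of the actual Gibbs2f90 solutions file is not shown; the array `U` of samples (one row per Gibbs iteration, one column per individual) is simulated here as a stand-in:

```python
import numpy as np

def var_u_samples(U):
    """One sample of var(u) per row (Gibbs iteration) of U."""
    return np.mean(U**2, axis=1) - np.mean(U, axis=1)**2

# Stand-in for samples read from the output of step 1
rng = np.random.default_rng(1)
U = rng.normal(scale=0.2, size=(1000, 50))  # 1000 iterations, 50 individuals

samples = var_u_samples(U)          # step 2: one var(u) sample per iteration
posterior_mean = samples.mean()     # step 3: point estimate of var(u)
print(posterior_mean)
```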

I have re-analysed, with Sorensen et al.'s method, the mice data set used for "my" estimator, to obtain estimates of genetic variance using four kinds of relationships: identity by state (IBS), VanRaden's G, pedigree relationships, and kernel similarities. The estimates (posterior means) of the genetic variance $latex var(u)$latex are:
  • IBS: 0.037
  • G: 0.035
  • Kernel: 0.050
  • pedigree: 0.035
which compare well with my previous estimates of 0.036, 0.033, 0.047 and 0.038. The larger value of the kernel estimator is because it includes both additive and dominance relationships. When do we expect to see real differences? For small, very informative data sets like Lehermeier et al.'s, in which case Sorensen et al.'s method is better.