The (empirical) genetic variance of one population composed of $n$ individuals is simply the (empirical) variance of its genetic values:
$latex var(u)=\frac{1}{n}{\sum_i (u_i^2)}-(\frac{1}{n}{\sum_i(u_i)})^2 $latex
This definition does not consider any genetic structure among the observed individuals. However, because genetic values are unknown, we need to estimate them, typically in a Mixed Model context.
To my knowledge, there are two methods to do this. The first is to estimate a variance component (say $latex \sigma^2_u $latex) , that actually refers to an infinite, unrelated population. Then, based on this variance component, we can estimate the expected value of the empirical $var(u)$ over the multivariate distribution $latex Var(\bf u)=\bf K \sigma^2_u$latex where $latex \bf K$latex is a relationship matrix. This results in "my" estimator:
$latex E(var(u))=D_K \sigma^2_u = (mean(diag({\bf K}))-mean({\bf K}))\sigma^2_u $latex
that is becoming increasingly used (and not only by ourselves 😅 ), among other things, because it is extremely simple.
A more refined approach was developed by Sorensen et al. (this is in my opinion Daniel's best paper) and recently revived in the context of genomic prediction by Lehermeier et al. . The estimator is Bayesian by nature, and it constructs the posterior distribution of $latex var(u)$latex from samples of the genetic values $\bf u$, i.e., at an iteration of the Gibbs sampler the sample of $latex var(u)$latex is
$latex \widetilde{var(u)}=\frac{1}{n}{\sum_i (\tilde u_i^2)}-(\frac{1}{n}{\sum_i(\tilde u_i)})^2 $latex
This is well described in BGLR. In fact "my" method is a simplification of Sorensen et al. method.
- write to output samples of genetic values $latex \bf u$latex using OPTION solution all in the parameter file
- post-process these samples to obtain samples of $latex \widetilde{var(u)}$latex using $latex \widetilde{var(u)}=\frac{1}{n}{\sum_i (\tilde u_i^2)}-(\frac{1}{n}{\sum_i(\tilde u_i)})^2$latex
- get an estimates (i.e. the mean) from the samples of $latex \widetilde{var(u)}$latex
I have re-analysed using Sorensen et al. the mice data set used in "my" estimator to obtain estimates of genetic variance. Using four kind of relationships: Identity by State, VanRaden's G, pedigree relationships and Kernel similarities. The estimates (posterior means) of genetic variance $var(u)$ are :
- IBS: 0.037
- G: 0.035
- Kernel: 0.050
- pedigree: 0.035
which compare well with my previous estimates of 0.036, 0.033, 0.047 and 0.038. The larger value of the kernel estimator is because it includes both additive and dominant relationships. When do we expect to have real differences? For small, very informative data sets like Lehermeier et al. data. In which case Sorensen et al. method is better.