Sunday, January 3, 2021

Paradoxes of BLUP

In genetic evaluation, it is well understood that an individual with progeny records has a more accurate prediction than an animal with few (or no) progeny records. Trying to illustrate the case with a small  example we found out some apparent paradox due to confounding with fixed effects. 

The pedigree is simple, two full-sibs (7 and 8) where 7 has seven offspring. So we hope animal 7 to be more accurate than animal 8/ All animals have a single record.

The model is also simple, $latex y =  u +\mu +e $latex with $latex h^2=0.5 $latex. We can compute accuracies directly in blupf90 using 

OPTION store_accuracy 1 

(store_accuracy 1 because it is the first effect in the model). Reliabilities are stored in file acc_bf90 . The result is surprising:

1 0.42468738

2 0.42468738

3 0.42468738

4 0.42468738

5 0.39977264

6 0.39977264

7 0.28647215

8 0.34482759

9 0.36994532

...


so, animal 7 is the less reliable in spite of having most progeny. Why is that? The reason is that animal 7 is highly confounded with the mean fit in the model through its descendants. If we remove the mean from the model and we fit $latex y =  u +e $latex we obtain:

1 0.54197995

2 0.54197995

3 0.54197995

4 0.54197995

5 0.62781955

6 0.62781955

7 0.71052632

8 0.59649123

9 0.54779806

here, we obtain what we expected, animal 7 is the most reliable. However the model $latex y =  u +e $latex is wrong: there *should* be an overall mean in the model.

The morale is that progeny of individuals should be spread across levels of fixed effects, and cross-classified as much as possible (different sires in different herds) in order to ensure good accuracies. Alternatively, if sires are not well spread, contemporary groups should be random although this is against common practice (see Larry Schaeffer's advice about this).




Saturday, January 2, 2021

Genetic variance of a population with metafounders


A model with metafounders is similar to an animal model but the scale of the genetic variances is shifted, in particular if there are several metafounders. The model with metafounders accounts for the fact that crossing two animals with different origins may result in a higher variance than either of the origins.

Much of the theory comes from Legarra et al. 2015 (Genetics). The genetic variance component associated with metafounders is called $latex \sigma_{\text{related}}^{2}$latex  in the paper, not the same as $latex \sigma_{\text{unrelated}}^{2}$latex  used in "regular" evaluations and to make it simple, in that paper we propose to obtain one as a function of the other:

$latex \sigma_{\text{related}}^{2} = \sigma_{\text{unrelated}}^{2}/k $latex 

Scale $latex k$latex  is lower than 1, resulting in

$latex \sigma_{\text{related}}^{2} > \sigma_{\text{unrelated}}^{2} $latex .

For the case of a single metafounder this is very simple and it is actually exact:

$latex k = 1 - \frac{\gamma}{2} $latex 

And it works very well in the little experiences that we made.

For multiple origins we proposed in the paper that, in practice, the population under study is a mixture in equal proportion of all origins (this is approximately true in single-breed sheep ) resulting in the expression

$latex k = 1 + \frac{\overline{\text{diag}\left( \mathbf{\Gamma} \right)}}{2} - \overline{\mathbf{\Gamma}} $latex 

Practicalities in blupf90

These two expressions are the default in blupf90test and blupf90iod2, and the way it works is that $latex k$latex  is computed internally and genetic variances modified internally. This can be avoided using OPTION lscale false and then the variance declared in the parameter file remains unchanged.

However, in some populations one metafounder (say number 1) is much more represented than the other one. Then it makes sense to use e.g.

$latex k = 1 - \frac{\Gamma_{\left( 1,1 \right)}}{2}$latex 

, but this possibility has not been programmed. The workaround for blupf90test and blup90iod2 is to scale genetic variances manually, e.g. if the genetic covariance matrix is 

$latex G_{0} = \begin{pmatrix} 10 & 2 \\ 2 & 4 \\ \end{pmatrix} $latex 

and the scale factor is, to give an example, $latex k = 0.85$latex  (e.g. see later) then we must do two things:

1.  put in the parameter file 

$latex \begin{pmatrix}    11.76 & 2.35 \\    2.35 & 4.71 \\    \end{pmatrix}$latex  

, which is equal to 

$latex G_{0}/k = \begin{pmatrix}    10 & 2 \\    2 & 4 \\    \end{pmatrix}/0.85 $latex 

2.  put OPTION lscale false

An example. Assume a breed where most all animals come from in-country (original) and a few are from another country. We have 

$latex \Gamma = \begin{pmatrix} 0.49 & 0.38 \\ 0.38 & 0.57 \\ \end{pmatrix} $latex 

 and using the default scaling that we programmed in blupf90 results in $latex k = 0.81$latex . However, if we consider that all individuals are in practice "original" we haven $latex k = 1 - \frac{0.49}{2} = 0.76 $latex . So I would use this number instead.

A possibility is a weighted mean of elements of $latex \Gamma $latex . But the problem is, first, that we need to weight the values in the diagonal and in the off-diagonal of $latex \mathbf{\Gamma}$latex  using and argument like in page 461 of the metafounders paper (right column, top), which assumes that all metafounders contribute equally (the algebra needs to be derived but it should be easy). Still this is not completely correct, because the number of founders related to each metafounder is not necessarily equal to their contribution to the overall population. It could happen that "external" are only 1% of the founders but they contribute to 10% of the overall populations.


How to obtain scaling factor in complex crosses

Now we put a sketch of a more complete theory. If we have a whole relationship matrix across animals (excluding metafounders) say D, then the variance of an animal sampled at random is on expectation  $latex s^{2} = \overline{\text{diag}\left( D \right)} - \overline{D}$latex (Legarra, Theor. Pop. Biol. 2016).

In the regular relationship matrix $latex \mathbf{A}$latex , this variance is

$latex s^{2} = \left( \overline{\text{diag}\left( A \right)} - \overline{A} \right)\sigma_{\text{unrelated}}^{2} = \left( 1 + \overline{F} - \overline{A} \right)\sigma_{\text{unrelated}}^{2} \approx \left( 1 - \overline{F} \right)\sigma_{\text{unrelated}}^{2}$latex 

The last approximation is because inbreeding is half parents' relationships and assumes slow increase of inbreeding.

In the relationship matrix $latex \mathbf{A}^{\left( \mathbf{\Gamma} \right)}$latex this is

$latex s^{2} = \left( \overline{\text{diag}\left( \mathbf{A}^{\left( \mathbf{\Gamma} \right)} \right)} - \overline{\mathbf{A}^{\left( \mathbf{\Gamma} \right)}} \right)\sigma_{\text{related}}^{2} = \left( 1 + \overline{F^{\left( \Gamma \right)}} - \overline{\mathbf{A}^{\left( \mathbf{\Gamma} \right)}} \right)\sigma_{\text{related}}^{2} \approx \left( 1 - \overline{F^{\left( \Gamma \right)}} \right)\sigma_{\text{related}}^{2}$latex 

The approximation here $latex \overline{F^{\left( \Gamma \right)}} \approx \overline{A^{\left( \Gamma \right)}}/2 $latex is less clear !!


So then if we want to equate both $latex s^{2}$latex  and we use both approximations. We force

$latex \left( 1 - \overline{F^{\left( \Gamma \right)}} \right)\sigma_{\text{related}}^{2} = \left( 1 - \overline{F} \right)\sigma_{\text{unrelated}}^{2}$latex 

which results in

$latex \frac{\sigma_{\text{unrelated}}^{2}}{\sigma_{\text{related}}^{2}} = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex and

therefore

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex .

We could equally use the less approximate expressions involving  $latex \overline{\mathbf{A}}$latex .

Potentially, this can be programmed in blupf90: the inbreeding $latex F^{\left( \Gamma \right)}$latex  is computed internally, and we could also compute $latex F $latex  by the same subroutine by calling it twice: one with $latex \mathbf{\Gamma} = \mathbf{0}$latex  and another with the true $latex \mathbf{\Gamma}$latex .

I show now that the equivalence that I propose here is approximately the same as $latex k = 1 - \frac{\gamma}{2}$latex  for a single metafounder. We have that 

$latex A^{\left( \Gamma \right)} = A\left( 1 - \frac{\gamma}{2} \right) + \gamma $latex

which results in 

$latex F^{\left( \Gamma \right)} = F + \frac{\gamma}{2}$latex  

and therefore

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)} = \frac{1 - \overline{F} - \frac{\gamma}{2}}{1 - \overline{F}} = 1 - \frac{\gamma}{2\left( 1 - \overline{F} \right)} $latex 

Which is almost the same as $latex k = 1 - \frac{\gamma}{2}$latex .

All these ideas are *approximations* because at the end, the "classical" A and the $latex A^{\Gamma} $latex are too different models.

I think that

$latex k = 1 + \frac{\overline{\text{diag}\left( \mathbf{\Gamma} \right)}}{2} - \overline{\mathbf{\Gamma}}$latex 

is good for UPGs in cattle/sheep for pure breed if they are well defined, and I believe that

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex 

should work well in cases of strong unbalance among UPGs. But this needs to be verified with real data.