Wednesday, December 15, 2021

A proof on linear mixed models

In Searle's (1979) notes and in many other places one can find that

$latex (y-X\hat{\beta})'V^{-1} (y-X\hat{\beta})  = y'R^{-1}(y - X\hat{\beta} - Z \hat{u}) $latex

The typical proof is based on projection matrices $latex S $latex but I wanted something simpler. A simpler proof has two parts.


The first part is to show that

$latex (y-X\hat{\beta})'V^{-1} (y-X\hat{\beta}) = y'V^{-1} (y-X\hat{\beta}) $latex

which is proven as follows:


$latex (y-X\hat{\beta})'V^{-1} (y-X\hat{\beta}) = y'V^{-1} (y-X\hat{\beta})  - \hat{\beta}'X' V^{-1} (y-X\hat{\beta})  = y'V^{-1} (y-X\hat{\beta})$latex

This is because 

$latex \hat{\beta}'X' V^{-1} (y-X\hat{\beta}) = \hat{\beta}'(X' V^{-1} y-X' V^{-1} X\hat{\beta}) = \hat{\beta}' 0 = 0 $latex

because, of course, $latex X' V^{-1} y=X' V^{-1} X\hat{\beta} $latex : these are the generalized least squares equations for $latex \hat{\beta} $latex . It is kind of obvious only after you see it (in Searle et al., Variance Components, p 278).

The second part shows that $latex y'V^{-1} (y-X\hat{\beta})  = y'R^{-1}(y - X\hat{\beta} - Z \hat{u}) $latex . This is done by replacing $latex \hat{u} $latex by its estimation as $latex  \hat{u}=GZ'(y - X\hat{\beta}) $latex, and using that $latex V = ZGZ'+R $latex as follows:

$latex y'R^{-1}(y - X\hat{\beta} - Z \hat{u})= y'R^{-1}(y - X\hat{\beta} - Z GZ'V^{-1}(y - X\hat{\beta})) =  y'R^{-1}(y - X\hat{\beta} - (V-R) V^{-1}(y - X\hat{\beta}))   = y'R^{-1}((y - X\hat{\beta}) - (y - X\hat{\beta}) + R V^{-1} (y - X\hat{\beta}))  = y' V^{-1} (y - X\hat{\beta}) $latex
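The whole identity is easy to check numerically. A minimal sketch in Python (numpy), with made-up X, Z, G and R, using the GLS solution for $latex \hat{\beta} $latex and $latex \hat{u}=GZ'V^{-1}(y - X\hat{\beta}) $latex :

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 20, 2, 5                # records, fixed effects, random levels (arbitrary)
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
G = 0.5 * np.eye(q)               # assumed variance of u
R = 1.0 * np.eye(n)               # assumed residual variance
V = Z @ G @ Z.T + R
y = rng.normal(size=n)

Vi = np.linalg.inv(V)
# GLS solution and BLUP of u
beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
e = y - X @ beta
u = G @ Z.T @ Vi @ e

lhs = e @ Vi @ e                               # (y-Xb)'V^-1(y-Xb)
rhs = y @ np.linalg.inv(R) @ (e - Z @ u)       # y'R^-1(y - Xb - Zu)
print(lhs, rhs)                                # the two quadratic forms agree
```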



Friday, November 19, 2021

Standard error of Maximum Likelihood estimates using curvature of grid search

We have done a grid search for a genetic correlation, from 0 to 1 in steps of 0.1. The other variance components were fixed and only the covariance was varied. This was done by calling remlf90 with the variance components and OPTION maxrounds 0 . In this manner -2logL is obtained; technically this gives a so-called profile likelihood. The resulting profile is as below:

[Figure: profile of -2logL against the genetic correlation, with a minimum around 0.7]

So the REML estimate is 0.7, but what can we say about the accuracy of this estimate? This, we solve today here :-)

REML estimation software (VCE, Wombat, AIREMLF90) usually yields s.e. of estimates of variance components and of functions of them, such as heritabilities. Good accounts of the theory can be found in

(statistical basis)

Casella, G., & Berger, R. L. (2021). Statistical inference. Chapter 10

Sorensen, D., & Gianola, D. (2002). Likelihood, Bayesian and MCMC methods in quantitative genetics. Springer. (3.26)

(application in REML software)

Meyer, K., & Smith, S. P. (1996). Restricted maximum likelihood estimation for animal models using derivatives of the likelihood. Genetics Selection Evolution, 28(1), 23-49.

Meyer, K., & Houle, D. (2013, October). Sampling based approximation of confidence intervals for functions of genetic covariance matrices. In Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 20, pp. 523-526).

However, our problem is quite simple. The profile likelihood looks a bit like a parabola. The information matrix is minus the expected 2nd derivative of the log-likelihood. To obtain this, we want a function $latex logL=f(r_g)$latex . So, we approximate this function by fitting a quadratic regression $latex logL = a + b r_g +c r_g^2 $latex to the points above. In R (theta = rg throughout):


cbind(theta,minus2logL)
#      theta minus2logL
# [1,]  0.10    7172238
# [2,]  0.20    7172210
# [3,]  0.30    7172186
# [4,]  0.40    7172167
# [5,]  0.50    7172152
# [6,]  0.60    7172141
# [7,]  0.70    7172137
# [8,]  0.80    7172140
# [9,]  0.90    7172158
#[10,]  0.99    7172223

logL = -minus2logL/2   # obtain logL from -2logL
res=lm(logL ~ theta + I(theta^2))
# Coefficients:
#   (Intercept)        theta   I(theta^2)  
# -1.249e+07    8.040e+02   -6.481e+02  

Then from here, I need the 2nd derivative, but this is very simple: $latex \frac{d^2 logL}{d\theta^2}=2c $latex


 Again, in R 

# the 2nd derivative of this quadratic is 2c
# for the information matrix (actually a single coefficient)
# I need to pre-multiply by -1 (by definition)
info=-2*res$coefficients[3]

So, info is the information matrix. Now, the asymptotic variance is the inverse of the information matrix (for instance, Sorensen and Gianola, 3.66) so that (informally, as they say):

$latex \hat{\theta}_n \sim N(\theta_0,I^{-1}(\theta_0)) $latex 

where I is the information matrix ("info" above). So we invert and then just take the square root to get the s.d.:

# now we invert this matrix
infoi=solve(info)
# this is the asymptotic variance of the parameter. 
# so the s.d. is the sq root:
sdasymp=sqrt(infoi)
cat("s.d. of the parameter= ",sdasymp,"\n")
#s.d. of the parameter=  0.04898895 
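The same recipe can be redone in Python with numpy.polyfit, as a quick cross-check (a sketch; logL is taken as -minus2logL/2, the exact value printed may differ slightly from the R run):

```python
import numpy as np

theta = np.array([0.10, 0.20, 0.30, 0.40, 0.50,
                  0.60, 0.70, 0.80, 0.90, 0.99])
minus2logL = np.array([7172238, 7172210, 7172186, 7172167, 7172152,
                       7172141, 7172137, 7172140, 7172158, 7172223.0])
logL = -minus2logL / 2

# quadratic fit; polyfit returns coefficients highest degree first (c, b, a)
c, b, a = np.polyfit(theta, logL, 2)
info = -2 * c                  # minus the 2nd derivative of logL
sdasymp = np.sqrt(1 / info)    # asymptotic s.d. of the estimate
print(sdasymp)                 # close to the 0.049 obtained in R
```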



Thursday, September 9, 2021

Bayesian interpretation of reliabilities

 


Van Vleck (1979, and probably the Green Book) shows that (in our notation) $latex Var(u|\hat{u} )=(1-rel)Var(u) $latex . However, $latex Var(u|\hat{u})$latex is the same as $latex Var(u|y)$latex , since $latex \hat{u} $latex is a (linear) function of $latex y $latex . So we can put:

$latex 1-rel=\frac{Var(u|y)}{Var(u)} $latex

And thus 

$latex rel=1- \frac{Var(u|y)}{Var(u)} = 1- \frac{PEV}{Var(u)} $latex

which is the usual equation. So, reliability is one minus the ratio of posterior to prior variance. It is an interesting Bayesian interpretation of which I was not aware.
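A tiny numeric illustration (a sketch with assumed variances): for a single record and the model y = u + e, the posterior variance Var(u|y) shrinks the prior, and 1 - PEV/Var(u) recovers exactly h2, as expected:

```python
# toy one-record model y = u + e, with assumed variances var_u = var_e
var_u, var_e = 2.0, 2.0        # so h2 = 0.5
var_y = var_u + var_e

# normal theory: Var(u|y) = var_u - cov(u,y)^2 / var_y, with cov(u,y) = var_u
PEV = var_u - var_u * var_u / var_y
rel = 1 - PEV / var_u
print(rel)                     # equals h2 = var_u / var_y = 0.5
```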

Thursday, May 6, 2021

Fst as heterozygosities of two populations and their crosses

Among the several definitions and interpretations of $latex F_{ST}$latex , I like the paper of Bhatia et al. 2013 because it summarizes them well. One that I like is Hudson's definition (and estimation) of $latex F_{ST}$latex , which for two populations turns out to be the same as Weir-Hill and Weir-Cockerham. I found it useful to "translate" it into animal breeding jargon and the variances of two purebreds and their F1 and F2 crosses. This has probably been done elsewhere; in fact this is largely based on Bhatia et al.'s explanations and Appendix. I will use $latex q=1-p$latex often. I assume large populations of the same size.

Hudson's definition

Consider populations 1 and 2. Let 

$latex F_{ST}=1-\frac{H_w}{H_b}=\frac{H_b - H_w}{H_b}$latex 

 where $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex  is heterozygosity "within" and $latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex  is heterozygosity "between". What does this mean?

  •  $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex  the heterozygosity "within" can be thought as the average heterozygosity across the two populations: $latex H_w= \frac{H_1 + H_2}{2} = \frac{1}{2}\left(2 p_1 (1-p_1)+2 p_2 (1-p_2)\right)=p_1 (1-p_1)+p_2 (1-p_2)$latex 

  •  $latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex  is the heterozygosity of an F1 population with one gamete coming from population 1 and the other from population 2.

The difference $latex H_b - H_w$latex  is the numerator of the $latex F_{ST}$latex  and is:

$latex H_b - H_w= p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2 = (p_1 - p_2)^2 $latex

where $latex (p_1 - p_2)^2=N$latex  is the numerator of the $latex F_{ST}$latex  and it is Nei's minimal genetic distance.

In an F2 population there is HW equilibrium and the allele frequency is $latex p_{F2}=\frac{p_1 + p_2}{2}$latex . Thus the heterozygosity is  $latex H_{F2}=2 \frac{p_1 + p_2}{2}\frac{q_1 + q_2}{2}=\frac{1}{2}(p_1 + p_2)(q_1 + q_2)$latex .

The increased variance in an F2 population from the average variance across the two populations is  

$latex H_{F2} - H_w=\frac{1}{2}(p_1 + p_2)(q_1 + q_2) - (p_1 q_1 + p_2 q_2) = \frac{1}{2}(p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2)=\frac{1}{2}(p_1 - p_2)^2 $latex 

which is half the difference $latex H_b - H_w$latex , thus $latex H_{F2} - H_w=\frac{1}{2}(H_b - H_w) $latex . 

The segregation variance is the difference between heterozygosities in the F1 and in the F2. From before, $latex H_{F2} =\frac{H_b}{2}+\frac{H_w}{2}$latex and the difference is

$latex H_{F2} - H_b =  \frac{H_b}{2}+\frac{H_w}{2} - H_b = \frac{H_w}{2} - \frac{H_b}{2}=\frac{1}{2}(H_w - H_b)= -\frac{1}{2}(p_1 - p_2)^2 = -\frac{N}{2} $latex

Thus, when we move from two populations to an F1 we gain, say, $latex \Delta H$latex in genetic variance, and when we reproduce the F1 to create an F2 we lose (from the F1) $latex \frac{\Delta H}{2}$latex but still keep a gain (over the purebreds) of $latex \frac{\Delta H}{2}$latex . The (numerator of the) $latex F_{ST}$latex is a measure of this. More exactly, the $latex F_{ST}$latex explains how much of the variance of the (hypothetical) F1 population is due to mixing populations and not to the variance within populations (this is of course Wright's original interpretation).
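The heterozygosity identities above can be checked numerically with arbitrary allele frequencies (a sketch; the frequencies are made up):

```python
# numeric check of the heterozygosity identities, with arbitrary frequencies
p1, p2 = 0.3, 0.8
q1, q2 = 1 - p1, 1 - p2

Hw = p1 * q1 + p2 * q2              # "within" heterozygosity
Hb = p1 * q2 + p2 * q1              # "between" = F1 heterozygosity
HF2 = 0.5 * (p1 + p2) * (q1 + q2)   # F2 under HW equilibrium

N = (p1 - p2) ** 2                  # Nei's minimal genetic distance
print(Hb - Hw, N)                   # equal
print(HF2, (Hb + Hw) / 2)           # F2 is the average of F1 and purebreds
print(HF2 - Hb, -N / 2)             # the F2 loses N/2 from the F1
```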


Nei's definition

We will call it  $latex F_{ST}^{Nei}$latex . It is defined as

$latex F_{ST}^{Nei}=\frac{(p_1 - p_2)^2}{2\bar{p}(1-\bar{p})}$latex  

where, because $latex \bar{p}=\frac{p_1 + p_2}{2}=p_{F2}$latex , the denominator is exactly the heterozygosity of our F2 population. Thus $latex F_{ST}^{Nei}$latex and Hudson's $latex F_{ST}$latex are not the same thing, because the denominators refer to a different "common" population: an F1 population for Hudson's and Weir-Cockerham, and an F2 for Nei. Bhatia et al. show that in expectation and in the limit

$latex F_{ST}^{Nei} \rightarrow \frac{F_{ST}^1+F_{ST}^2}{2-\frac{F_{ST}^1+F_{ST}^2}{2}}$latex  but in fact, Hudson's $latex F_{ST}=\frac{F_{ST}^1+F_{ST}^2}{2}$latex  which after some manipulations gives

$latex F_{ST}^{Nei}=\frac{F_{ST}}{1-\frac{F_{ST}}{2}}$latex 

so, Nei's $latex F_{ST}^{Nei}$latex (very slightly for small values) overestimates Hudson's (or Weir-Cockerham, Weir-Hill) $latex F_{ST}$latex , but this is normal: strictly speaking they're not the same thing.
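A quick numeric check of this relation, directly from the definitions above (frequencies made up):

```python
# check that, with the definitions above, Nei's and Hudson's FST are linked exactly
p1, p2 = 0.3, 0.8
Hw = p1 * (1 - p1) + p2 * (1 - p2)
Hb = p1 * (1 - p2) + p2 * (1 - p1)
pbar = (p1 + p2) / 2

F_hudson = 1 - Hw / Hb
F_nei = (p1 - p2) ** 2 / (2 * pbar * (1 - pbar))
print(F_nei, F_hudson / (1 - F_hudson / 2))  # identical
print(F_nei >= F_hudson)                     # Nei's value is the larger of the two
```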

References


Bhatia, G., Patterson, N., Sankararaman, S., & Price, A. L. (2013). Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9), 1514-1521.
Hudson, R. R., Slatkin, M., & Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2), 583-589.
Nei, M. (1987). Molecular evolutionary genetics. Columbia University Press.
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358-1370.
Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Annual Review of Genetics, 36(1), 721-750.

Thursday, March 4, 2021

Scaling traits and covariances.



 

Scaling traits for numerical analyses and de-scaling results such as variance component estimates can be a headache. Here is how to proceed in a systematic manner.

Assume that we have a trait on a scale with a very large variance (milk yield in liters) and other traits on a very small scale (points). We want to scale the traits so that the algorithms are numerically stable. So for a single record we have, say, 7 traits.

We transform each vector of records, y, by pre-multiplying by a matrix with scale factors, S. For instance S can contain 1/10 for milk yield (assume that this is the first trait) and 1 otherwise. Or it can contain the inverse of the phenotypic standard deviation of each trait. The scaled records y^s that we feed to, say, airemlf90 are

$latex \bf{y}^s=\bf{Sy} $latex

Then the variances are scaled such that $latex Var({\bf y}^s )={\bf S}Var({\bf y}){\bf S}' $latex. If the original variances of y are, for instance for the genetic component, $latex {\bf G}_0 $latex, they are scaled as $latex {\bf G}^s_0 = {\bf S G}_0 {\bf S}' $latex. For instance


G0 = [ 10 2 3
       2 4 0
       3 0 5]
S=[1/10 0 0
   0 1 0
   0 0 1]

Then 

G0s=S*G0*S'
3×3 Array{Float64,2}:
 0.1  0.2  0.3
 0.2  4.0  0.0
 0.3  0.0  5.0

The 1st row and column are multiplied by 1/10, so the [1,1] element is multiplied by 1/100.

To transform back from, say, REML estimates of $latex \bf G_0^s $latex we multiply by the inverse of S:

$latex \bf \hat{G}_0=S^{-1} \hat{G}_0^s (S' )^{-1} $latex
For instance

inv(S)*G0s*inv(S')
3×3 Array{Float64,2}:
 10.0  2.0  3.0
  2.0  4.0  0.0
  3.0  0.0  5.0

In the particular case of genetic correlations and heritabilities, they are invariant to the transformation. This is because numerator and denominator are multiplied by the same factors.
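The whole round trip, and the invariance of the correlations, can be verified in a few lines (a sketch in Python/numpy with the same G0 and S as above):

```python
import numpy as np

G0 = np.array([[10., 2., 3.],
               [ 2., 4., 0.],
               [ 3., 0., 5.]])
S = np.diag([0.1, 1.0, 1.0])

G0s = S @ G0 @ S.T                                    # scaled covariance matrix
G0back = np.linalg.inv(S) @ G0s @ np.linalg.inv(S.T)  # back-transform

def corr(M):
    """Correlation matrix from a covariance matrix."""
    d = np.sqrt(np.diag(M))
    return M / np.outer(d, d)

print(np.allclose(G0back, G0))           # True: back-scaling recovers G0
print(np.allclose(corr(G0s), corr(G0)))  # True: correlations are invariant
```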






Sunday, January 3, 2021

Paradoxes of BLUP

In genetic evaluation, it is well understood that an individual with progeny records has a more accurate prediction than an animal with few (or no) progeny records. Trying to illustrate the case with a small example, we found an apparent paradox due to confounding with fixed effects.

The pedigree is simple: two full sibs (7 and 8), where 7 has seven offspring. So we expect animal 7 to be more accurate than animal 8. All animals have a single record.

The model is also simple, $latex y =  u +\mu +e $latex with $latex h^2=0.5 $latex. We can compute accuracies directly in blupf90 using 

OPTION store_accuracy 1 

(store_accuracy 1 because it is the first effect in the model). Reliabilities are stored in file acc_bf90 . The result is surprising:

1 0.42468738
2 0.42468738
3 0.42468738
4 0.42468738
5 0.39977264
6 0.39977264
7 0.28647215
8 0.34482759
9 0.36994532
...


so animal 7 is the least reliable in spite of having the most progeny. Why is that? The reason is that animal 7 is highly confounded with the mean fitted in the model, through its descendants. If we remove the mean from the model and fit $latex y =  u +e $latex we obtain:

1 0.54197995
2 0.54197995
3 0.54197995
4 0.54197995
5 0.62781955
6 0.62781955
7 0.71052632
8 0.59649123
9 0.54779806

Here we obtain what we expected: animal 7 is the most reliable. However the model $latex y =  u +e $latex is wrong: there *should* be an overall mean in the model.

The moral is that progeny of individuals should be spread across levels of fixed effects, and cross-classified as much as possible (different sires in different herds), in order to ensure good accuracies. Alternatively, if sires are not well spread, contemporary groups should be fitted as random, although this is against common practice (see Larry Schaeffer's advice about this).
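The mechanics of the reliability computation, and the effect of dropping the mean, can be sketched with a toy case (not the pedigree above; three unrelated animals with one record each, h2 = 0.5, reliabilities from the inverse of the MME coefficient matrix):

```python
import numpy as np

# toy case: 3 unrelated animals (A = I), one record each,
# model y = mu + u + e with var_u = var_e = 1, so lambda = 1
n = 3
lam = 1.0
X = np.ones((n, 1))    # design for the mean
Z = np.eye(n)          # design for animals
Ainv = np.eye(n)

# coefficient matrix of the mixed model equations
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * Ainv]])
Cinv = np.linalg.inv(C)
# PEV_i = Cinv[u_i, u_i] * var_e; with var_u = var_e = 1, rel = 1 - PEV/var_u
rel_with_mean = 1 - np.diag(Cinv)[1:]
print(rel_with_mean)   # 1/3 for each animal

# dropping the mean (the "wrong" model) inflates the apparent reliability
Cu = Z.T @ Z + lam * Ainv
rel_no_mean = 1 - np.diag(np.linalg.inv(Cu))
print(rel_no_mean)     # 1/2 for each animal
```

Even in this balanced toy case, fitting the mean "steals" information from the animal effects, which is the same phenomenon as in the pedigree example, only milder.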




Saturday, January 2, 2021

Genetic variance of a population with metafounders


A model with metafounders is similar to an animal model, but the scale of the genetic variances is shifted, in particular if there are several metafounders. The model with metafounders accounts for the fact that crossing two animals with different origins may result in a higher variance than within either of the origins.

Much of the theory comes from Legarra et al. 2015 (Genetics). The genetic variance component associated with metafounders is called $latex \sigma_{\text{related}}^{2}$latex in the paper; it is not the same as the $latex \sigma_{\text{unrelated}}^{2}$latex used in "regular" evaluations, and to make it simple, in that paper we propose to obtain one as a function of the other:

$latex \sigma_{\text{related}}^{2} = \sigma_{\text{unrelated}}^{2}/k $latex 

The scale factor $latex k$latex is lower than 1, resulting in

$latex \sigma_{\text{related}}^{2} > \sigma_{\text{unrelated}}^{2} $latex .

For the case of a single metafounder this is very simple and it is actually exact:

$latex k = 1 - \frac{\gamma}{2} $latex 

And it works very well in the small tests that we have made.

For multiple origins we proposed in the paper that, in practice, the population under study is a mixture in equal proportions of all origins (this is approximately true in single-breed sheep), resulting in the expression

$latex k = 1 + \frac{\overline{\text{diag}\left( \mathbf{\Gamma} \right)}}{2} - \overline{\mathbf{\Gamma}} $latex 

Practicalities in blupf90

These two expressions are the default in blupf90test and blupf90iod2, and the way it works is that $latex k$latex is computed internally and the genetic variances are modified internally. This can be avoided using OPTION lscale false, and then the variance declared in the parameter file remains unchanged.

However, in some populations one metafounder (say number 1) is much more represented than the other one. Then it makes sense to use e.g.

$latex k = 1 - \frac{\Gamma_{\left( 1,1 \right)}}{2}$latex 

but this possibility has not been programmed. The workaround for blupf90test and blupf90iod2 is to scale the genetic variances manually; e.g. if the genetic covariance matrix is

$latex G_{0} = \begin{pmatrix} 10 & 2 \\ 2 & 4 \\ \end{pmatrix} $latex 

and the scale factor is, to give an example, $latex k = 0.85$latex  (e.g. see later) then we must do two things:

1.  put in the parameter file 

$latex \begin{pmatrix}    11.76 & 2.35 \\    2.35 & 4.71 \\    \end{pmatrix}$latex  

which is equal to

$latex G_{0}/k = \begin{pmatrix}    10 & 2 \\    2 & 4 \\    \end{pmatrix}/0.85 $latex 

2.  put OPTION lscale false

An example. Assume a breed where almost all animals come from in-country (original) and a few are from another country. We have

$latex \Gamma = \begin{pmatrix} 0.49 & 0.38 \\ 0.38 & 0.57 \\ \end{pmatrix} $latex 

 and using the default scaling that we programmed in blupf90 results in $latex k = 0.81$latex . However, if we consider that all individuals are in practice "original" we have $latex k = 1 - \frac{0.49}{2} = 0.76 $latex . So I would use this number instead.
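Both values of $latex k$latex can be reproduced directly from $latex \Gamma $latex (a sketch with the matrix of the example):

```python
import numpy as np

Gamma = np.array([[0.49, 0.38],
                  [0.38, 0.57]])

# default in blupf90: all origins assumed to contribute equally
k_default = 1 + np.mean(np.diag(Gamma)) / 2 - np.mean(Gamma)
# alternative when (almost) all animals trace to metafounder 1
k_original = 1 - Gamma[0, 0] / 2
print(k_default, k_original)   # 0.81 and 0.755
```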

A possibility is a weighted mean of the elements of $latex \Gamma $latex . But the problem is, first, that we need to weight the values in the diagonal and in the off-diagonal of $latex \mathbf{\Gamma}$latex using an argument like in page 461 of the metafounders paper (right column, top), which assumes that all metafounders contribute equally (the algebra needs to be derived but it should be easy). Still this is not completely correct, because the number of founders related to each metafounder is not necessarily equal to their contribution to the overall population. It could happen that "external" animals are only 1% of the founders but contribute 10% of the overall population.


How to obtain scaling factor in complex crosses

Now we put a sketch of a more complete theory. If we have a whole relationship matrix across animals (excluding metafounders), say $latex \mathbf{D}$latex , then the variance of an animal sampled at random is, in expectation, $latex s^{2} = \overline{\text{diag}\left( D \right)} - \overline{D}$latex (Legarra, Theor. Pop. Biol. 2016).

In the regular relationship matrix $latex \mathbf{A}$latex , this variance is

$latex s^{2} = \left( \overline{\text{diag}\left( A \right)} - \overline{A} \right)\sigma_{\text{unrelated}}^{2} = \left( 1 + \overline{F} - \overline{A} \right)\sigma_{\text{unrelated}}^{2} \approx \left( 1 - \overline{F} \right)\sigma_{\text{unrelated}}^{2}$latex 

The last approximation holds because inbreeding is half the parents' relationship, so with a slow increase of inbreeding $latex \overline{A} \approx 2\overline{F}$latex .

In the relationship matrix $latex \mathbf{A}^{\left( \mathbf{\Gamma} \right)}$latex this is

$latex s^{2} = \left( \overline{\text{diag}\left( \mathbf{A}^{\left( \mathbf{\Gamma} \right)} \right)} - \overline{\mathbf{A}^{\left( \mathbf{\Gamma} \right)}} \right)\sigma_{\text{related}}^{2} = \left( 1 + \overline{F^{\left( \Gamma \right)}} - \overline{\mathbf{A}^{\left( \mathbf{\Gamma} \right)}} \right)\sigma_{\text{related}}^{2} \approx \left( 1 - \overline{F^{\left( \Gamma \right)}} \right)\sigma_{\text{related}}^{2}$latex 

The approximation here, $latex \overline{F^{\left( \Gamma \right)}} \approx \overline{\mathbf{A}^{\left( \Gamma \right)}}/2 $latex , is less clearly justified!


So, if we want to equate both $latex s^{2}$latex , using both approximations, we force

$latex \left( 1 - \overline{F^{\left( \Gamma \right)}} \right)\sigma_{\text{related}}^{2} = \left( 1 - \overline{F} \right)\sigma_{\text{unrelated}}^{2}$latex 

which results in

$latex \frac{\sigma_{\text{unrelated}}^{2}}{\sigma_{\text{related}}^{2}} = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex and

therefore

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex .

We could equally use the less approximate expressions involving  $latex \overline{\mathbf{A}}$latex .

Potentially, this can be programmed in blupf90: the inbreeding $latex F^{\left( \Gamma \right)}$latex  is computed internally, and we could also compute $latex F $latex  by the same subroutine by calling it twice: one with $latex \mathbf{\Gamma} = \mathbf{0}$latex  and another with the true $latex \mathbf{\Gamma}$latex .

I show now that the equivalence that I propose here is approximately the same as $latex k = 1 - \frac{\gamma}{2}$latex  for a single metafounder. We have that 

$latex A^{\left( \Gamma \right)} = A\left( 1 - \frac{\gamma}{2} \right) + \gamma $latex

which results, approximately (the exact increase in inbreeding is $latex \frac{\gamma}{2}\left( 1 - F \right)$latex ), in

$latex F^{\left( \Gamma \right)} = F + \frac{\gamma}{2}$latex

and therefore

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)} = \frac{1 - \overline{F} - \frac{\gamma}{2}}{1 - \overline{F}} = 1 - \frac{\gamma}{2\left( 1 - \overline{F} \right)} $latex 

Which is almost the same as $latex k = 1 - \frac{\gamma}{2}$latex .
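A quick numeric check, with a hypothetical 4-animal pedigree and $latex \gamma = 0.3$latex ; note that if one uses the exact diagonal of $latex A^{\left( \Gamma \right)} $latex (instead of the approximation $latex F^{\left( \Gamma \right)} = F + \frac{\gamma}{2}$latex ) the two expressions for $latex k$latex coincide:

```python
import numpy as np

# toy numerator relationship matrix: founders 1,2; 3 = (1,2); 4 = (1,3)
A = np.array([[1.00, 0.00, 0.50, 0.75],
              [0.00, 1.00, 0.50, 0.25],
              [0.50, 0.50, 1.00, 0.75],
              [0.75, 0.25, 0.75, 1.25]])
gamma = 0.3

AG = A * (1 - gamma / 2) + gamma   # single-metafounder relationship matrix
F = np.diag(A) - 1                 # inbreeding coefficients
FG = np.diag(AG) - 1

k = (1 - FG.mean()) / (1 - F.mean())
print(k, 1 - gamma / 2)            # both 0.85
```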

All these ideas are *approximations* because, at the end, the "classical" $latex A $latex and the $latex A^{\Gamma} $latex are two different models.

I think that

$latex k = 1 + \frac{\overline{\text{diag}\left( \mathbf{\Gamma} \right)}}{2} - \overline{\mathbf{\Gamma}}$latex 

is good for UPGs in cattle/sheep for pure breeds if they are well defined, and I believe that

$latex k = \frac{\left( 1 - \overline{F^{\left( \Gamma \right)}} \right)}{\left( 1 - \overline{F} \right)}$latex 

should work well in cases of strong imbalance among UPGs. But this needs to be verified with real data.