Friday, February 19, 2016

test variance components from reml(f90)

A way of testing heritability is using likelihood ratio test. Fit two models in REML, one with genetic effect genetic variance (H1) and another one without genetic effect (H0). Then the difference between the two , which is a positive number, is a LRT statistic which follows a mixture of chi-2 distributions as detailed by Visscher "A Note on the Asymptotic Distribution of Likelihood Ratio Tests to Test Variance Components". The theory of the LRT can be found in standard books and a nice description is in Sorensen & Gianola book.

Output of (ai)reml(f90) under H0 is x=-2logL ; output under H1 is y=-2logL. These are positive numbers, so the smaller the better;  y<x as H1 is a more complex (so more likely) model
The LRT test is x-y

then, the pvalue is (in R) :
pchisq(x-y,1,lower.tail=FALSE)/2  

An application for association analysis for a multi allelic gene is http://dx.doi.org/10.3168/jds.2013-6570.


Example, 

test genetic effect in an old data set from dairy sheep (80,000 records, 50,000 animals in pedigree) with a model including permanent, genetic and residual variances.

 Under the complete model H1 we have:

-2logL =        825227.48

Under the reduced model H0 with no genetic effect we have:

 -2logL =    828095.614456413 

then the test in R is:
> pchisq(828095.614456413-825227.48,1,lower.tail=FALSE)/2

[1] 0
the p-value is so small that we cannot see the difference from 0.  We can take the log, then transform to a scale of -log10(pvalue), a scale typically used in GWAS:

> pval=pchisq(828095.614456413-825227.48,1,lower.tail=FALSE,log.p=TRUE)/2
> pval
[1] -719.137
> -log10(exp(pval))
[1] 312.3172

So, evidence against H0 is very strong and we should accept that there is genetic variance in this data set. 

Then, I test a dummy effect generated at random with 10000 levels.  AIREMLf90 gives:


 -2logL =    825253.459472054
> pval=pchisq(825253.459472054-825227.48,1,lower.tail=FALSE)/2
> pval
[1] 1.725335e-07

still, this is highly significant even if the effect has no real effect. 

If I only use 10% of the data (7800 records), I get
  • H0: -2logL =    72668.5945818828
  • H1: -2logL =    72668.5953989582
So, the p-value is 0.49:

pval=pchisq(72668.5953989582-72668.5945818828,1,lower.tail=FALSE)/2

and we do not reject the null hypothesis, i.e. we keep the simpler model with no dummy effect. It is known that the LRT favours asymptotically the more complex models, and this may be the case.

No comments:

Post a Comment