Saturday, May 28, 2016

R in Jupyter

First I had to install Macports for El Capitan:

https://www.macports.org/install.php


Then I followed the instrucions in: http://irkernel.github.io/requirements/

Using the "source" method.

New web page using Markdown

I have a web page hosted by Genotoul. I changed yesterday the ugly old one to a new one, much lighter. The main reason I changed is that I discovered how easy is to create html files using MarkDown with an appropriate text editor:

# This is a title

This is a text

* item1
* item2

Translates into

This is a title

This is a text
  • item1
  • item2


Sunday, May 8, 2016

Burguete horses and Pirenaica cattle

In Navarre: Burguete mare and foal, and Pirenaica cows.


Chisqueta sheep in Sobrarbe

Still a sheep breed that I heard about but never saw, the Chisqueta or Xisqueta sheep. Here are some references in aragonese and catalan, and here is one in Spanish.


Wednesday, May 4, 2016

Variance components with high numbers

I have this large data set in which I want to estimate heritability. These are the features for the trait:

nmiss   avg         min       max         var      
------------------------------------------------------------------------------

1703515 0 1772.570604    100.470000 7783.789000 524507.442442

So, the variance for the trait is very high. Blupf90 runs nicely. However, both airemlf90 and gibbs2f90 variance component programs in the suite gave me some trouble if I used. The reason is that the programs contain some "tolerance" parameter in the inverses of the updated variance components, and this seems to set some of the numbers to 0.

For instance, if divide the value of the performance by 10, I get this:


 * Start Gibbs iteration04-29-2016  16h 26m 59s 482
 round            1
 G
   768.75    
 G
   157.18    
 R
   0.0000    

 round            2

The reason for this is the scale and the large number of data. The sum of squares of the residual is
 ee=  0.16456E+10

when this is sent to the subroutine, its inverse is lower than 10^-9 and set to 0.

I finally standardized MY to a variance of 1. I put initial variances to $latex \sigma^2_g=\sigma^2_p=\sigma^2_e=1 $latex. After 1000 iterations of Gibbs, I get:

* End of iteration04-29-2016  12h 09m 52s 514
 ave G
  0.11236    
 SD G
  0.31784E-01
 ave G
  0.14050    
 SD G
  0.13210E-01
 ave R
  0.25338    
 SD R

  0.58904E-02

In this case I obtain correct estimates ($latex h^2 =0.22 $latex, repeatability of 0.5).

Standard errors of complex functions of variance components from REML Information matrix

In airemlf90 and other programs, a byproduct of the AIREML is the inverse of the Information Matrix that, asymptotically, describes the sampling covariance of the variance components. It can also be considered as an approximation to the posterior distribution of the parameters.

Getting standard errors of functions of the variance components is rather painful algebraically as one requires the delta method and Taylor series expansions. A much simpler method , implemented in as OPTION se_covar_function in airemlf90, comes through MonteCarlo sampling and was described by Meyer and Houle. Here I will show how this work.
This a model with three variance components, genetic, permanent and residual variance. Estimates from airemlf90 are in vector ahat:

#genetic, permanent, residual
ahat=c(
  0.11478,    
  0.13552,    
  0.25290,
  )

with AI matrix:

# inverse of AI matrix (Sampling Variance)
AI=matrix(c(
  0.16799E-05, -0.96486E-06, -0.82566E-08,
 -0.96486E-06,  0.96167E-06, -0.37113E-07,
 -0.82566E-08, -0.37113E-07,  0.10864E-06)

 ,ncol=3)

asymptotically, the "true" value has a sampling (or posterior) distribution a~N(ahat,AI), and getting samples from this distribution is easy:

require(MASS)
b=mvrnorm(10000,ahat,AI)


> head(b)
          [,1]      [,2]      [,3]
[1,] 0.1146738 0.1357640 0.2529399
[2,] 0.1163889 0.1342926 0.2528479
[3,] 0.1166155 0.1344342 0.2525161
[4,] 0.1142085 0.1358928 0.2534974
[5,] 0.1136835 0.1361108 0.2530133
[6,] 0.1140485 0.1365707 0.2530573

and from here, we can obtain samples of the heritability and its standard deviation:

h2=b[,1]/(b[,1]+b[,2]+b[,3])
sd(h2)

> 0.002318198

This is very useful, i.e. for complex model such as consideration of genomic and dominance variation.