Thursday, February 25, 2016

Downloads in Thunderbird in Mac

If you (instead of saving them) open by double click attached files in Thunderbird, they are stored by default stored on the Desktop. This is an annoying bug . I read this workaround and I am doing something even simpler.

First, configure Thunderbird to choose as download folder /private/tmp :





This is actually tricky as /private/tmp is a folder you don't usually see. When you "choose" the folder, you hit "Command+shift+G" and in the resulting field you type /private/tmp:


Finally, configure it to "ask always" where to download (see 1st picture)

In this way, opened files by double click go to /private/tmp which is cleaned up periodically.

Saturday, February 20, 2016

GCMTBLUP

Here is an example of parameter files for GCMTBLUP, from the simulated example. Data looks like:
 (mean, animal, hys, y, gene content; missing value is -9999):

           1           1       14825       -9999           1
           1           2        5146       -9999           2
           1           3        5386       -9999           1
           1           4       47519       -9999           1
           1           5       46051       -9999           1
           1           6       43484       -9999           2
           1           7       12719       -9999           1
           1           8       27660       -9999           0
           1           9        5681       -9999           0
           1          10       21685       -9999           1
...
           1     9999993      480244   99.7522132400075            -9999
           1     9999994      461248   98.7729983384373            -9999
           1     9999995      460282   100.339403361134            -9999
           1     9999996      464308   101.923396214390            -9999
           1     9999997      498089   99.8367733957329            -9999
           1     9999998      494791   99.9258964576069            -9999


In blupf90:

# this models y= hys + u +e 
# snp = mu + u +e
# correlated as in GCMTBLUP
# # no residual variance !! hence var(e) is fixed to a small value
DATAFILE
data
NUMBER_OF_TRAITS
           2
NUMBER_OF_EFFECTS
           3
OBSERVATION(S)
    4 5
WEIGHT(S)

EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT[EFFECT NESTED]
  0 1 1      cross
  3 0 500000 cross
  2 2 10000000 cross
RANDOM_RESIDUAL VALUES
.95 0
0  .01
 RANDOM_GROUP
     3
 RANDOM_TYPE
 add_animal
 FILE
ped 
(CO)VARIANCES
.05 .11
.11  .5
OPTION missing -9999


 In PEST, the parameter file is:

relationship
    rel_for animal
    disk
    infile = 'ped'
       input
        animal 
        m_p    
        f_p   
TRANSFORMATION
       treated_as_missing
           gc      none -9999 none
           ls      none -9999 none
data
    infile = 'data2'
    disk
     input
       mean 1
       animal   10000000
       hys 500000
       ls 0
       gc 0

model
       ls = hys animal
       gc = mean animal

VE
      0.95 0
      0  0.01
VG  
   vg_for animal
      0.05 0.11
      0.11 0.50

solver
    iod [stop = 0.001, relax = .7, max_iter = 2000
    STAND_AVG_CHANGE 
printout
   outfile = 'vlistp20'

Friday, February 19, 2016

test variance components from reml(f90)

A way of testing heritability is using likelihood ratio test. Fit two models in REML, one with genetic effect genetic variance (H1) and another one without genetic effect (H0). Then the difference between the two , which is a positive number, is a LRT statistic which follows a mixture of chi-2 distributions as detailed by Visscher "A Note on the Asymptotic Distribution of Likelihood Ratio Tests to Test Variance Components". The theory of the LRT can be found in standard books and a nice description is in Sorensen & Gianola book.

Output of (ai)reml(f90) under H0 is x=-2logL ; output under H1 is y=-2logL. These are positive numbers, so the smaller the better;  y<x as H1 is a more complex (so more likely) model
The LRT test is x-y

then, the pvalue is (in R) :
pchisq(x-y,1,lower.tail=FALSE)/2  

An application for association analysis for a multi allelic gene is http://dx.doi.org/10.3168/jds.2013-6570.


Example, 

test genetic effect in an old data set from dairy sheep (80,000 records, 50,000 animals in pedigree) with a model including permanent, genetic and residual variances.

 Under the complete model H1 we have:

-2logL =        825227.48

Under the reduced model H0 with no genetic effect we have:

 -2logL =    828095.614456413 

then the test in R is:
> pchisq(828095.614456413-825227.48,1,lower.tail=FALSE)/2

[1] 0
the p-value is so small that we cannot see the difference from 0.  We can take the log, then transform to a scale of -log10(pvalue), a scale typically used in GWAS:

> pval=pchisq(828095.614456413-825227.48,1,lower.tail=FALSE,log.p=TRUE)/2
> pval
[1] -719.137
> -log10(exp(pval))
[1] 312.3172

So, evidence against H0 is very strong and we should accept that there is genetic variance in this data set. 

Then, I test a dummy effect generated at random with 10000 levels.  AIREMLf90 gives:


 -2logL =    825253.459472054
> pval=pchisq(825253.459472054-825227.48,1,lower.tail=FALSE)/2
> pval
[1] 1.725335e-07

still, this is highly significant even if the effect has no real effect. 

If I only use 10% of the data (7800 records), I get
  • H0: -2logL =    72668.5945818828
  • H1: -2logL =    72668.5953989582
So, the p-value is 0.49:

pval=pchisq(72668.5953989582-72668.5945818828,1,lower.tail=FALSE)/2

and we do not reject the null hypothesis, i.e. we keep the simpler model with no dummy effect. It is known that the LRT favours asymptotically the more complex models, and this may be the case.

Wednesday, February 17, 2016

ms2gs

Using ms2gs

in the internal documentation, miguel says that one may run before


macs 200 100000 -i 3 -t .002 -r .001 -h 0.01 2>/dev/null | msformatter 2>/dev/null 

Actually, this outputs the haplotypes to std output. Correct command should be:


macs 200 100000 -i 3 -t .002 -r .001 -h 0.01 2>/dev/null | msformatter 2>/dev/null >haplo

Then, in the call to ms2gs one should specify the ms file haplo: -ms haplo.

Tuesday, February 16, 2016

Copy and paste graphs from R in Mac

Para copiar y pegar graficas de "R en Windows" a "Word en Windows" trabajando en la Mac con Parallels hay que usar “coherence”, si no hay problemas con el redimensionado automatico en pantalla completa.


Otra manera desde Mac es usando 

require(devEMF)
emf()
dev.off()

que crea un fichero emf que se puede copiar en "Word en Windows". 


Word 2016 para Mac parece aceptar bien gráficas desde R for Mac OS X GUI . For this blog, copy and paste fro m R is not working in Safari or Chrome :-(


Dragging seems to work well in Chrome, as shown below (wild orchidea in Castanet Tolosan).




A twisted but quick way is to paste the image in Preview, then save as png, then drag here:




installing Gary K. Cheng's macs in Mac

I want to install macs  in my Mac running Yosemite. It would not compile using make as I lack the boost library . Following advice on the web,  I install macports (using the Yosemite pkg), then

sudo port -v selfupdate
sudo port install libxslt docbook-xsl docbook-xml-4.2

Does not work (the compiler does not find the library) and I cannot find any  boost library using find. My feeling is that the library is not even installed. Then I find this page so I will try it:

  1. In the directory where you want to put the Boost installation, execute
    tar --bzip2 -xf /path/to/boost_1_60_0.tar.bz2

Then I modify the makefile to have
LIB = -I /Users/andres/save/boost_1_60_0


and it does work, so there was no point in installing macports or boost itself... does not matter.