Saturday, November 19, 2016

Plotting correlations

I found this nice R package to plot correlations: corrplot. Imagine that you want to present five genetic correlations to your buddies. Here's how to visualize them nicely.

#create a covariance matrix
set.seed(1234)
V=rWishart(1,10,diag(5))[,,1]
cov2cor(V)
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.00000000  0.1169636  0.2236465 -0.2667448  0.04351447
[2,]  0.11696358  1.0000000 -0.2260988 -0.3425368  0.64837326
[3,]  0.22364648 -0.2260988  1.0000000 -0.1883763 -0.22484960
[4,] -0.26674477 -0.3425368 -0.1883763  1.0000000 -0.50315007
[5,]  0.04351447  0.6483733 -0.2248496 -0.5031501  1.00000000
#give it names
colnames(V)=c("MY","gain","longevity","SCS","pietin")
rownames(V)=c("MY","gain","longevity","SCS","pietin")

#plot the correlation matrix
#(use cov2cor(V); cor(V) would instead correlate the columns of V)
require(corrplot)
corrplot.mixed(cov2cor(V),col=gray.colors(10))


Thursday, October 27, 2016

French Keyboard on US Macbook Keyboard

I have a US-like keyboard with this key layout:


However, I sometimes write in French, switching the keyboard preferences to French, so the keys are mapped to a French keyboard. I use a rubber cover from kbcovers.com, which is excellent and looks like this:


However, the physical layout of a true French MacBook keyboard is slightly different. Also, and more importantly, some important characters such as | ~ # [ ] { } are missing from the cover, or are difficult to see on it.

So, I prepared my own cheatsheet:


Yes, it does look horrible and yes, it does work :-)

Tuesday, September 20, 2016

formatting SNPs using R or awk

In some software for genomic prediction (blupf90, GS3 and maybe others) the genotypes should be given in a plain text file as follows:

snp_file.txt

          25 1121022100
         600 0111220012
        1333 0110111111
           5 1120112102
          89 0111220001 


with no spaces between genotypes, id and genotypes separated by spaces - no tabs - and all genotypes starting at the same column. Sometimes it is not obvious how to get this format. Llibertat Tusell got a solution in R:

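# simulate 5 animals x 10 SNP genotypes coded 0/1/2
snps=sample(c(0,1,2),prob=c(.25,0.5,.25),size=50,replace=TRUE)
X=matrix(ncol=10,nrow=5,snps)
animal=c(25,600,1333,5,89)

con <- file("snp_file.txt", "w")

for (i in seq_along(animal)){
  # collapse the genotypes of animal i into one string without spaces
  tmp=paste( X[i,] , collapse = "" )
  # right-justify the id in 12 columns; sep="" avoids a trailing space before the newline
  cat(format(animal[i],width=12),' ',tmp,'\n',sep="",file=con)
}
close(con)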


The R program collapses the markers of each animal into a single string, formats the animal id so that it has constant width, and then writes both to a file.

Another solution is to use awk and start from a file, e.g.

$ cat exo_geno_spaces 
  1101 1 0 2 2 1 1 2
     101 2 2 1 1 2 1 1
   254 1 1 2 0 1 1 0

   255    2 1 2 2 0 1 2

Then you can use an awk program ./remove_spaces_snps.awk:

#! /opt/local/bin/gawk -f
# this script removes the spaces between SNP genotypes
# and writes the result in the UGA format
BEGIN{}
{
    # print animal
    printf( "%20s",$1)
    printf( "%1s"," ")

    for (i =2; i<=NF; i++){
      printf( "%1s",$i)
    }
    printf("\n")
 }
END{}

This awk program prints to stdout the animal id (with constant width), then the markers without space separation, then a newline.

$ ./remove_spaces_snps.awk exo_geno_spaces > out
$ cat out 

                1101 1022112
                 101 2211211
                 254 1120110

                 255 2122012
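
Note that the blank line in the input comes out as a whitespace-only line in the output. Here is a minimal sketch of a variant that skips such lines and takes the id width as a parameter. The function name format_snps and the default width of 12 are my own choices for illustration, not part of the original script, and it uses only plain awk features, so gawk should not be needed.

```shell
#!/bin/sh
# format_snps [width]: read "id g1 g2 ..." lines on stdin and write
# "id genotypes" with the id right-justified in a fixed width.
# The name and the default width (12) are assumptions for this sketch.
format_snps() {
    width="${1:-12}"
    awk -v w="$width" '
        NF == 0 { next }                  # skip empty or whitespace-only lines
        {
            printf("%" w "s ", $1)        # id with constant width, then one space
            for (i = 2; i <= NF; i++)     # genotypes glued together, no separator
                printf("%s", $i)
            printf("\n")
        }'
}

# small demo with a blank line in the middle of the input
printf '  1101 1 0 2 2 1 1 2\n\n   255    2 1 2 2 0 1 2\n' | format_snps 12
```

With the real file this would be called as format_snps 20 < exo_geno_spaces > out to keep the 20-column id used above.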

Monday, September 19, 2016

Livestock Fair: Pirenaica cattle

I was at a livestock fair in Irurtzun and I could take good pictures of local breeds. These are Pirenaica cows and calves. The Pirenaica breeding association is Conaspi, and the breed has been the object of many scientific publications.





In the same village, there used to be a weekly livestock fair every Tuesday until the 1970s. In this old picture, circa 1950, you can see some local animals (some of them Pirenaica) that were used, among other things, for working the fields.






Thursday, September 15, 2016

Reordering matrix in R

I have this matrix with relationships across metafounders and years. However, the matrix is not sorted, for example:

> g
     1998 2010 1970
1998 0.90 0.67 0.83
2010 0.67 0.92 0.52
1970 0.83 0.52 0.95

I would like to sort this matrix in ascending order, i.e. the cell corresponding to [1970,1970] should go in the top left.

This can be done using a loop, but it is tricky. In R this is easy using sorted indices (the names are sorted as character strings, which is fine for four-digit years):

> g1=g[order(rownames(g)),order(colnames(g))]
> g1
     1970 1998 2010
1970 0.95 0.83 0.52
1998 0.83 0.90 0.67
2010 0.52 0.67 0.92



Tuesday, September 13, 2016

ifort options -openmp and -qopenmp

I work on three computers: two servers (let's call them grits and cassoulet) and my own Mac. Well, I found out that the option to enable OpenMP in the Intel Fortran compiler (ifort) differs depending on the compiler version:

  • grits: ifort 15.0.3: -qopenmp
  • Mac: ifort 15.0.2: -qopenmp
but...

  • cassoulet: ifort 14.0.3: -openmp
According to the documentation for ifort, the option -openmp is deprecated.
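
To avoid editing build scripts on each machine, the choice can be sketched as a small shell helper. This is only a sketch: the function name openmp_flag is made up, the version string is passed in by hand, and the cut-off at major version 15 is an assumption inferred only from the three versions listed above.

```shell
#!/bin/sh
# openmp_flag VERSION: print the OpenMP option for a given ifort version.
# Assumption: versions with major number >= 15 take -qopenmp and older
# ones take -openmp (inferred from 15.0.3, 15.0.2 and 14.0.3 only).
openmp_flag() {
    major="${1%%.*}"              # keep the part before the first dot
    if [ "$major" -ge 15 ]; then
        echo "-qopenmp"
    else
        echo "-openmp"
    fi
}

openmp_flag 15.0.3    # grits
openmp_flag 14.0.3    # cassoulet
```

In a build script the result could then be spliced into the compile line, e.g. ifort $(openmp_flag 14.0.3) followed by the source files.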


Friday, July 29, 2016

gawk in MobaXterm

MobaXterm is a nice Unix/Linux console emulator. We were trying to use it to run a script for genetic evaluation, but one of the awk scripts contained the statement
 BEGIN {FIELDWIDTHS = "12 14 8 2 3 3 8 1 3 14 14 3 8 14" }

It seems that FIELDWIDTHS is specific to gawk and is not supported by all implementations of awk, in particular not by the one in MobaXterm. However, MobaXterm can install additional programs and plugins, so I tried:

[andres.ANDRESLEGAR2422] → apt-get
apt-cyg: Installs and removes Cygwin packages.
  "apt-cyg install <package names>" to install packages

So I installed it:

[andres.ANDRESLEGAR2422] → apt-get install gawk

Trying to download file setup.bz2
Updated setup.ini
Found package gawk
Downloading gawk-4.1.3-1.tar.xz...
Unpacking gawk-4.1.3-1.tar.xz...
Extracting dependencies for usr/bin/gawk.exe...
Extracting dependencies for usr/bin/gawk-4.1.3.exe...
Extracting dependencies for usr/libexec/awk/pwcat.exe...
Extracting dependencies for usr/libexec/awk/grcat.exe...
Extracting dependencies for usr/lib/gawk/filefuncs.dll...
Extracting dependencies for usr/lib/gawk/fnmatch.dll...
Extracting dependencies for usr/lib/gawk/fork.dll...
Extracting dependencies for usr/lib/gawk/inplace.dll...
Extracting dependencies for usr/lib/gawk/ordchr.dll...
Extracting dependencies for usr/lib/gawk/readdir.dll...
Extracting dependencies for usr/lib/gawk/readfile.dll...
Extracting dependencies for usr/lib/gawk/revoutput.dll...
Extracting dependencies for usr/lib/gawk/revtwoway.dll...
Extracting dependencies for usr/lib/gawk/rwarray.dll...
Extracting dependencies for usr/lib/gawk/testext.dll...
Extracting dependencies for usr/lib/gawk/time.dll...
Package gawk requires the following packages, installing bash cygwin libgcc1 libgmp10 libintl8 libmpfr4 libreadline7
Package bash is already installed, skipping
Package cygwin is already installed, skipping
Found package libgcc1
Package libgcc1 is already included, skipping
Found package libgmp10

Installing libgmp10
Downloading libgmp10-6.1.0-3p1.tar.xz...
Unpacking libgmp10-6.1.0-3p1.tar.xz...
Extracting dependencies for usr/bin/cyggmp-10.dll...
Package libgmp10 requires the following packages, installing cygwin
Package cygwin is already installed, skipping
Package libgmp10 installed.
Found package libintl8
Package libintl8 is already included, skipping
Found package libmpfr4

Installing libmpfr4
Downloading libmpfr4-3.1.4-1.tar.xz...
Unpacking libmpfr4-3.1.4-1.tar.xz...
Extracting dependencies for usr/bin/cygmpfr-4.dll...
Package libmpfr4 requires the following packages, installing cygwin libgcc1 libgmp10
Package cygwin is already installed, skipping
Found package libgcc1
Package libgcc1 is already included, skipping
Package libgmp10 is already installed, skipping
Package libmpfr4 installed.
Found package libreadline7
Package libreadline7 is already included, skipping
Package gawk installed.

Rebasing new libraries

Found package rebase

Installing rebase
Downloading rebase-4.4.2-1.tar.xz...
Unpacking rebase-4.4.2-1.tar.xz...
Extracting dependencies for usr/bin/rebase.exe...
Extracting dependencies for usr/bin/peflags.exe...
Package rebase requires the following packages, installing coreutils cygwin grep gzip sed
Found package coreutils
Package coreutils is already included, skipping
Package cygwin is already installed, skipping
Package grep is already installed, skipping
Found package gzip
Package gzip is already included, skipping
Found package sed
Package sed is already included, skipping
Package rebase installed.

Then I changed the scripts to use /bin/gawk.exe.
Now everything seems to work fine.
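
If installing gawk is not an option, fixed-width fields can also be cut with substr(), which should be available in any awk. Below is a minimal sketch, not what we actually ran: the function name split_fixed, the "|" separator and the toy widths are only for illustration.

```shell
#!/bin/sh
# split_fixed "w1 w2 ...": cut each input line into fixed-width fields
# and print them separated by "|". Uses only substr() and split(),
# so it does not depend on the gawk-only FIELDWIDTHS variable.
split_fixed() {
    awk -v widths="$1" '
        BEGIN { n = split(widths, w, " ") }    # w[1..n] hold the field widths
        {
            pos = 1
            out = ""
            for (i = 1; i <= n; i++) {
                field = substr($0, pos, w[i])  # field i starts at column pos
                out = out (i > 1 ? "|" : "") field
                pos += w[i]
            }
            print out
        }'
}

# toy example: widths 2, 3 and 3
printf 'AB123xyz\n' | split_fixed "2 3 3"
```

For the script above, the widths string would be the same one given to FIELDWIDTHS, i.e. "12 14 8 2 3 3 8 1 3 14 14 3 8 14".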