artadia

Saturday, November 19, 2016

Plotting correlations

I found this nice R package to plot correlations: corrplot. Imagine that you want to present five genetic correlations to your buddies. Here's how to visualize them nicely.

#create a covariance matrix
set.seed(1234)
V=rWishart(1,10,diag(5))[,,1]
cov2cor(V)
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.00000000  0.1169636  0.2236465 -0.2667448  0.04351447
[2,]  0.11696358  1.0000000 -0.2260988 -0.3425368  0.64837326
[3,]  0.22364648 -0.2260988  1.0000000 -0.1883763 -0.22484960
[4,] -0.26674477 -0.3425368 -0.1883763  1.0000000 -0.50315007
[5,]  0.04351447  0.6483733 -0.2248496 -0.5031501  1.00000000
#give it names
colnames(V)=c("MY","gain","longevity","SCS","pietin")
rownames(V)=c("MY","gain","longevity","SCS","pietin")

#plot the correlations: cov2cor(V), not cor(V), because V is already a covariance matrix
library(corrplot)
corrplot.mixed(cov2cor(V),col=gray.colors(10))
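When the input is already a covariance matrix, going to correlations is only a rescaling: each element is divided by the product of the two standard deviations, which is what cov2cor() does. Here is a minimal sketch to check this by hand; the names D, R1, R2 and R3 are just for illustration. Note that cor(V) is a different operation, because it treats the columns of V as if they were data.

```r
# Same simulated covariance matrix as in the post
set.seed(1234)
V <- rWishart(1, 10, diag(5))[, , 1]

# Correlations from a covariance matrix, using the built-in helper
R1 <- cov2cor(V)

# The same thing by hand: D V D, with D = diag(1 / standard deviations)
D <- diag(1 / sqrt(diag(V)))
R2 <- D %*% V %*% D

# Both routes should agree up to rounding error
print(all.equal(R1, R2))

# cor() instead computes correlations among the columns of V taken as data,
# so this matrix is not the one we want to plot
R3 <- cor(V)
print(round(R3, 3))
```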

Thursday, October 27, 2016

French Keyboard on US Macbook Keyboard

I have a US-style keyboard with this key layout:

However, I sometimes write in French by switching the keyboard preferences to French, so the keys are mapped to a French layout. I use a rubber cover from kbcovers.com, which is excellent and looks like this:

However, the physical layout of a true French MacBook keyboard is slightly different. Also, and more importantly, some important characters such as | ~ # [ ] { } are missing from the cover or difficult to see on it.

So, I prepared my own cheatsheet:

Yes, it does look horrible and yes, it does work :-)

Tuesday, September 20, 2016

formatting SNPs using R or awk

In some software for genomic prediction (blupf90, GS3 and maybe others) the genotypes should be given in a plain text file as follows:

snp_file.txt

          25 1121022100
         600 0111220012
        1333 0110111111
           5 1120112102
          89 0111220001

with no spaces between genotypes, id and genotypes separated by spaces - no tabs - and all genotypes starting at the same column. Sometimes it is not obvious how to get this format. Llibertat Tusell got a solution in R:

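# simulate 5 animals x 10 SNPs coded 0/1/2
snps=sample(c(0,1,2),prob=c(.25,0.5,.25),size=50,replace=TRUE)
X=matrix(ncol=10,nrow=5,snps)
animal=c(25,600,1333,5,89)

con <- file("snp_file.txt", "w")

for (i in 1:5){
  # collapse the genotypes of animal i into one string
  tmp=paste( X[i,] , collapse = "" )
  # id padded to 12 characters, one space, genotypes; sep="" avoids a trailing space
  cat(format(animal[i],width=12)," ",tmp,"\n",sep="",file=con)
}
close(con)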

The R program collapses the markers of each animal into a single string, formats the animal id to a constant width, and writes both to the file.
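The same steps can also be written without the loop. This is only a sketch, not Llibertat's solution: the helper name format_snp_lines is made up, the genotypes are the ones from the example file above, and the output goes to a temporary file so that nothing is overwritten.

```r
# Toy data: ids and genotypes copied from the example snp_file.txt
animal <- c(25, 600, 1333, 5, 89)
geno <- c("1121022100", "0111220012", "0110111111",
          "1120112102", "0111220001")

# Rebuild the 5 x 10 matrix of 0/1/2 codes from the strings
X <- do.call(rbind, lapply(strsplit(geno, ""), as.integer))

# Hypothetical helper: one line per animal, id right-justified in `width` columns
format_snp_lines <- function(id, X, width = 12) {
  ids <- formatC(id, width = width, format = "d")   # constant-width ids
  genotypes <- apply(X, 1, paste, collapse = "")    # no spaces between SNPs
  paste(ids, genotypes)                             # single space in between
}

snp_lines <- format_snp_lines(animal, X)

# Write everything at once (temporary file, just for the example)
out_file <- tempfile(fileext = ".txt")
writeLines(snp_lines, out_file)
cat(readLines(out_file), sep = "\n")
```

Since every id has the same width, all genotypes start at the same column, which is what the format asks for.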

Another solution is to use awk and start from a file, e.g.

$ cat exo_geno_spaces
1101 1 0 2 2 1 1 2
101 2 2 1 1 2 1 1
254 1 1 2 0 1 1 0
255 2 1 2 2 0 1 2

Then you can use an awk program ./remove_spaces_snps.awk:

#! /opt/local/bin/gawk -f
# this script removes space between SNP genotypes
# and formatting as UGA
BEGIN{}
{
  # print animal
  printf( "%20s",$1)
  printf( "%1s"," ")
  for (i =2; i<=NF; i++){
    printf( "%1s",$i)
  }
  printf("\n")
}
END{}

This awk program prints to stdout the animal id (with constant width), then the markers without space separation, then a newline.

$ ./remove_spaces_snps.awk exo_geno_spaces > out
$ cat out
                1101 1022112
                 101 2211211
                 254 1120110
                 255 2122012

Monday, September 19, 2016

Livestock Fair: Pirenaica cattle

I was at a livestock fair in Irurtzun and I could take good pictures of local breeds. These are Pirenaica cows and calves. The Pirenaica breeding association is Conaspi and it has been the object of many scientific publications.

In the same village, there used to be a weekly livestock fair every Tuesday until the 1970s. In this old picture circa 1950 you can see some local animals (some of them Pirenaica) that were used, among other things, for working the fields.

Thursday, September 15, 2016

Reordering matrix in R

I have this matrix with relationships across metafounders and years. However, the matrix is out of order, let's say:

> g
     1998 2010 1970
1998 0.90 0.67 0.83
2010 0.67 0.92 0.52
1970 0.83 0.52 0.95

I would like to sort this matrix in ascending order, i.e. the cell corresponding to [1970,1970] should go on the top left.

This can be done using a loop but it is tricky. In R this is easy using sorted indices:

> g1=g[order(rownames(g)),order(colnames(g))]
> g1
     1970 1998 2010
1970 0.95 0.83 0.52
1998 0.83 0.90 0.67
2010 0.52 0.67 0.92
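One thing to keep in mind is that rownames() and colnames() are character strings, so order() sorts them as text. With four-digit years this gives the same result as sorting numbers, but with labels of different lengths it may not. Here is a small sketch that converts the labels to numbers first; it assumes that all labels are numeric and that rows and columns carry the same labels (idx and labels are just illustrative names).

```r
# The matrix from the post, with years as dimnames
years <- c("1998", "2010", "1970")
g <- matrix(c(0.90, 0.67, 0.83,
              0.67, 0.92, 0.52,
              0.83, 0.52, 0.95),
            nrow = 3, byrow = TRUE,
            dimnames = list(years, years))

# Sort on the numeric value of the labels; the same index is used for
# rows and columns, which assumes they are labelled identically
idx <- order(as.numeric(rownames(g)))
g1 <- g[idx, idx]
print(g1)

# Why the conversion can matter: hypothetical labels of unequal length
labels <- c("1998", "2010", "900")
print(order(labels))              # text sort: "900" comes last
print(order(as.numeric(labels)))  # numeric sort: 900 comes first
```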

Tuesday, September 13, 2016

ifort options -openmp and -qopenmp

I work on three computers: two servers (let's call them grits and cassoulet) and my own Mac. Well, I found out that the option to enable OpenMP in the Intel Fortran compiler (ifort) differs depending on the compiler version:

grits: ifort 15.0.3: -qopenmp
Mac: ifort 15.0.2: -qopenmp

but...

cassoulet: ifort 14.0.3: -openmp

According to the documentation for ifort, the option -openmp is deprecated.

Friday, July 29, 2016

gawk in MobaXterm

MobaXterm is a nice Unix/Linux console emulator. We were trying to use it to run a script for genetic evaluation, yet in one of the awk scripts we used the statement
BEGIN {FIELDWIDTHS = "12 14 8 2 3 3 8 1 3 14 14 3 8 14" }

It turns out that FIELDWIDTHS is a gawk extension that is not available in every implementation of awk, in particular not in the one shipped with MobaXterm. However, MobaXterm allows installing programs and plugins. So I tried:

[andres.ANDRESLEGAR2422] → apt-get
apt-cyg: Installs and removes Cygwin packages.
  "apt-cyg install <package names>" to install packages

So I installed it:

[andres.ANDRESLEGAR2422] → apt-get install gawk
Trying to download file setup.bz2
Updated setup.ini
Found package gawk
Downloading gawk-4.1.3-1.tar.xz...
Unpacking gawk-4.1.3-1.tar.xz...
Extracting dependencies for usr/bin/gawk.exe...
Extracting dependencies for usr/bin/gawk-4.1.3.exe...
Extracting dependencies for usr/libexec/awk/pwcat.exe...
Extracting dependencies for usr/libexec/awk/grcat.exe...
Extracting dependencies for usr/lib/gawk/filefuncs.dll...
Extracting dependencies for usr/lib/gawk/fnmatch.dll...
Extracting dependencies for usr/lib/gawk/fork.dll...
Extracting dependencies for usr/lib/gawk/inplace.dll...
Extracting dependencies for usr/lib/gawk/ordchr.dll...
Extracting dependencies for usr/lib/gawk/readdir.dll...
Extracting dependencies for usr/lib/gawk/readfile.dll...
Extracting dependencies for usr/lib/gawk/revoutput.dll...
Extracting dependencies for usr/lib/gawk/revtwoway.dll...
Extracting dependencies for usr/lib/gawk/rwarray.dll...
Extracting dependencies for usr/lib/gawk/testext.dll...
Extracting dependencies for usr/lib/gawk/time.dll...
Package gawk requires the following packages, installing bash cygwin libgcc1 libgmp10 libintl8 libmpfr4 libreadline7
Package bash is already installed, skipping
Package cygwin is already installed, skipping
Found package libgcc1
Package libgcc1 is already included, skipping
Found package libgmp10
Installing libgmp10
Downloading libgmp10-6.1.0-3p1.tar.xz...
Unpacking libgmp10-6.1.0-3p1.tar.xz...
Extracting dependencies for usr/bin/cyggmp-10.dll...
Package libgmp10 requires the following packages, installing cygwin
Package cygwin is already installed, skipping
Package libgmp10 installed.
Found package libintl8
Package libintl8 is already included, skipping
Found package libmpfr4
Installing libmpfr4
Downloading libmpfr4-3.1.4-1.tar.xz...
Unpacking libmpfr4-3.1.4-1.tar.xz...
Extracting dependencies for usr/bin/cygmpfr-4.dll...
Package libmpfr4 requires the following packages, installing cygwin libgcc1 libgmp10
Package cygwin is already installed, skipping
Found package libgcc1
Package libgcc1 is already included, skipping
Package libgmp10 is already installed, skipping
Package libmpfr4 installed.
Found package libreadline7
Package libreadline7 is already included, skipping
Package gawk installed.
Rebasing new libraries
Found package rebase
Installing rebase
Downloading rebase-4.4.2-1.tar.xz...
Unpacking rebase-4.4.2-1.tar.xz...
Extracting dependencies for usr/bin/rebase.exe...
Extracting dependencies for usr/bin/peflags.exe...
Package rebase requires the following packages, installing coreutils cygwin grep gzip sed
Found package coreutils
Package coreutils is already included, skipping
Package cygwin is already installed, skipping
Package grep is already installed, skipping
Found package gzip
Package gzip is already included, skipping
Found package sed
Package sed is already included, skipping
Package rebase installed.

Then I changed the scripts to use /bin/gawk.exe.

Now everything seems to work fine.