Tuesday, September 20, 2016

formatting SNPs using R or awk

In some software for genomic prediction (blupf90, GS3 and may be other) the genotypes shoukd be given in a plain text file as follows:

snp_file.txt

          25 1121022100
         600 0111220012
        1333 0110111111
           5 1120112102
          89 0111220001 


with no spaces between genotypes, id and genotypes separated by spaces - no tabs - and all genotypes starting at the same column. Sometimes it is not obvious how to get this format. Llibertat Tusell got a solution in R:

snps=sample(c(0,1,2),prob=c(.25,0.5,.25),size=50,replace=T)
X=matrix(ncol=10,nrow=5,snps)
animal=c(25,600,1333,5,89)

con <- file("snp_file.txt", "w")

for (i in 1:5){
  tmp=paste( X[i,] , collapse = "" )
  cat(format(animal[i],width=12),tmp,'\n',file=con)
}
close(con)


The R program collapses markers into a single string, then puts format to the animal "word" so that it has constant width, then it writes it to a file

Another solution is to use awk and start from a file, e.g.

$ cat exo_geno_spaces 
  1101 1 0 2 2 1 1 2
     101 2 2 1 1 2 1 1
   254 1 1 2 0 1 1 0

   255    2 1 2 2 0 1 2

Then you can use an awk program ./remove_spaces_snps.awk:

#! /opt/local/bin/gawk -f
# this script removes space between SNP genotypes
# and formatting as UGA
BEGIN{}
{
    # print animal
    printf( "%20s",$1)
    printf( "%1s"," ")

    for (i =2; i<=NF; i++){
      printf( "%1s",$i)
    }
    printf("\n")
 }
END{}

This awk program prints on stdout the animal (with constant width) then markers without sopace separation, then a newline.

$ ./remove_spaces_snps.awk exo_geno_spaces > out
$ cat out 

                1101 1022112
                 101 2211211
                 254 1120110

                 255 2122012

Monday, September 19, 2016

Livestock Fair: Pirenaica cattle

I was in a livestock fair in Irurtzun and I could take good pictures of local breeds. These are Pirenaica cows and calves. The Pirenaica breeding association is Conaspi and is has been the object of many scientific publications.





In the same village, there used to be a weekly livestock fair every tuesday until the 70's. In this old picture circa 1950 you can see some local animals (some of them Pirenaica) that were used, among other things, for working the fields. 






Thursday, September 15, 2016

Reordering matrix in R

I have this matrix with relationships across metafounders and years. However the matrix is disordered, let's say:

> g
     1998 2010 1970
1998 0.90 0.67 0.83
2010 0.67 0.92 0.52
1970 0.83 0.52 0.95

I would like to sort this matrix in ascending order, i.e. the cell corrresponding to [1970,1970] should go on top left.

This can be done using a loop but it is tricky. In R this is easy using sorted indices:

> g1=g[order(rownames(g)),order(colnames(g))]
> g1
     1970 1998 2010
1970 0.95 0.83 0.52
1998 0.83 0.90 0.67
2010 0.52 0.67 0.92



Tuesday, September 13, 2016

ifort options -openmp and -qopenmp

I work in three computers, two servers (let's call them grits and cassoulet) and my own Mac machine. Well, I found out that the options to use OpenMP in the Intel Fortran compiler (ifort) are different depending on the compiler version:

  • grits: ifort 15.0.3: -qopenmp
  • Mac: ifort 15.0.2: -qopenmp
but...

  • cassoulet: ifort 14.0.3: -openmp
According to the documentation for for ifort, the option -openmp is deprecated .