artadia: formatting SNPs using R or awk

Tuesday, September 20, 2016

formatting SNPs using R or awk

In some software for genomic prediction (blupf90, GS3 and may be other) the genotypes shoukd be given in a plain text file as follows:

snp_file.txt

          25 1121022100
         600 0111220012
        1333 0110111111
           5 1120112102
          89 0111220001

with no spaces between genotypes, id and genotypes separated by spaces - no tabs - and all genotypes starting at the same column. Sometimes it is not obvious how to get this format. Llibertat Tusell got a solution in R:

snps=sample(c(0,1,2),prob=c(.25,0.5,.25),size=50,replace=T)
X=matrix(ncol=10,nrow=5,snps)
animal=c(25,600,1333,5,89)

con <- file("snp_file.txt", "w")

for (i in 1:5){
tmp=paste( X[i,] , collapse = "" )
cat(format(animal[i],width=12),tmp,'\n',file=con)
}
close(con)

The R program collapses markers into a single string, then puts format to the animal "word" so that it has constant width, then it writes it to a file

Another solution is to use awk and start from a file, e.g.

$ cat exo_geno_spaces

1101 1 0 2 2 1 1 2

101 2 2 1 1 2 1 1

254 1 1 2 0 1 1 0

255 2 1 2 2 0 1 2

Then you can use an awk program ./remove_spaces_snps.awk:

#! /opt/local/bin/gawk -f

# this script removes space between SNP genotypes

# and formatting as UGA

BEGIN{}

{

# print animal

printf( "%20s",$1)

printf( "%1s"," ")

for (i =2; i<=NF; i++){

printf( "%1s",$i)

}

printf("\n")

}

END{}

This awk program prints on stdout the animal (with constant width) then markers without sopace separation, then a newline.

$ ./remove_spaces_snps.awk exo_geno_spaces > out

$ cat out

1101 1022112

101 2211211

254 1120110

255 2122012

1 comment:

ManuelOctober 7, 2016 at 12:55 PM

Here another solution using shell one-liners

# remove leading whitespaces
sed -e 's/^[[:space:]]*//' exo_geno_spaces > tmp

# collapse genotypes
cut -d' ' -f2- tmp | tr -d ' ' | \
paste <(cut -d' ' -f1 tmp) - | \
awk '{ printf "%-20s %s\n", $1, $2 }' > out
rm tmp
ReplyDelete
Replies

Add comment