alegarra@genotoul2 ~/save/progs $ cat ex_rachel
 1 a b
 1 b b
 1 c c
 2 b b
 2 c c
 2 d d
 3 a b
 3 b b
 3 c c
In this example there are three animals (1 to 3) and three markers. Rachel and I would like the data formatted with one animal per line and its markers one after the other, like this:
alegarra@genotoul2 ~/save/progs $ cat out
         1 ab bb cc 
         2 bb cc dd 
         3 ab bb cc 
This is conceptually simple if the file is sorted by animal:
- Read a line.
- If the animal is the same as on the previous line, print its markers after the previous ones.
- If the animal is different, start a new line and print the animal and its markers.
- Handle the first and last lines as special cases.
Here is an awk implementation:
alegarra@genotoul2 ~/save/progs $ cat SNPcol2line.awk 
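#! /bin/awk -f
# This program reads genotypes given as one line per locus:
#   individual allele1 allele2
# and writes them as one line per individual:
#   individual allele1allele2 allele1allele2 ...
# The input must be sorted (grouped) by individual.
{
  id=$1
  # if new animal (the first line always starts a new animal,
  # so an ID of 0 is handled correctly)
  if(NR==1 || id!=idold){
   if(NR>1){
    # close previous line
    printf("\n")
   }
   # write new id
   printf("%10s%1s",id," ")
   idold=id
  }
  # write both alleles of this marker together, followed by a space
  printf("%1s%1s%1s",$2,$3," ")
}
END{
 # close the line of the last individual (nothing to close on empty input)
 if(NR>0) printf("\n")
}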
which works:
alegarra@genotoul2 ~/save/progs $ ./SNPcol2line.awk ex_rachel 
         1 ab bb cc 
         2 bb cc dd 
         3 ab bb cc 
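The script above relies on the input being grouped by animal. If it is not, one option (a sketch, not part of the original program) is to collect each animal's markers in awk arrays and print everything at the end. The function name `col2line` is hypothetical. Animals come out in order of first appearance and markers in file order; the trade-off is that all genotypes are held in memory, which the streaming script avoids.

```shell
# col2line: hypothetical helper name, not from the original post.
# Usage: col2line file   (or pipe data into it on stdin)
col2line() {
  awk '
  {
    id = $1
    if (!(id in geno)) {            # first line seen for this animal
      order[++n] = id               # remember order of first appearance
      geno[id] = ""
    }
    geno[id] = geno[id] $2 $3 " "   # append "allele1allele2 "
  }
  END {
    for (k = 1; k <= n; k++)
      printf("%10s %s\n", order[k], geno[order[k]])
  }' "$@"
}
```

On `ex_rachel` this should print the same three lines as `./SNPcol2line.awk ex_rachel`. Alternatively, with GNU sort you can keep the streaming script and sort first: `sort -s -k1,1n ex_rachel | ./SNPcol2line.awk` (`-s` keeps the markers of each animal in their original order).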