alegarra@genotoul2 ~/save/progs $ cat ex_rachel
1 a b
1 b b
1 c c
2 b b
2 c c
2 d d
3 a b
3 b b
3 c c
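In this example there are three animals (1 to 3), each genotyped at three markers, with one line per animal and marker: the animal ID followed by the two alleles. Rachel and I would like to reformat the file so that there is one animal per line, with its markers one after another, like this: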
alegarra@genotoul2 ~/save/progs $ cat out
1 ab bb cc
2 bb cc dd
3 ab bb cc
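This is conceptually simple if the file is sorted by animal, so that all lines of one animal are contiguous:
- Read a line.
- If the animal is the same as on the previous line, print the two alleles on the current output line, after the previous marker.
- If the animal is different, start a new output line and print the animal ID followed by the two alleles.
- Handle the first line (there is no previous output line to close) and the last line (the final output line must be closed) as special cases.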
Here is an awk implementation:
alegarra@genotoul2 ~/save/progs $ cat SNPcol2line.awk
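#! /bin/awk -f
# This program reads genotypes with one line per individual and locus
# (individual allele1 allele2) and writes one line per individual:
# individual allele1allele2 allele1allele2 ...
# The input must be sorted (grouped) by individual.
#
{
    id = $1
    # if new animal (the first line always starts a new animal)
    if (NR == 1 || id != idold) {
        if (NR > 1) {
            # close previous line
            printf("\n")
        }
        # write new id
        printf("%10s%1s", id, " ")
        idold = id
    }
    # append the two alleles of this marker
    printf("%1s%1s%1s", $2, $3, " ")
}
END {
    # close the line of the last individual, if any input was read
    if (NR > 0) printf("\n")
}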
which works:
alegarra@genotoul2 ~/save/progs $ ./SNPcol2line.awk ex_rachel
1 ab bb cc
2 bb cc dd
3 ab bb cc
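The approach above relies on the file being sorted by animal. If it is not, one possible variant is to collect the genotypes of each animal in an awk array and print everything at the end. The following is only a minimal sketch under stated assumptions: the unsorted sample data, the array names `geno` and `order`, and the shell variable `result` are made up for illustration, and the markers of each animal are assumed to appear in the same order throughout the file. It also holds the whole output in memory, which the line-by-line script does not.

```shell
# Hypothetical unsorted input: same genotypes as ex_rachel, lines shuffled.
result=$(printf '%s\n' \
    "2 b b" "1 a b" "3 a b" \
    "1 b b" "2 c c" "3 b b" \
    "1 c c" "2 d d" "3 c c" |
awk '
{
    id = $1
    if (!(id in geno)) {          # first time this animal is seen
        order[++n] = id           # remember order of first appearance
        geno[id] = ""
    }
    geno[id] = geno[id] " " $2 $3 # append the two alleles of this marker
}
END {
    # one output line per animal, in order of first appearance
    for (k = 1; k <= n; k++)
        printf("%s%s\n", order[k], geno[order[k]])
}')
printf '%s\n' "$result"
```

Animals are printed in the order in which they first appear in the input. Another option, if sorted output is preferred, is to sort the file by its first column with a stable sort (for instance `sort -s -n -k1,1` with GNU sort) and then run SNPcol2line.awk unchanged.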