QmSim2uga.awk changes formats from output of QMSim to input of blupf90. Here is one such loop using substr() , which selects a substring from a string
for (i=1; i<=nsnp; i++){
out=substr($2,i,1)
# QmSim -> UGA
# 0 -> AA -> 0
# 2 -> aa -> 2
# 3 -> Aa -> 1
# 4 -> aA -> 1
# (missing does not exist) -> 5
if(out>2) {out=1}
genotype=genotype out
}
For 9 individuals, in my mac
time ./QmSim2uga.awk p1_mrk_001.txt >/dev/null
nsnp, nanim 78519 9
user 1m47.467s
this takes 2 min .
Alternatively, we split() the string into an array, then we loop through the array:
split($2,geno,"")
for (i=1; i<=nsnp; i++){
out=geno[i]
#substr($2,i,1)
# QmSim -> UGA
# 0 -> AA -> 0
# 2 -> aa -> 2
# 3 -> Aa -> 1
# 4 -> aA -> 1
# (missing does not exist) -> 5
if(out>2) {out=1}
genotype=genotype out
}
This form is waaaay faster:
user 0m0.285s
And with all individuals:
nsnp, nanim 78519 6037
real 3m16.279s
user 3m13.201s
No comments:
Post a Comment