artadia: format genotypes for blupf90, GS3

Blupf90 and GS3 require genotypes to be in this form:

       345 1111212111212112

       346 1121111211211021

       347 2022222220202022

       348 1111111211211021

      1349 2022222220202022

     12350 1111212111212112

       351 1121111211211021

       352 1121111211211021

       353 2022222220202022

This also works, because the genotypes still start in the same column:

     345   1111212111212112

       346 1121111211211021

      347  2022222220202022

       348 1111111211211021

      1349 2022222220202022

     12350 1111212111212112

       349 2022222220202022

       350 1111212111212112

       351 1121111211211021

       352 1121111211211021

       353 2022222220202022

Or this:

345   1111212111212112

346   1121111211211021

347   2022222220202022

348   1111111211211021

1349  2022222220202022

12350 1111212111212112

This, however, will be read incorrectly and will give wrong results, because the genotypes do not all start in the same column:

345 1111212111212112

346 1121111211211021

347 2022222220202022

348 1111111211211021

1349 2022222220202022

12350 1111212111212112

IDs and genotypes (coded as 0/1/2) must be separated by one or more spaces (not tabs), and the genotypes must start in exactly the same column on every line.
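Before running the programs, a file can be checked against these two rules with a few lines of awk. This is only a sketch: check_geno is a hypothetical helper name, and it assumes that the genotype string is the last field on each line and that blank lines can be ignored.

```shell
# check_geno FILE  (hypothetical helper, not part of blupf90 or GS3)
# Prints OK if FILE contains no tabs and the last field (the genotype
# string) starts in the same column on every non-blank line.
# Prints BAD and returns a non-zero status otherwise.
check_geno() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        /\t/    { tabs++ }                    # count lines that contain a tab
        {
            match($0, /[^ \t]+[ \t]*$/)       # locate the last field
            start[RSTART]++                   # remember its starting column
        }
        END {
            n = 0
            for (c in start) n++              # number of distinct start columns
            if (tabs > 0 || n != 1) { print "BAD"; exit 1 }
            print "OK"
        }
    ' "$1"
}
```

For example, check_geno gene.txt would print BAD for a file whose IDs have different lengths and are each followed by a single space.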

A simple fix is to use awk. Suppose your genotype file is gene.txt:

$ cat gene.txt

345 1111212111212112

346 1121111211211021

347 2022222220202022

348 1111111211211021

1349 2022222220202022

12350 1111212111212112

then you can do

awk '{printf("%10s%1s%" length($2) "s\n",$1," ",$2)}' gene.txt >gene2.txt

In gene2.txt, the IDs are right-aligned in a 10-character field, so all genotypes start in the same column:

$ cat gene2.txt

       345 1111212111212112

       346 1121111211211021

       347 2022222220202022

       348 1111111211211021

      1349 2022222220202022

     12350 1111212111212112
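The width of 10 in the awk command is fixed, and printf does not truncate, so an ID longer than 10 characters would push its genotypes to the right again. A possible variant, given here only as a sketch, reads the file twice: first to find the longest ID, then to pad every ID to that width. The name align_geno is hypothetical, and the sketch assumes two fields per line (ID, then genotype string).

```shell
# align_geno IN OUT  (hypothetical helper)
# Pass 1 finds the longest ID; pass 2 right-aligns every ID to that
# width, followed by one space and the genotype string.
align_geno() {
    awk '
        NR == FNR { if (NF && length($1) > w) w = length($1); next }  # pass 1
        NF        { printf("%" w "s %s\n", $1, $2) }                  # pass 2
    ' "$1" "$1" > "$2"
}
```

It would be called as align_geno gene.txt gene2.txt. Because awk splits fields on both spaces and tabs by default, tab-separated input also comes out space-separated.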

artadia