To convert this file (UGA format as explained here and here: one animal per line, the animal ID followed by one genotype code per marker)
1 012202001001000020202000
2 012202001001000021211110
3 012202001001000020202000
4 012211001100010020202000
5 111211101100010110111001
6 002202000000000021211110
7 012211001100010021211110
8 022220002200020020202000
9 012202001001000020202000
10 111211101100010110111001
into as many individual SNP files as there are loci, e.g.:
zcat singlemarker/x000001.gz | head
1 1
2 1
3 1
4 1
5 2
6 1
7 1
8 1
9 1
10 2
zcat singlemarker/x000002.gz | head
1 2
2 2
3 2
4 2
5 2
6 1
7 2
8 3
9 2
10 2
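Judging from the two listings above, each marker file pairs the animal ID with that marker's genotype shifted by one (0 becomes 1, 1 becomes 2, 2 becomes 3). Below is a minimal awk sketch of that mapping for a single marker. It assumes a space-separated input file. The file names and the variable j are illustrative only.

```shell
# Sketch: extract marker j directly from a UGA file (animal ID, then one
# genotype digit per marker) as "ID code" rows, with code = genotype + 1.
# uga_sample.txt is a hypothetical name; it holds four rows of the example.
cat > uga_sample.txt <<'EOF'
1 012202001001000020202000
2 012202001001000021211110
5 111211101100010110111001
8 022220002200020020202000
EOF

j=2   # 1-based position of the marker in the genotype string
awk -v j="$j" '{ print $1, substr($2, j, 1) + 1 }' uga_sample.txt > marker_000002.txt
cat marker_000002.txt   # four rows: "1 2", "2 2", "5 2", "8 3"
```

Looping this over every marker would re-read the whole genotype file once per marker. With a panel of the size suggested by the 700K file name used below, that adds up to a very large number of passes.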
How can this be done efficiently?
1- Transpose the file (in my case with a Fortran program), dropping the animal IDs, to:
0000100001
1111101211
2222122221
2222222222
0001101201
i.e., line 1 holds marker 1 for all animals in their original order, line 2 holds marker 2, and so on.
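For small files, the same transposition can be sketched in awk. This is an illustration only, not the Fortran program used here. Line j of the output is character j of every genotype string. Because the IDs are dropped, the sketch also saves them separately, in row order. The file names are hypothetical.

```shell
# Sketch of step 1 in awk: line j of the output is character j of every
# genotype string, in the original animal order. Animal IDs are dropped.
cat > uga_genotypes.txt <<'EOF'
1 012202001001000020202000
2 012202001001000021211110
3 012202001001000020202000
4 012211001100010020202000
5 111211101100010110111001
6 002202000000000021211110
7 012211001100010021211110
8 022220002200020020202000
9 012202001001000020202000
10 111211101100010110111001
EOF

awk '{ n = length($2)
       for (j = 1; j <= n; j++) col[j] = col[j] substr($2, j, 1) }
     END { for (j = 1; j <= n; j++) print col[j] }' uga_genotypes.txt > sample_transposed

# Keep the animal IDs, in row order, for any later step that needs them.
awk '{ print $1 }' uga_genotypes.txt > animal_ids

head -n 5 sample_transposed   # the five lines shown above, starting 0000100001
```

This version holds the whole file in memory and builds each line by repeated string concatenation. For a 700K-marker file, a compiled program like the Fortran one is likely the faster choice.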
2- Use the extraordinary GNU split to write one file per line (= one marker):
split -l 1 -a 6 --numeric-suffixes=1 --filter='gzip >$FILE.gz' BB.700K.gen_transposed
This command puts one line in each output file (-l 1). It names the files with numeric suffixes starting at 1 (--numeric-suffixes=1), 6 digits wide (-a 6), after split's default prefix x, e.g. x000002.gz. Finally, it pipes each piece through gzip to obtain compressed files (--filter='gzip >$FILE.gz').
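To see what step 2 produces, the same split options can be run on the five transposed lines above. This assumes GNU coreutils split. The input name demo_transposed is hypothetical. The single quotes around the filter matter. They keep $FILE from being expanded by the calling shell, so it reaches the shell that split starts for each piece, where split sets it to the output name.

```shell
# Demo of step 2 on the five transposed lines shown above (GNU split).
printf '%s\n' 0000100001 1111101211 2222122221 2222222222 0001101201 > demo_transposed

# split sets FILE to each output name (x000001, x000002, ...) and feeds the
# one-line chunk to the filter's stdin, here gzip.
split -l 1 -a 6 --numeric-suffixes=1 --filter='gzip >$FILE.gz' demo_transposed

ls x0000*.gz       # lists x000001.gz through x000005.gz
zcat x000003.gz    # 2222122221
```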
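Still, each file holds just one transposed line (x000001.gz the first, x000002.gz the second, and so on), without the animal IDs or the 1/2/3 codes of the target files: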
0000100001
1111101211
2222122221
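One possible way to reach the two-column layout shown at the top is to do the recoding inside split's filter. Each marker is then written once, already paired with the animal IDs. This is a sketch under several assumptions:

- animal_ids lists the IDs one per line, in the row order of the original UGA file.
- to2col.awk is a hypothetical helper.
- Codes are shifted by +1, as the listings above suggest.

```shell
# Hypothetical helper: read the animal IDs (first input), then turn the single
# marker line (second input, stdin) into "ID code" rows, code = genotype + 1.
cat > to2col.awk <<'EOF'
NR == FNR { id[NR] = $1; next }
{ for (i = 1; i <= length($0); i++) print id[i], substr($0, i, 1) + 1 }
EOF

# IDs in the original row order (on real data, e.g. awk '{ print $1 }' on the
# untransposed UGA file); two transposed marker lines as demo input.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > animal_ids
printf '%s\n' 0000100001 1111101211 > demo2_transposed

# Each one-line chunk arrives on the filter's stdin ("-" for awk); prefix "m".
split -l 1 -a 6 --numeric-suffixes=1 \
  --filter='awk -f to2col.awk animal_ids - | gzip >$FILE.gz' demo2_transposed m

zcat m000002.gz    # same ten rows as the x000002.gz listing at the top
```

This starts one awk and one gzip process per marker. That is the same per-marker process count the gzip-only filter already implies.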