Sample ID Sample Name SNP Name Allele1 - Top Allele2 - Top GC Score
ES140000270478 PLACA_CIC_12_96 250506CS3900140500001_312.1 A G 0.7341
ES140000270478 PLACA_CIC_12_96 250506CS3900065000002_1238.1 G G 0.8932
Sometimes we have both the nucleotide and the A/B calls (this example is bovine):
SNP Name Sample ID Allele1 - Forward Allele2 - Forward Allele1 - Top Allele2 - Top Allele1 - AB Allele2 - AB GC Score X Y
ARS-BFGL-BAC-10172 USA201811 G G G G B B 0.9506 0.012 1.036
ARS-BFGL-BAC-1020 USA201811 G G G G B B 0.9673 0.005 0.652
We are now in the opposite situation: we have to convert back from A/C/G/T to A/B.
The conversion depends on whether the strand read was TOP or BOT (bottom), which is a property of each locus, and on the two nucleotides that can occur at that locus.
For some allele pairs it does not matter whether the strand is TOP or BOT:
A/G -> A/B
A/C -> A/B
T/G -> A/B
T/C -> A/B
For other loci it does depend:
- For loci in TOP strands
A/T -> A/B
G/C -> B/A
- For loci in BOT strands
A/T -> B/A
G/C -> A/B
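The rules above can be written as a small lookup. This is only a sketch in Python: the names ab_map and to_ab are made up for illustration, and it assumes that we already know the two nucleotides that can occur at the locus (a single homozygous genotype such as G G is not enough to tell which pair the locus has).

```python
# Strand-independent pairs (A/G, A/C, T/G, T/C): A or T -> A, C or G -> B.
UNAMBIGUOUS = {"A": "A", "T": "A", "C": "B", "G": "B"}

# Strand-dependent pairs, keyed by (sorted nucleotides, strand),
# copied from the TOP/BOT rules listed above.
AMBIGUOUS = {
    ("AT", "TOP"): {"A": "A", "T": "B"},
    ("AT", "BOT"): {"A": "B", "T": "A"},
    ("CG", "TOP"): {"G": "B", "C": "A"},
    ("CG", "BOT"): {"G": "A", "C": "B"},
}


def ab_map(alleles, strand=None):
    """Return {nucleotide: 'A' or 'B'} for a biallelic locus.

    alleles: the two nucleotides possible at the locus, e.g. "AG".
    strand:  "TOP" or "BOT"; only needed for A/T and G/C loci.
    """
    key = "".join(sorted({a.upper() for a in alleles}))
    if len(key) != 2 or not set(key) <= set("ACGT"):
        raise ValueError(f"expected two distinct nucleotides, got {alleles!r}")
    if key in ("AT", "CG"):
        if strand not in ("TOP", "BOT"):
            raise ValueError(f"locus {key} needs strand TOP or BOT")
        return AMBIGUOUS[(key, strand)]
    return {a: UNAMBIGUOUS[a] for a in key}


def to_ab(genotype, alleles, strand=None):
    """Recode a genotype such as "AG" to A/B, keeping the allele order."""
    mapping = ab_map(alleles, strand)
    return "".join(mapping[g.upper()] for g in genotype)
```

For example, to_ab("AG", "AG") gives "AB" whatever the strand, while to_ab("GG", "GC", "TOP") gives "BB" and to_ab("GG", "GC", "BOT") gives "AA".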
How do we find out whether a locus is TOP or BOT? I am so inept in molecular genetics that I didn't know where to look for this information (it must be in some database somewhere), but I found it in one of the files that come with the "raw" genotypes. It contains a "Locus Summary" and looks like this:
Locus Summary on ...
Row,Locus_Name,Illumicode_Name,...,Plus/Minus Strand
1,250506CS3900065000002_1238.1,49668394,...,TOP
2,250506CS3900140500001_312.1,29623404,...,TOP
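Reading the strand of each locus from such a file could look roughly like the sketch below. It assumes the layout shown above: some title lines, then a comma-separated header line starting with "Row,Locus_Name", then one line per locus. The function name read_strands is hypothetical, and real files may have extra sections after the locus lines that would need to be skipped.

```python
import csv


def read_strands(lines):
    """Map Locus_Name -> strand ("TOP"/"BOT") from Locus Summary lines.

    lines: any iterable of text lines, for example an open file.
    """
    rows = iter(lines)
    # Skip the title line(s) until the column header is found.
    for line in rows:
        if line.startswith("Row,Locus_Name"):
            header = next(csv.reader([line]))
            break
    else:
        raise ValueError("no 'Row,Locus_Name' header line found")
    # The remaining lines are the loci; column names come from the header.
    reader = csv.DictReader(rows, fieldnames=header)
    return {
        row["Locus_Name"]: row["Plus/Minus Strand"]
        for row in reader
        if row.get("Plus/Minus Strand")  # skip short or incomplete lines
    }
```

With a file on disk this would be called as read_strands(open(path)), where path is wherever the Locus Summary file lives; the resulting dictionary can then be looked up by SNP name to decide between the TOP and BOT rules.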