Thursday, May 6, 2021

Fst as heterozygosities of two populations and their crosses

Among the several definitions and interpretations of $latex F_{ST}$latex , I like the paper of Bhatia et al. (2013) because it summarizes them well. One that I like is Hudson's definition (and estimator) of $latex F_{ST}$latex , which for two populations turns out to be the same as those of Weir-Hill and Weir-Cockerham. I found it useful to "translate" it into animal breeding jargon, in terms of the variances of two purebred populations and their F1 and F2 crosses. This has probably been done elsewhere; in fact, this is largely based on Bhatia et al.'s explanations and Appendix. I will use $latex q=1-p$latex  often. I assume large populations of the same size.

Hudson's definition

Consider populations 1 and 2. Let 

$latex F_{ST}=1-\frac{H_w}{H_b}=\frac{H_b - H_w}{H_b}$latex 

 where $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex  is heterozygosity "within" and $latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex  is heterozygosity "between". What does this mean?

  •  $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex , the heterozygosity "within", can be thought of as the average heterozygosity across the two populations: $latex H_w= \frac{H_1 + H_2}{2} = \frac{1}{2}\left(2 p_1 (1-p_1)+2 p_2 (1-p_2)\right)=p_1 (1-p_1)+p_2 (1-p_2)$latex 

  •  $latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex  is the heterozygosity of an F1 population with one gamete coming from population 1 and the other from population 2.

The difference $latex H_b - H_w$latex  is the numerator of the $latex F_{ST}$latex  and is:

$latex H_b - H_w= p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2 = (p_1 - p_2)(q_2 - q_1) = (p_1 - p_2)^2 $latex 

where $latex N=(p_1 - p_2)^2$latex  denotes this numerator, which is also Nei's minimum genetic distance.
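
As a quick numerical check of the definitions above, here is a minimal Python sketch for a single biallelic locus. The function name `hudson_fst` and the allele frequencies 0.8 and 0.3 are hypothetical, chosen only for illustration; the formula is undefined when both populations are fixed for the same allele, because then $latex H_b=0$latex .

```python
def hudson_fst(p1, p2):
    """Return (H_w, H_b, F_ST) for one biallelic locus, following Hudson's definition."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    h_w = p1 * q1 + p2 * q2  # heterozygosity "within": average of the two populations
    h_b = p1 * q2 + p2 * q1  # heterozygosity "between": the F1 cross
    return h_w, h_b, (h_b - h_w) / h_b


# Hypothetical allele frequencies, for illustration only
p1, p2 = 0.8, 0.3
h_w, h_b, fst = hudson_fst(p1, p2)
print("H_w =", h_w, "H_b =", h_b, "F_ST =", fst)

# The numerator H_b - H_w should equal (p1 - p2)^2 up to rounding error
print(abs((h_b - h_w) - (p1 - p2) ** 2) < 1e-12)
```

With these values $latex H_w$latex  is about 0.37, $latex H_b$latex  about 0.62, and $latex F_{ST}$latex  about 0.40.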

In an F2 population there is HW equilibrium and the allele frequency is $latex p_{F2}=\frac{p_1 + p_2}{2}$latex . Thus the heterozygosity is  $latex H_{F2}=2 \frac{p_1 + p_2}{2}\frac{q_1 + q_2}{2}=\frac{1}{2}(p_1 + p_2)(q_1 + q_2)$latex .

The increase in variance of an F2 population relative to the average variance across the two populations is

$latex H_{F2} - H_w=\frac{1}{2}(p_1 + p_2)(q_1 + q_2) - (p_1 q_1 + p_2 q_2) = \frac{1}{2}(p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2)=\frac{1}{2}(p_1 - p_2)^2 $latex 

which is half the difference $latex H_b - H_w$latex , thus $latex H_{F2} - H_w=\frac{1}{2}(H_b - H_w) $latex . 

The segregation variance is the difference between the heterozygosities of the F1 and the F2. From before, $latex H_{F2} =\frac{H_b}{2}+\frac{H_w}{2}$latex , so

$latex H_{F2} - H_b =  \frac{H_b}{2}+\frac{H_w}{2} - H_b = \frac{H_w}{2} - \frac{H_b}{2}=\frac{1}{2}(H_w - H_b)= -\frac{1}{2}(p_1 - p_2)^2 = -\frac{N}{2} $latex 
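
The F2 heterozygosity sits halfway between $latex H_w$latex  and $latex H_b$latex , which can be checked numerically. This is only a sketch: the function name `cross_heterozygosities` and the allele frequencies are hypothetical, and it assumes Hardy-Weinberg proportions in the F2, as in the text.

```python
def cross_heterozygosities(p1, p2):
    """Return (H_w, H_b, H_F2) for one biallelic locus."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    h_w = p1 * q1 + p2 * q2                # average within-population heterozygosity
    h_b = p1 * q2 + p2 * q1                # heterozygosity of the F1
    p_f2 = (p1 + p2) / 2.0                 # allele frequency in the F2
    h_f2 = 2.0 * p_f2 * (1.0 - p_f2)       # Hardy-Weinberg heterozygosity of the F2
    return h_w, h_b, h_f2


# Hypothetical allele frequencies, for illustration only
p1, p2 = 0.8, 0.3
h_w, h_b, h_f2 = cross_heterozygosities(p1, p2)
n = (p1 - p2) ** 2

print("H_F2 - H_w =", h_f2 - h_w, " N/2 =", n / 2)
print("H_F2 - H_b =", h_f2 - h_b, "-N/2 =", -n / 2)
```

For these frequencies the F2 gains about 0.125 over the purebred average and sits about 0.125 below the F1.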

Thus, when we move from the two populations to an F1 we gain, say, $latex \Delta H$latex  in genetic variance, and when we reproduce the F1 to create an F2 we lose $latex \frac{\Delta H}{2}$latex  relative to the F1, which still leaves a gain of $latex \frac{\Delta H}{2}$latex  relative to the purebreds. The (numerator of the) $latex F_{ST}$latex  is a measure of this. More exactly, the $latex F_{ST}$latex  measures how much of the variance of the (hypothetical) F1 population is due to mixing populations rather than to the variance within populations (this is of course Wright's original interpretation).
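
The same gain and loss can be seen by simulating the crosses gamete by gamete. This is a rough Monte Carlo sketch, not an estimator: the function name, sample size, seed and allele frequencies are all hypothetical, and it assumes populations large enough that each gamete can be drawn independently from its population of origin.

```python
import random


def simulate_cross_heterozygosity(p1, p2, n=100_000, seed=1):
    """Simulate the fraction of heterozygotes in an F1 and an F2 at one biallelic locus."""
    rng = random.Random(seed)

    def gamete(p):
        # True means the gamete carries the reference allele (frequency p)
        return rng.random() < p

    def f1_gamete():
        # An F1 parent passes on its population-1 or its population-2 allele
        # with probability 1/2 each
        return gamete(p1) if rng.random() < 0.5 else gamete(p2)

    # F1: one gamete from population 1, the other from population 2
    het_f1 = sum(gamete(p1) != gamete(p2) for _ in range(n)) / n
    # F2: both gametes come from (different) F1 parents
    het_f2 = sum(f1_gamete() != f1_gamete() for _ in range(n)) / n
    return het_f1, het_f2


# Hypothetical allele frequencies, for illustration only
het_f1, het_f2 = simulate_cross_heterozygosity(0.8, 0.3)
print("simulated H_F1:", het_f1, "simulated H_F2:", het_f2)
```

With these frequencies the simulated values should land close to the analytical $latex H_b=0.62$latex  and $latex H_{F2}=0.495$latex , up to sampling noise.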


Nei's definition

We will call it  $latex F_{ST}^{Nei}$latex . It is defined as

$latex F_{ST}^{Nei}=\frac{(p_1 - p_2)^2}{2\bar{p}(1-\bar{p})}$latex  

where, because $latex \bar{p}=\frac{p_1 + p_2}{2}=p_{F2}$latex , the denominator is exactly the heterozygosity of our F2 population. Thus $latex F_{ST}^{Nei}$latex  and Hudson's $latex F_{ST}$latex  are not the same thing, because their denominators refer to different "common" populations: an F1 population for Hudson and Weir-Cockerham, and an F2 for Nei. Bhatia et al. show that, in expectation and in the limit,

$latex F_{ST}^{Nei} \rightarrow \frac{F_{ST}^1+F_{ST}^2}{2-\frac{F_{ST}^1+F_{ST}^2}{2}}$latex 

where $latex F_{ST}^1$latex  and $latex F_{ST}^2$latex  are, in their notation, the population-specific values for populations 1 and 2. Hudson's $latex F_{ST}$latex  is their average, $latex F_{ST}=\frac{F_{ST}^1+F_{ST}^2}{2}$latex ; substituting $latex F_{ST}^1+F_{ST}^2=2F_{ST}$latex  and dividing numerator and denominator by 2 gives

$latex F_{ST}^{Nei}=\frac{F_{ST}}{1-\frac{F_{ST}}{2}}$latex 

so Nei's $latex F_{ST}^{Nei}$latex  overestimates Hudson's (or Weir-Cockerham's, Weir-Hill's) $latex F_{ST}$latex  (only very slightly for small values), because the denominator $latex 1-\frac{F_{ST}}{2}$latex  is smaller than 1. This is to be expected: strictly speaking, they are not the same thing.
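
The relation between the two definitions can be checked directly from allele frequencies. A minimal sketch, with hypothetical function names and hypothetical pairs of frequencies:

```python
def hudson_fst(p1, p2):
    """Hudson's F_ST: (p1 - p2)^2 over the F1 heterozygosity H_b."""
    h_b = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return (p1 - p2) ** 2 / h_b


def nei_fst(p1, p2):
    """Nei's F_ST: (p1 - p2)^2 over the F2 heterozygosity 2 * p_bar * (1 - p_bar)."""
    p_bar = (p1 + p2) / 2.0
    return (p1 - p2) ** 2 / (2.0 * p_bar * (1.0 - p_bar))


# Hypothetical pairs of allele frequencies, for illustration only
for p1, p2 in [(0.8, 0.3), (0.55, 0.45), (0.9, 0.1)]:
    f_h = hudson_fst(p1, p2)
    f_n = nei_fst(p1, p2)
    # Last column: Hudson's value pushed through F / (1 - F/2)
    print(p1, p2, f_h, f_n, f_h / (1.0 - f_h / 2.0))
```

The last two columns agree up to rounding, and the gap between the first two shrinks as the allele frequencies of the two populations get closer.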

References


Bhatia, G., Patterson, N., Sankararaman, S., & Price, A. L. (2013). Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9), 1514-1521.
Hudson, R. R., Slatkin, M., & Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2), 583-589.
Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press.
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358-1370.
Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Annual Review of Genetics, 36(1), 721-750.