artadia: Fst as heterozygosities of two populations and their crosses

Among the several definitions and interpretations of $latex F_{ST}$latex , I like the paper of Bhatia et al. (2013) because it summarizes them well. One that I particularly like is Hudson's definition (and estimator) of $latex F_{ST}$latex , which for two populations turns out to be the same as Weir-Hill's and Weir-Cockerham's. I found it useful to "translate" it into animal breeding jargon, in terms of the variances (heterozygosities) of two purebreds and of their F1 and F2 crosses. This has probably been done elsewhere; in fact, this post is largely based on Bhatia et al.'s explanations and Appendix. I will often use $latex q=1-p$latex . I assume large populations of the same size.

Hudson's definition

Consider populations 1 and 2. Let

$latex F_{ST}=1-\frac{H_w}{H_b}=\frac{H_b - H_w}{H_b}$latex

where $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex is heterozygosity "within" and $latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex is heterozygosity "between". What does this mean?

The heterozygosity "within", $latex H_w=p_1 (1-p_1)+p_2 (1-p_2)$latex , can be thought of as the average heterozygosity across the two populations: $latex H_w= \frac{H_1 + H_2}{2} = \frac{1}{2}\left(2 p_1 (1-p_1)+2 p_2 (1-p_2)\right)=p_1 (1-p_1)+p_2 (1-p_2)$latex , where $latex H_i = 2 p_i (1-p_i)$latex is the Hardy-Weinberg heterozygosity of population $latex i$latex .

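$latex H_b=p_1 (1-p_2)+p_2 (1-p_1)$latex is the heterozygosity of an F1 population in which each animal receives one gamete from population 1 and the other from population 2: the animal is heterozygous when the two gametes carry different alleles, which happens with probability $latex p_1 q_2 + p_2 q_1$latex .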
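Both readings can be checked numerically. The Python sketch below uses hypothetical function names and labels the allele with frequency $latex p$latex as "A" (an illustrative label, not from the original). It computes $latex H_1$latex , $latex H_2$latex , $latex H_w$latex and $latex H_b$latex from population allele frequencies, and separately obtains the F1 heterozygosity by enumerating the four pairings of a gamete from population 1 with a gamete from population 2.

```python
from itertools import product


def heterozygosities(p1: float, p2: float) -> dict:
    """H1, H2, H_w and H_b for frequencies p1, p2 of allele A in populations 1 and 2."""
    q1, q2 = 1 - p1, 1 - p2
    return {
        "H1": 2 * p1 * q1,          # HW heterozygosity of population 1
        "H2": 2 * p2 * q2,          # HW heterozygosity of population 2
        "H_w": p1 * q1 + p2 * q2,   # "within": the average (H1 + H2) / 2
        "H_b": p1 * q2 + p2 * q1,   # "between"
    }


def f1_heterozygosity(p1: float, p2: float) -> float:
    """Probability that an F1 zygote (gamete from pop 1 x gamete from pop 2) is Aa."""
    gametes_pop1 = {"A": p1, "a": 1 - p1}
    gametes_pop2 = {"A": p2, "a": 1 - p2}
    return sum(
        freq1 * freq2
        for (allele1, freq1), (allele2, freq2) in product(
            gametes_pop1.items(), gametes_pop2.items()
        )
        if allele1 != allele2  # keep heterozygous pairings only
    )


h = heterozygosities(0.8, 0.3)
print(h["H_w"], (h["H1"] + h["H2"]) / 2)       # these should agree
print(h["H_b"], f1_heterozygosity(0.8, 0.3))   # these should agree
```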

The difference $latex H_b - H_w$latex is the numerator of the $latex F_{ST}$latex and is:

$latex H_b - H_w= p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2 = (p_1 - p_2)(q_2 - q_1) = (p_1 - p_2)^2 $latex , because $latex q_2 - q_1 = p_1 - p_2$latex .

We write $latex N=(p_1 - p_2)^2$latex : it is the numerator of the $latex F_{ST}$latex , and it is also Nei's minimum genetic distance.
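A minimal sketch of Hudson's $latex F_{ST}$latex for a single biallelic locus, assuming population (not sample) allele frequencies, so without the finite-sample corrections of the actual estimators. The function names are hypothetical. It also checks on a grid of frequencies that $latex H_b - H_w = N$latex .

```python
def hudson_terms(p1: float, p2: float) -> tuple:
    """Return (H_w, H_b, N) for one biallelic locus."""
    q1, q2 = 1 - p1, 1 - p2
    h_w = p1 * q1 + p2 * q2    # average within-population heterozygosity
    h_b = p1 * q2 + p2 * q1    # heterozygosity of the F1
    n = (p1 - p2) ** 2         # numerator N (Nei's minimum distance)
    return h_w, h_b, n


def hudson_fst(p1: float, p2: float) -> float:
    """Hudson's F_ST = 1 - H_w / H_b = N / H_b (population frequencies)."""
    h_w, h_b, _ = hudson_terms(p1, p2)
    if h_b == 0:
        # both populations fixed for the same allele: F_ST is undefined
        raise ValueError("locus is monomorphic in both populations")
    return 1 - h_w / h_b


grid = [i / 10 for i in range(11)]
for p1 in grid:
    for p2 in grid:
        h_w, h_b, n = hudson_terms(p1, p2)
        assert abs((h_b - h_w) - n) < 1e-12  # H_b - H_w = (p1 - p2)^2

print(hudson_fst(0.8, 0.3))  # about 0.403
print(hudson_fst(1.0, 0.0))  # 1.0: opposite alleles fixed
```

Two populations fixed for opposite alleles give $latex F_{ST}=1$latex : all the F1 heterozygosity comes from mixing.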

In an F2 population there is HW equilibrium and the allele frequency is $latex p_{F2}=\frac{p_1 + p_2}{2}$latex . Thus the heterozygosity is $latex H_{F2}=2 \frac{p_1 + p_2}{2}\frac{q_1 + q_2}{2}=\frac{1}{2}(p_1 + p_2)(q_1 + q_2)$latex .

The increase in heterozygosity (variance) of an F2 population over the average across the two purebred populations is

$latex H_{F2} - H_w=\frac{1}{2}(p_1 + p_2)(q_1 + q_2) - (p_1 q_1 + p_2 q_2) = \frac{1}{2}(p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2)=\frac{1}{2}(p_1 - p_2)^2 $latex

which is half the difference $latex H_b - H_w$latex , thus $latex H_{F2} - H_w=\frac{1}{2}(H_b - H_w) $latex .
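The same check for the F2, again as a sketch with hypothetical names: the F2 allele frequency is taken as the mean $latex \frac{p_1+p_2}{2}$latex of the two purebreds, and one round of random mating among F1 animals is assumed to give HW proportions.

```python
def within_heterozygosity(p1: float, p2: float) -> float:
    """H_w: average of the two purebred HW heterozygosities."""
    return p1 * (1 - p1) + p2 * (1 - p2)


def f2_heterozygosity(p1: float, p2: float) -> float:
    """H_F2 under HW equilibrium at the F2 allele frequency (p1 + p2) / 2."""
    p_f2 = (p1 + p2) / 2
    return 2 * p_f2 * (1 - p_f2)


for p1, p2 in [(0.8, 0.3), (0.1, 0.9), (0.5, 0.5), (1.0, 0.0)]:
    gain = f2_heterozygosity(p1, p2) - within_heterozygosity(p1, p2)
    print(p1, p2, gain, (p1 - p2) ** 2 / 2)  # last two columns should agree
```

The gain is never negative and vanishes only when $latex p_1=p_2$latex .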

The segregation variance is the difference between the heterozygosities of the F1 and of the F2. From the previous result, $latex H_{F2} =\frac{H_b}{2}+\frac{H_w}{2}$latex , so

$latex H_b - H_{F2} = H_b - \frac{H_b}{2} - \frac{H_w}{2} = \frac{H_b}{2} - \frac{H_w}{2}=\frac{1}{2}(H_b - H_w)= \frac{1}{2}(p_1 - p_2)^2 = \frac{N}{2} $latex
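The loss from F1 to F2 can also be seen by simulation. The sketch below is a hypothetical Monte Carlo using Python's standard `random` module. It assumes an effectively infinite F1 (each F2 animal gets one random gamete from each of two independently generated F1 parents) and measures the proportion of heterozygous F2 animals, to be compared with $latex H_b$latex .

```python
import random


def simulate_f2_heterozygosity(p1: float, p2: float,
                               n_f2: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo proportion of heterozygous F2 animals (allele with freq p coded True)."""
    rng = random.Random(seed)

    def f1_parent():
        # an F1 animal: one allele drawn from population 1, one from population 2
        return (rng.random() < p1, rng.random() < p2)

    heterozygous = 0
    for _ in range(n_f2):
        # Mendelian segregation: each F1 parent transmits one of its two alleles
        gamete_sire = rng.choice(f1_parent())
        gamete_dam = rng.choice(f1_parent())
        heterozygous += gamete_sire != gamete_dam
    return heterozygous / n_f2


p1, p2 = 0.8, 0.3
h_b = p1 * (1 - p2) + p2 * (1 - p1)    # F1 heterozygosity
h_f2_sim = simulate_f2_heterozygosity(p1, p2)
print(h_b, h_f2_sim, h_b - h_f2_sim, (p1 - p2) ** 2 / 2)
```

With these frequencies the simulated loss $latex H_b - H_{F2}$latex should come out close to $latex \frac{N}{2}=0.125$latex , up to Monte Carlo error.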

Thus, when we move from the two purebred populations to an F1 we gain, say, $latex \Delta H = H_b - H_w = N$latex in genetic variance (heterozygosity); when we then mate the F1 animals to create an F2, we lose $latex \frac{\Delta H}{2}$latex relative to the F1 but keep a gain of $latex \frac{\Delta H}{2}$latex relative to the purebreds. The numerator of the $latex F_{ST}$latex measures this gain. More exactly, the $latex F_{ST}$latex is the fraction of the heterozygosity of the (hypothetical) F1 population that is due to mixing the populations rather than to the variation within them (this is of course Wright's original interpretation).
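A small worked example may help; the frequencies $latex p_1=0.8$latex and $latex p_2=0.3$latex are arbitrary and chosen only for illustration.

$latex H_w = 0.8 \times 0.2 + 0.3 \times 0.7 = 0.16 + 0.21 = 0.37$latex

$latex H_b = 0.8 \times 0.7 + 0.3 \times 0.2 = 0.56 + 0.06 = 0.62$latex , so $latex \Delta H = H_b - H_w = 0.25 = (0.8 - 0.3)^2 = N$latex

$latex H_{F2} = 2 \times 0.55 \times 0.45 = 0.495$latex , so $latex H_{F2} - H_w = H_b - H_{F2} = 0.125 = \frac{N}{2}$latex

$latex F_{ST} = \frac{N}{H_b} = \frac{0.25}{0.62} \approx 0.403$latex

That is, about 40% of the heterozygosity of this hypothetical F1 comes from mixing the two populations.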

Nei's definition

We will call it $latex F_{ST}^{Nei}$latex . It is defined as

$latex F_{ST}^{Nei}=\frac{(p_1 - p_2)^2}{2\bar{p}(1-\bar{p})}$latex

where, because $latex \bar{p}=\frac{p_1 + p_2}{2}=p_{F2}$latex , the denominator is exactly the heterozygosity of our F2 population. Thus Nei's $latex F_{ST}^{Nei}$latex and Hudson's $latex F_{ST}$latex are not the same thing: their denominators refer to different "common" populations, an F1 population for Hudson's (and Weir-Cockerham's) and an F2 population for Nei's. Bhatia et al. show that, in expectation and in the limit,

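$latex F_{ST}^{Nei} \rightarrow \frac{F_{ST}^1+F_{ST}^2}{2-\frac{F_{ST}^1+F_{ST}^2}{2}}$latex , where $latex F_{ST}^1$latex and $latex F_{ST}^2$latex are the population-specific drift parameters of their model, whereas Hudson's $latex F_{ST} \rightarrow \frac{F_{ST}^1+F_{ST}^2}{2}$latex . Substituting the latter into the former gives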

$latex F_{ST}^{Nei}=\frac{F_{ST}}{1-\frac{F_{ST}}{2}}$latex

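Because $latex H_{F2} = H_b - \frac{N}{2}$latex , this relation actually holds exactly for population allele frequencies: $latex F_{ST}^{Nei} = \frac{N}{H_b - N/2} = \frac{F_{ST}}{1-F_{ST}/2}$latex . So Nei's $latex F_{ST}^{Nei}$latex overestimates Hudson's (or Weir-Cockerham's, Weir-Hill's) $latex F_{ST}$latex , only very slightly for small values, but this is expected: strictly speaking they are not the same quantity.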
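The relation between the two indices can be checked numerically. The sketch below (hypothetical function names) computes both from population allele frequencies. To illustrate the limits quoted from Bhatia et al., it also draws many independent loci under the Balding-Nichols model, an assumption made here purely for illustration: each population's frequency is a Beta draw with mean equal to an ancestral frequency $latex p$latex and variance $latex p(1-p)F_{ST}^i$latex , and both indices are computed as a ratio of averages across loci.

```python
import random


def fst_hudson_nei(p1: float, p2: float) -> tuple:
    """(Hudson, Nei) F_ST for one biallelic locus from population frequencies."""
    n = (p1 - p2) ** 2                         # shared numerator N
    h_b = p1 * (1 - p2) + p2 * (1 - p1)        # Hudson's denominator: F1 heterozygosity
    p_bar = (p1 + p2) / 2
    h_f2 = 2 * p_bar * (1 - p_bar)             # Nei's denominator: F2 heterozygosity
    return n / h_b, n / h_f2


def balding_nichols_fst(f1: float, f2: float, p_anc: float = 0.5,
                        n_loci: int = 50_000, seed: int = 7) -> tuple:
    """Ratio-of-averages (Hudson, Nei) F_ST over loci simulated under Balding-Nichols."""
    if not (0 < f1 < 1 and 0 < f2 < 1 and 0 < p_anc < 1):
        raise ValueError("f1, f2 and p_anc must lie strictly between 0 and 1")
    rng = random.Random(seed)

    def drifted(f: float) -> float:
        # Beta(a, b) with mean p_anc and variance p_anc * (1 - p_anc) * f
        scale = (1 - f) / f
        return rng.betavariate(p_anc * scale, (1 - p_anc) * scale)

    sum_n = sum_h_b = sum_h_f2 = 0.0
    for _ in range(n_loci):
        p1, p2 = drifted(f1), drifted(f2)
        p_bar = (p1 + p2) / 2
        sum_n += (p1 - p2) ** 2
        sum_h_b += p1 * (1 - p2) + p2 * (1 - p1)
        sum_h_f2 += 2 * p_bar * (1 - p_bar)
    return sum_n / sum_h_b, sum_n / sum_h_f2


hudson, nei = fst_hudson_nei(0.8, 0.3)
print(hudson, nei, hudson / (1 - hudson / 2))    # Nei >= Hudson; last two agree

hudson, nei = balding_nichols_fst(0.1, 0.3)
print(hudson, (0.1 + 0.3) / 2)                   # Hudson -> (F1 + F2) / 2
print(nei, (0.1 + 0.3) / (2 - (0.1 + 0.3) / 2))  # Nei -> (F1 + F2) / (2 - (F1 + F2) / 2)
```

Because $latex H_{F2} = H_b - \frac{N}{2}$latex holds locus by locus, the identity $latex F_{ST}^{Nei} = \frac{F_{ST}}{1-F_{ST}/2}$latex also holds exactly for the ratio of averages, whatever the simulated frequencies.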

References

Bhatia, G., Patterson, N., Sankararaman, S., & Price, A. L. (2013). Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9), 1514-1521.

Hudson, R. R., Slatkin, M., & Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2), 583-589.

Nei, M. (1987). Molecular evolutionary genetics. Columbia University Press.

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358-1370.

Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Annual Review of Genetics, 36(1), 721-750.


Thursday, May 6, 2021
