
Thursday, May 6, 2021

Fst as heterozygosities of two populations and their crosses

There are several definitions and interpretations of F_{ST} ; I like the paper of Bhatia et al. (2013) because it summarizes them well. One that I like is Hudson's definition (and estimator) of F_{ST} , which for two populations turns out to be the same as Weir-Hill and Weir-Cockerham. I found it useful to "translate" it into animal breeding jargon, in terms of the variances of two purebreds and their F1 and F2 crosses. This has probably been done elsewhere; in fact, it is largely based on Bhatia et al.'s explanations and Appendix. I will use q=1-p  often. I assume large populations of the same size.

Hudson's definition

Consider populations 1 and 2. Let 

F_{ST}=1-\frac{H_w}{H_b}=\frac{H_b - H_w}{H_b} 

 where H_w=p_1 (1-p_1)+p_2 (1-p_2)  is heterozygosity "within" and H_b=p_1 (1-p_2)+p_2 (1-p_1)  is heterozygosity "between". What does this mean?

  •   H_w=p_1 (1-p_1)+p_2 (1-p_2) , the heterozygosity "within", can be thought of as the average heterozygosity across the two populations: H_w= \frac{H_1 + H_2}{2} = \frac{1}{2}\left(2 p_1 (1-p_1)+2 p_2 (1-p_2)\right)=p_1 (1-p_1)+p_2 (1-p_2) 

  •   H_b=p_1 (1-p_2)+p_2 (1-p_1)  is the heterozygosity of an F1 population with one gamete coming from population 1 and the other from population 2.
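The two heterozygosities above are easy to compute directly. The following is a minimal Python sketch, assuming a single biallelic locus with known population allele frequencies (the large-population setting of this post); it ignores the sample-size corrections of the actual estimators, and the function name `hudson_fst` is only illustrative.

```python
def hudson_fst(p1: float, p2: float) -> float:
    """Hudson's F_ST at one biallelic locus, from population allele frequencies.

    Assumes p1 and p2 are known frequencies (no sampling correction).
    """
    h_w = p1 * (1 - p1) + p2 * (1 - p2)  # heterozygosity "within"
    h_b = p1 * (1 - p2) + p2 * (1 - p1)  # heterozygosity "between" (the F1 cross)
    if h_b == 0:
        # Happens only when both populations are fixed for the same allele.
        raise ValueError("F_ST is undefined when H_b = 0")
    return 1 - h_w / h_b


print(round(hudson_fst(0.2, 0.6), 4))  # 0.2857
```

With p_1=0.2 and p_2=0.6 we get H_w=0.40 and H_b=0.56, so F_{ST}=1-0.40/0.56=2/7.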

The difference H_b - H_w  is the numerator of the F_{ST}  and is:

H_b - H_w= p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2 = (p_1 - p_2)(q_2 - q_1) = (p_1 - p_2)^2  

where N=(p_1 - p_2)^2 , the numerator of the F_{ST} , is also Nei's minimum genetic distance.

In an F2 population there is HW equilibrium and the allele frequency is p_{F2}=\frac{p_1 + p_2}{2} . Thus the heterozygosity is  H_{F2}=2 \frac{p_1 + p_2}{2}\frac{q_1 + q_2}{2}=\frac{1}{2}(p_1 + p_2)(q_1 + q_2) .

The increase in variance in an F2 population over the average variance across the two populations is  

H_{F2} - H_w=\frac{1}{2}(p_1 + p_2)(q_1 + q_2) - (p_1 q_1 + p_2 q_2) = \frac{1}{2}(p_1 q_2 + p_2 q_1 - p_1 q_1 - p_2 q_2)=\frac{1}{2}(p_1 - p_2)^2  

which is half the difference H_b - H_w , thus H_{F2} - H_w=\frac{1}{2}(H_b - H_w)
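As a sanity check on the algebra, the identities H_b - H_w=(p_1 - p_2)^2 and H_{F2} - H_w=\frac{1}{2}(p_1 - p_2)^2 can be verified numerically over a grid of allele frequencies. This is only a sketch; the helper name `heterozygosities` and the grid are arbitrary choices, not something taken from Bhatia et al.

```python
import itertools


def heterozygosities(p1, p2):
    """Return (H_w, H_b, H_F2) as defined in the text for frequencies p1, p2."""
    q1, q2 = 1 - p1, 1 - p2
    h_w = p1 * q1 + p2 * q2               # average of the two purebreds
    h_b = p1 * q2 + p2 * q1               # F1 cross
    h_f2 = 0.5 * (p1 + p2) * (q1 + q2)    # F2, in Hardy-Weinberg equilibrium
    return h_w, h_b, h_f2


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
max_err = 0.0
for p1, p2 in itertools.product(grid, grid):
    h_w, h_b, h_f2 = heterozygosities(p1, p2)
    n = (p1 - p2) ** 2
    # Largest deviation from either identity across the whole grid.
    max_err = max(max_err, abs((h_b - h_w) - n), abs((h_f2 - h_w) - n / 2))

print(max_err < 1e-12)
```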

The segregation variance is the difference between heterozygosities in the F1 and in the F2. From before, H_{F2} =\frac{H_b}{2}+\frac{H_w}{2} , so

H_{F2} - H_b =  \frac{H_b}{2}+\frac{H_w}{2} - H_b = \frac{H_w}{2} - \frac{H_b}{2}=\frac{1}{2}(H_w - H_b)= -\frac{1}{2}(p_1 - p_2)^2 = -\frac{N}{2}  

that is, the F2 has \frac{N}{2} less heterozygosity than the F1.

Thus, when we move from two populations to an F1 we gain, say, \Delta H  in genetic variance, and when we reproduce the F1 to create an F2 we lose (relative to the F1) \frac{\Delta H}{2}  and we gain (relative to the purebreds) \frac{\Delta H}{2} . The (numerator of the) F_{ST}  is a measure of this. More exactly, the F_{ST}  tells how much of the variance of the (hypothetical) F1 population is due to mixing populations and not to the variance within populations (this is of course Wright's original interpretation).
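The bookkeeping of gains and losses can also be followed at the genotype level. The sketch below assumes one biallelic locus with alleles labelled A and a, an F1 formed by one gamete from each population, and an F2 formed by random mating among F1 individuals; the function name `cross_summary` and the dictionary keys are hypothetical.

```python
def cross_summary(p1: float, p2: float) -> dict:
    """Heterozygosity from two purebreds to their F1 and F2 (one biallelic locus)."""
    q1, q2 = 1 - p1, 1 - p2
    # F1 genotypes: one gamete from population 1, the other from population 2.
    f1 = {"AA": p1 * p2, "Aa": p1 * q2 + q1 * p2, "aa": q1 * q2}
    # Frequency of A among gametes produced by the F1; equals (p1 + p2) / 2.
    p_f1 = f1["AA"] + 0.5 * f1["Aa"]
    h_w = p1 * q1 + p2 * q2        # average heterozygosity of the purebreds
    h_b = f1["Aa"]                 # heterozygosity of the F1
    h_f2 = 2 * p_f1 * (1 - p_f1)   # heterozygosity of the F2 (random mating)
    return {
        "delta_h": h_b - h_w,              # gained going from purebreds to F1
        "loss_f1_to_f2": h_b - h_f2,       # lost going from F1 to F2
        "gain_over_purebreds": h_f2 - h_w, # what the F2 keeps over the purebreds
    }


print({k: round(v, 4) for k, v in cross_summary(0.2, 0.6).items()})
```

For p_1=0.2 and p_2=0.6 this gives \Delta H=0.16 , of which 0.08 is lost from the F1 to the F2 and 0.08 remains as a gain over the purebreds.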


Nei's definition

We will call it  F_{ST}^{Nei} . It is defined as

F_{ST}^{Nei}=\frac{(p_1 - p_2)^2}{2\bar{p}(1-\bar{p})}  

where, because \bar{p}=\frac{p_1 + p_2}{2}=p_{F2} , the denominator is exactly the heterozygosity in our F2 population. Thus F_{ST}^{Nei}  and Hudson's  F_{ST}  are not the same thing, because their denominators refer to different "common" populations: an F1 population for Hudson and WC, and an F2 for Nei. Bhatia et al. show that, in expectation and in the limit,

F_{ST}^{Nei} \rightarrow \frac{F_{ST}^1+F_{ST}^2}{2-\frac{F_{ST}^1+F_{ST}^2}{2}} 

where F_{ST}^1  and F_{ST}^2  are the population-specific values. Hudson's F_{ST}=\frac{F_{ST}^1+F_{ST}^2}{2} , so substituting and dividing numerator and denominator by 2 gives

F_{ST}^{Nei}=\frac{F_{ST}}{1-\frac{F_{ST}}{2}} 

so Nei's F_{ST}^{Nei}  overestimates (very slightly for small values) Hudson's (or Weir-Cockerham, Weir-Hill) F_{ST} , because the denominator 1-\frac{F_{ST}}{2}  is smaller than 1. This is to be expected: strictly speaking, they are not the same thing. 
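The relation between the two definitions can be checked numerically. The sketch below uses the parametric formulas of this post (known allele frequencies, no sampling correction); the function names and the example frequencies are illustrative only. Since H_{F2}=\frac{H_b+H_w}{2}  is never larger than H_b , dividing the same numerator by it gives a value at least as large as Hudson's, so the Nei column should come out as the larger of the two.

```python
def hudson_fst(p1, p2):
    """Hudson's F_ST: (H_b - H_w) / H_b, with the F1 cross as reference."""
    h_w = p1 * (1 - p1) + p2 * (1 - p2)
    h_b = p1 * (1 - p2) + p2 * (1 - p1)
    return (h_b - h_w) / h_b


def nei_fst(p1, p2):
    """Nei's F_ST as written in the text: (p1 - p2)^2 over the F2 heterozygosity."""
    p_bar = (p1 + p2) / 2
    return (p1 - p2) ** 2 / (2 * p_bar * (1 - p_bar))


# Interior frequencies only, so that no denominator is zero.
for p1, p2 in [(0.45, 0.55), (0.3, 0.4), (0.2, 0.6)]:
    f_h = hudson_fst(p1, p2)
    f_n = nei_fst(p1, p2)
    predicted = f_h / (1 - f_h / 2)  # Nei's value predicted from Hudson's
    print(f"p1={p1} p2={p2} Hudson={f_h:.4f} Nei={f_n:.4f} predicted={predicted:.4f}")
```

For close frequencies such as 0.45 and 0.55 the two values differ only in the fourth decimal, while for 0.2 and 0.6 Hudson's value is 2/7 and Nei's is 1/3.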

References


Bhatia, G., Patterson, N., Sankararaman, S., & Price, A. L. (2013). Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9), 1514-1521.
Hudson, R. R., Slatkin, M., & Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2), 583-589.
Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press.
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. Evolution, 38(6), 1358-1370.
Weir, B. S., & Hill, W. G. (2002). Estimating F-statistics. Annual Review of Genetics, 36(1), 721-750.
