2 Unequal Gene Frequencies


The simple results for equal gene frequencies described in the previous section were appreciated by a number of biometricians shortly after the rediscovery of Mendel's work (Castle, 1903; Pearson, 1904; Yule, 1902). However, it was not until Fisher's remarkable 1918 paper that the full generality of the biometrical model was elucidated. Gene frequencies do not have to be equal, nor do they have to be the same for the various polygenic loci involved in the phenotype, for the simple fractions $1$, $\frac{1}{2}$, $\frac{1}{4}$, and $0$ to hold, provided we define $V_A$ and $V_D$ appropriately. The algebra is considerably more complicated with unequal gene frequencies, and it is necessary to define carefully what we mean by $V_A$ and $V_D$. However, the end result is, perhaps surprisingly, extremely simple. We give the flavor of the approach in this section, and refer the interested reader to the classic texts in this field for further information (Crow and Kimura, 1970; Falconer, 1990; Kempthorne, 1960; Mather and Jinks, 1982). We note that the elaboration of this biometrical model, with its power and elegance, has been largely responsible for the tremendous strides in inexpensive plant and animal food production throughout the world, placing these activities on a firm scientific basis.

Consider the three genotypes, AA, Aa, and aa, with genotypic frequencies

$$
\begin{array}{lccc}
\mbox{Genotypes} & AA & Aa & aa \\
\mbox{Frequency} & P & Q & R
\end{array}
$$

The proportion of alleles, or gene frequency, is given by

$$
\begin{aligned}
\mbox{gene frequency ($A$)} &= P + \frac{Q}{2} = u \\
\mbox{gene frequency ($a$)} &= R + \frac{Q}{2} = v \; .
\end{aligned}
\tag{3.8}
$$

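These expressions derive from the simple fact that the AA genotype contributes only A alleles, the aa genotype contributes only a alleles, and the heterozygote, Aa, contributes $\frac{1}{2}$ A and $\frac{1}{2}$ a alleles. A Punnett square showing the allelic form of gametes uniting at random gives the genotypic frequencies in terms of the gene frequencies:

$$
\begin{array}{ll|cc}
 & & \mbox{Male gametes} & \\
 & & u\,A & v\,a \\ \hline
\mbox{Female gametes} & u\,A & u^2\,AA & uv\,Aa \\
 & v\,a & uv\,Aa & v^2\,aa
\end{array}
$$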

which yields an alternative representation of the genotypic frequencies

$$
\begin{array}{lccc}
\mbox{Genotypes} & AA & Aa & aa \\
\mbox{Frequency} & u^2 & 2uv & v^2
\end{array}
$$

That these genotypic frequencies are in Hardy-Weinberg equilibrium may be shown by using them to calculate the gene frequencies in the next generation, which turn out to be unchanged, and then reapplying the Punnett square. Using expression 3.8, substituting $u^2$, $2uv$, and $v^2$ for $P$, $Q$, and $R$, and noting that the sum of gene frequencies is 1 ($u + v = 1$), we can see that the new gene frequencies are the same as the old, and that genotypic frequencies will not change in subsequent generations:

$$
\begin{aligned}
u_1 &= u^2 + \frac{1}{2}(2uv) = u^2 + uv = u(u+v) = u \\
v_1 &= v^2 + \frac{1}{2}(2uv) = v^2 + uv = v(u+v) = v \; .
\end{aligned}
\tag{3.9}
$$
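As a quick numerical check of expressions 3.8 and 3.9, the following minimal Python sketch starts from an arbitrary set of genotype frequencies that is not in Hardy-Weinberg proportions and applies random mating twice. The function names and the starting values are illustrative assumptions, not part of the original text; exact fractions are used so that the equality checks are not affected by rounding.

```python
from fractions import Fraction


def gene_frequencies(P, Q, R):
    """Allele frequencies (u, v) of A and a from genotype frequencies of AA, Aa, aa."""
    return P + Q / 2, R + Q / 2


def random_mating(u, v):
    """Genotype frequencies (AA, Aa, aa) after gametes unite at random."""
    return u * u, 2 * u * v, v * v


# Hypothetical starting population (assumed values; they sum to 1).
P, Q, R = Fraction(1, 2), Fraction(1, 5), Fraction(3, 10)

u, v = gene_frequencies(P, Q, R)      # allele frequencies of the parents
gen1 = random_mating(u, v)            # offspring genotype frequencies
u1, v1 = gene_frequencies(*gen1)      # allele frequencies of the offspring
gen2 = random_mating(u1, v1)          # next generation

print("u, v     =", u, v)
print("gen1     =", gen1)
print("u1, v1   =", u1, v1)
print("gen2 == gen1:", gen2 == gen1)
```

In this example the gene frequencies are the same in parents and offspring, and the genotype frequencies stop changing after the first round of random mating.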

The biometrical model is developed in terms of these equilibrium frequencies and genotypic effects as

$$
\begin{array}{lccc}
\mbox{Genotypes} & AA & Aa & aa \\
\mbox{Frequency} & u^2 & 2uv & v^2 \\
\mbox{Genotypic effect} & d & h & -d
\end{array}
\tag{3.10}
$$

The mean and variance of a population with this composition are obtained in a manner analogous to that in 3.1. The mean, simplified using $u + v = 1$, is

$$
\begin{aligned}
\mu &= u^2d + 2uvh - v^2d \\
    &= (u+v)(u-v)d + 2uvh \\
    &= (u-v)d + 2uvh \; .
\end{aligned}
\tag{3.11}
$$

Because the mean is a reasonably complex expression, it is not convenient to sum weighted deviations to express the variance as in 3.2. Instead, we rearrange the variance formula, using $\sum f_i = 1$ and $\sum f_i x_i = \mu$:

$$
\begin{aligned}
\sigma^2 &= \sum f_i (x_i - \mu)^2 \\
         &= \sum f_i (x_{i}^{2} - 2x_i\mu + \mu^2) \\
         &= \sum f_i x_{i}^{2} - 2\mu \sum f_i x_i + \mu^2 \\
         &= \sum f_i x_{i}^{2} - 2\mu^2 + \mu^2 \\
         &= \sum f_i x_{i}^{2} - \mu^2 \; .
\end{aligned}
\tag{3.12}
$$

Applying this formula to the genotypic effects and their frequencies given in 3.10 above, we obtain

$$
\begin{aligned}
\sigma^2 &= u^2d^2 + 2uvh^2 + v^2d^2 - [(u-v)d + 2uvh]^2 \\
         &= u^2d^2 + 2uvh^2 + v^2d^2 - [(u-v)^2d^2 + 4uvdh(u-v) + 4u^2v^2h^2] \\
         &= u^2d^2 + 2uvh^2 + v^2d^2 - [(u^2-2uv+v^2)d^2 + 4uvdh(u-v) + 4u^2v^2h^2] \\
         &= 2uv[d^2 + 2(v-u)dh + (1-2uv)h^2] \\
         &= 2uv[d^2 + 2(v-u)dh + (v-u)^2h^2 + 2uvh^2] \\
         &= 2uv[d + (v-u)h]^2 + 4u^2v^2h^2 \; .
\end{aligned}
\tag{3.13}
$$
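The algebra leading to 3.13 is easy to get wrong by a sign, so a numerical check can be reassuring. The sketch below computes the mean and variance directly from the frequencies and effects in 3.10 using the formula in 3.12, and compares them with the closed forms in 3.11 and 3.13. The function names, the labels `V_A` and `V_D` for the two terms of 3.13, and the parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction


def mean_and_variance(u, d, h):
    """Mean and variance of genotypic effects computed directly from 3.10 and 3.12."""
    v = 1 - u
    freqs = [u * u, 2 * u * v, v * v]      # AA, Aa, aa
    effects = [d, h, -d]
    mu = sum(f * x for f, x in zip(freqs, effects))
    var = sum(f * x * x for f, x in zip(freqs, effects)) - mu * mu
    return mu, var


def closed_forms(u, d, h):
    """Closed-form mean (3.11) and the two terms of the variance (3.13)."""
    v = 1 - u
    mu = (u - v) * d + 2 * u * v * h
    V_A = 2 * u * v * (d + (v - u) * h) ** 2   # first term of 3.13
    V_D = 4 * u**2 * v**2 * h**2               # second term of 3.13
    return mu, V_A, V_D


# Hypothetical parameter values (assumed for this example only).
u, d, h = Fraction(3, 4), Fraction(1), Fraction(1, 2)

mu_direct, var_direct = mean_and_variance(u, d, h)
mu_closed, V_A, V_D = closed_forms(u, d, h)

print("mean:", mu_direct, mu_closed)
print("variance:", var_direct, "=", V_A, "+", V_D)
```

For these values the direct and closed-form results agree exactly, because the arithmetic is done with fractions rather than floating point.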

When the variance is arranged in this form, the first term ($2uv[d+(v-u)h]^2$) defines the additive genetic variance, $V_A$, and the second term ($4u^2v^2h^2$) the dominance variance, $V_D$. Why this particular arrangement is used to define $V_A$ and $V_D$ rather than some other may be seen if we introduce the notion of gene dose and the regression of genotypic effects on this variable, which is essentially how Fisher proceeded to develop the concepts of $V_A$ and $V_D$. If A is the increasing allele, then we can consider the three genotypes, AA, Aa, aa, as containing $2$, $1$, and $0$ doses of the A allele, respectively. The regression of genotypic effects on these gene doses is shown in Figure 3.2.

**Figure 3.2:** Regression of genotypic effects on gene dosage showing additive and dominance effects under random mating. The figure is drawn to scale for $u=v=\frac{1}{2}$, $d = 1$, and $h = \frac{1}{2}$.

The values that enter into the calculation of the slope of this line are

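$$
\begin{array}{lccc}
\mbox{Genotype} & AA & Aa & aa \\
\mbox{Genotypic effect } (y) & d & h & -d \\
\mbox{Frequency } (f) & u^2 & 2uv & v^2 \\
\mbox{Dose } (x) & 2 & 1 & 0
\end{array}
$$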

From these values the slope of the regression line of $y$ on $x$ in Figure 3.2 is given by $\beta_{y,x} = \sigma_{x,y}/\sigma^2_{x}$. In order to calculate $\sigma^{2}_{x}$ we need $\mu_x$, which is

$$
\begin{aligned}
\mu_x &= 2u^2 + 2uv \\
      &= 2u(u + v) \\
      &= 2u \; .
\end{aligned}
\tag{3.14}
$$

Then, $\sigma^{2}_{x}$ is

$$
\begin{aligned}
\sigma^{2}_{x} &= 2^2u^2 + 1^2(2uv) - (2u)^2 \\
               &= 4u^2 + 2uv - 4u^2 \\
               &= 2uv
\end{aligned}
$$

using the variance formula in 3.12. In order to calculate $\sigma_{x,y}$ we need to employ the covariance formula

$$
\sigma_{x,y} = \sum f_i x_i y_i - \mu_x \mu_y \; ,
\tag{3.15}
$$

where $\mu_y$ and $\mu_x$ are defined as in 3.11 and 3.14, respectively. Then,

$$
\begin{aligned}
\sigma_{x,y} &= 2u^2d + 2uvh - 2u[(u-v)d + 2uvh] \\
             &= 2u^2d + 2uvh - 2u^2d + 2uvd - 4u^2vh \\
             &= 2uvd + h(2uv-4u^2v) \\
             &= 2uvd + 2uvh(1-2u) \\
             &= 2uvd + 2uvh(1-u-u) \\
             &= 2uvd + 2uvh(v-u) \\
             &= 2uv[d+(v-u)h] \; .
\end{aligned}
\tag{3.16}
$$

Therefore, the slope is

$$
\begin{aligned}
\beta_{y,x} &= \frac{\sigma_{x,y}}{\sigma_x^2} \\
            &= \frac{2uv[d + (v - u)h]}{2uv} \\
            &= d+(v-u)h \; .
\end{aligned}
\tag{3.17}
$$
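The slope in 3.17 can be reproduced numerically as a frequency-weighted regression of genotypic effect on gene dose. The following sketch does this for one set of parameter values and compares each intermediate quantity with 3.14 to 3.17. The function name `dose_regression` and the chosen values of `u`, `d`, and `h` are assumptions for illustration only.

```python
from fractions import Fraction


def dose_regression(u, d, h):
    """Weighted regression of genotypic effect y on gene dose x.

    Returns (mu_x, var_x, cov_xy, slope) for genotypes AA, Aa, aa
    with frequencies u^2, 2uv, v^2 under random mating.
    """
    v = 1 - u
    f = [u * u, 2 * u * v, v * v]   # frequencies
    x = [2, 1, 0]                   # doses of the A allele
    y = [d, h, -d]                  # genotypic effects

    mu_x = sum(fi * xi for fi, xi in zip(f, x))
    mu_y = sum(fi * yi for fi, yi in zip(f, y))
    var_x = sum(fi * xi * xi for fi, xi in zip(f, x)) - mu_x * mu_x
    cov_xy = sum(fi * xi * yi for fi, xi, yi in zip(f, x, y)) - mu_x * mu_y
    return mu_x, var_x, cov_xy, cov_xy / var_x


# Hypothetical parameter values (assumed for this example only).
u, d, h = Fraction(3, 4), Fraction(1), Fraction(1, 2)
v = 1 - u

mu_x, var_x, cov_xy, slope = dose_regression(u, d, h)

print("mu_x  :", mu_x, "expected 2u =", 2 * u)
print("var_x :", var_x, "expected 2uv =", 2 * u * v)
print("cov_xy:", cov_xy)
print("slope :", slope, "expected d + (v-u)h =", d + (v - u) * h)
```

Because the regression is weighted by genotype frequency, the slope depends on the gene frequencies whenever $h \neq 0$.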

Following standard procedures in regression analysis, we can partition $\sigma^{2}_{y}$ into the variance due to the regression and the residual variance. The former is equivalent to the variance of the expected $y$; that is, the variance of the hypothetical points on the line in Figure 3.2, and the latter is the variance of the difference between the observed $y$ and the expected values. The variance due to regression is

$$
\begin{aligned}
\beta_{y,x}\sigma_{x,y} &= 2uv[d+(v-u)h][d+(v-u)h] \\
                        &= 2uv[d+(v-u)h]^2 \\
                        &= V_A
\end{aligned}
\tag{3.18}
$$

and we may obtain the residual variance simply by subtracting the variance due to regression from the total variance of $y$. The variance of genotypic effects ($\sigma^{2}_{y}$) was given in 3.13, and when we subtract the expression obtained for the variance due to regression in 3.18, we obtain the residual variance:

$$
\begin{aligned}
\sigma^{2}_{y} - \beta_{y,x}\sigma_{x,y} &= 4u^2v^2h^2 \\
                                         &= V_D \; .
\end{aligned}
\tag{3.19}
$$
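The partition in 3.18 and 3.19 can also be checked by fitting the regression line explicitly and computing the frequency-weighted variance of the fitted values and of the residuals. The sketch below is one way to do this; the function name `partition_variance` and the parameter values are assumptions made for the example.

```python
from fractions import Fraction


def partition_variance(u, d, h):
    """Split the variance of genotypic effects into regression and residual parts."""
    v = 1 - u
    f = [u * u, 2 * u * v, v * v]   # frequencies of AA, Aa, aa
    x = [2, 1, 0]                   # gene dose
    y = [d, h, -d]                  # genotypic effect

    mu_x = sum(fi * xi for fi, xi in zip(f, x))
    mu_y = sum(fi * yi for fi, yi in zip(f, y))
    var_x = sum(fi * (xi - mu_x) ** 2 for fi, xi in zip(f, x))
    cov_xy = sum(fi * (xi - mu_x) * (yi - mu_y) for fi, xi, yi in zip(f, x, y))
    beta = cov_xy / var_x

    # Points on the regression line and deviations of the observed effects from it.
    fitted = [mu_y + beta * (xi - mu_x) for xi in x]
    resid = [yi - yh for yi, yh in zip(y, fitted)]

    var_regression = sum(fi * (yh - mu_y) ** 2 for fi, yh in zip(f, fitted))
    var_residual = sum(fi * r * r for fi, r in zip(f, resid))  # residuals have mean 0
    return var_regression, var_residual


# Hypothetical parameter values (assumed for this example only).
u, d, h = Fraction(3, 4), Fraction(1), Fraction(1, 2)
v = 1 - u

var_regression, var_residual = partition_variance(u, d, h)

print("variance due to regression:", var_regression)
print("residual variance         :", var_residual)
```

With these values the two parts match $2uv[d+(v-u)h]^2$ and $4u^2v^2h^2$ respectively.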

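In this representation, genotypic effects are defined in terms of the regression line and are known as genotypic values. They are related to $d$ and $h$, the genotypic effects we defined in Figure 3.1, but now reflect the population mean and gene frequencies of our random mating population. Defined in this way, the genotypic value ($G$) is $G = A + D$, the sum of the additive ($A$) and dominance ($D$) deviations of the individual. Here $A$ is the deviation of the regression line from the population mean at the individual's gene dose, $\beta_{y,x}(x - \mu_x)$, and $D$ is the residual of the genotypic effect from the line.

$$
\begin{array}{lclc}
 & & A + D & \mbox{frequency} \\
G_{AA} & = & 2v[d + h(v-u)] - 2v^2h & u^2 \\
G_{Aa} & = & (v-u)[d + h(v-u)] + 2uvh & 2uv \\
G_{aa} & = & -2u[d + h(v-u)] - 2u^2h & v^2
\end{array}
$$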

In the case of $u=v=\frac{1}{2}$, this table becomes

$$
\begin{array}{lclc}
 & & A + D & \mbox{frequency} \\
G_{AA} & = & d - \frac{1}{2}h & \frac{1}{4} \\
G_{Aa} & = & \frac{1}{2}h & \frac{1}{2} \\
G_{aa} & = & -d - \frac{1}{2}h & \frac{1}{4}
\end{array}
$$

from which it can be seen that the weighted sum of all $G$'s is zero ($\sum f_i G_i = 0$). In this case the additive effect is the same as the genotypic effect as originally scaled, and the dominance effect is measured around a mean of $\frac{1}{2}h$.

This representation of genotypic value accurately conveys the extreme nature of unusual genotypes. Let $d = h = 1$, an example of complete dominance. In that case, $G_{AA} = G_{Aa} = \frac{1}{2}$ and $G_{aa} = -1\frac{1}{2}$ on our scale. Thus, aa genotypes, which form only $\frac{1}{4}$ of the population, fall far below the mean of $0$, while the remaining $\frac{3}{4}$ of the population genotypes fall only slightly above the mean of $0$. Thus, the bulk of the population appears relatively normal, whereas aa genotypes appear abnormal or unusual. When dominance is absent ($h = 0$), Aa genotypes, which form $\frac{1}{2}$ of the population, have a mean of $0$ and the less frequent genotypes AA and aa appear deviant.

This situation is accentuated as the gene frequencies depart from $\frac{1}{2}$. For example, with $u = \frac{3}{4}$, $v = \frac{1}{4}$, and $d = h = 1$, AA and Aa combined form $\frac{15}{16}$ of the population with a genotypic value of $\frac{1}{8}$, just slightly above the mean of $0$, whereas the aa genotype has a value of $-1\frac{7}{8}$. In the limiting case where the a allele is very rare, AA and Aa tend to $0$, the population mean, while only aa genotypes take an extreme value. These values intuitively correspond to our notion of a rare disorder of extreme effect, such as untreated phenylketonuria (PKU).

The genotypic values $A$ and $D$ that we employ in the Mx model have precisely the expected variances given above in 3.18 and 3.19, but are summed over all polygenic loci contributing to the trait. Thus, the biometrical model gives a precise definition to the latent variables employed in Mx for the analysis of twin data.
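The worked values quoted in this section can be reproduced with a few lines of Python. The sketch below evaluates the genotypic values as deviations from the population mean: the additive part is the regression slope $d + (v-u)h$ times the deviation of gene dose from its mean $2u$, and the dominance part is the residual from the regression line. The function name `genotypic_values` and the dictionary layout are assumptions made for this example, not part of Mx.

```python
from fractions import Fraction


def genotypic_values(u, d, h):
    """Return {genotype: (G, frequency)} with G = A + D measured from the mean."""
    v = 1 - u
    alpha = d + (v - u) * h   # regression slope of effect on gene dose (3.17)
    additive = {"AA": 2 * v * alpha, "Aa": (v - u) * alpha, "aa": -2 * u * alpha}
    dominance = {"AA": -2 * v * v * h, "Aa": 2 * u * v * h, "aa": -2 * u * u * h}
    freq = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}
    return {g: (additive[g] + dominance[g], freq[g]) for g in freq}


# Complete dominance (d = h = 1) at two sets of gene frequencies.
equal = genotypic_values(Fraction(1, 2), 1, 1)
unequal = genotypic_values(Fraction(3, 4), 1, 1)

for label, table in (("u = 1/2", equal), ("u = 3/4", unequal)):
    print(label)
    for genotype, (G, f) in table.items():
        print(f"  G_{genotype} = {G}  (frequency {f})")
    # The frequency-weighted sum of genotypic values should be zero.
    print("  weighted sum:", sum(G * f for G, f in table.values()))
```

Using exact fractions keeps the output directly comparable with the mixed numbers quoted in the text, for example $-\frac{15}{8} = -1\frac{7}{8}$ for the aa genotype when $u = \frac{3}{4}$.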


Jeff Lessem 2002-03-21