The variances and covariances used in twin analyses are often computed using a statistical package such as SPSS [SPSS, 1988] or SAS [SAS, 1988], or by PRELIS []. Nevertheless, it is useful to examine how they are calculated in order to understand one's observed data thoroughly. In this section we describe the calculation of means, variances, covariances, and correlations.
Some simulated measurements from 16 MZ and 16 DZ twin pairs are presented in
Table 2.1. The observed values are listed in the columns labelled Twin 1
and Twin 2 for each zygosity group.
MZ Twin 1 | MZ Twin 2 | DZ Twin 1 | DZ Twin 2 |
3 | 2 | 0 | 1 |
3 | 3 | 2 | 3 |
2 | 1 | 1 | 2 |
1 | 2 | 4 | 3 |
0 | 0 | 3 | 1 |
2 | 2 | 2 | 2 |
2 | 2 | 2 | 2 |
3 | 2 | 1 | 3 |
3 | 3 | 3 | 4 |
2 | 3 | 1 | 0 |
1 | 1 | 1 | 1 |
1 | 1 | 2 | 1 |
4 | 4 | 3 | 3 |
2 | 3 | 3 | 2 |
2 | 1 | 2 | 2 |
1 | 2 | 2 | 2 |
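In order to obtain the summary statistics of variances and covariances
for genetic analysis, it is first necessary to compute the average
value for a set of measurements, called the mean.
The mean is typically denoted by a bar over the variable name for a
group of observations, for example $\bar{X}$ or $\bar{Y}$.
The formula for calculation of the mean is:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

in which $X_i$ is the $i$th observation and $n$ is the number of
observations. In Table 2.1, each of the four columns sums to 32 over
16 observations, so every column has a mean of 2.0.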
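As a minimal sketch of this calculation, the Python snippet below transcribes the four columns of Table 2.1 and averages each one. The variable names (`mz_twin1`, `dz_twin2`, and so on) and the helper `mean` are illustrative choices, not part of the original text.

```python
# Simulated scores transcribed from Table 2.1 (16 pairs per zygosity group).
mz_twin1 = [3, 3, 2, 1, 0, 2, 2, 3, 3, 2, 1, 1, 4, 2, 2, 1]
mz_twin2 = [2, 3, 1, 2, 0, 2, 2, 2, 3, 3, 1, 1, 4, 3, 1, 2]
dz_twin1 = [0, 2, 1, 4, 3, 2, 2, 1, 3, 1, 1, 2, 3, 3, 2, 2]
dz_twin2 = [1, 3, 2, 3, 1, 2, 2, 3, 4, 0, 1, 1, 3, 2, 2, 2]


def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(values) / len(values)


column_means = {
    "MZ Twin 1": mean(mz_twin1),
    "MZ Twin 2": mean(mz_twin2),
    "DZ Twin 1": mean(dz_twin1),
    "DZ Twin 2": mean(dz_twin2),
}

for label, value in column_means.items():
    print(label, value)  # each column prints 2.0
```

Because every column has the same mean, differences between the MZ and DZ groups in this example arise only from how the scores vary and covary, not from their average level.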
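The variance of the observations represents a
measure of dispersion
around the mean; that is, how far, on average, observations lie from the
mean, expressed in squared units. The variance formula for a sample of
measurements, often represented as $s^2$, $V_X$, or $\mathrm{Var}(X)$, is

$$s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

in which $X_i$ is the $i$th observation, $\bar{X}$ is the mean, and $n$ is
the number of observations. For the MZ Twin 1 column of Table 2.1, the
squared deviations from the mean of 2.0 sum to 16, so
$s_X^2 = 16/15 \approx 1.07$.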
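The sketch below computes the sample variance of the MZ Twin 1 column, assuming the usual unbiased estimator with an n - 1 denominator. The function name `sample_variance` is hypothetical; the result is cross-checked against `statistics.variance` from the Python standard library, which uses the same denominator.

```python
from statistics import variance as stdlib_variance

# MZ Twin 1 scores transcribed from Table 2.1.
mz_twin1 = [3, 3, 2, 1, 0, 2, 2, 3, 3, 2, 1, 1, 4, 2, 2, 1]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


var_mz_twin1 = sample_variance(mz_twin1)
print(round(var_mz_twin1, 4))                # 1.0667, i.e. 16/15
print(round(stdlib_variance(mz_twin1), 4))   # same value from the stdlib
```

Dividing by n instead of n - 1 would give 1.0 for this column; the choice of denominator matters little in large samples but should be applied consistently to variances and covariances.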
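Covariances are computationally similar to variances, but represent
mean deviations which are shared by two sets of observations. In the
twin example, covariances are useful because they indicate the extent
to which deviations from the mean by Twin 1 are similar to the second
twin's deviations from the mean. Thus, the covariance between
observations of Twin 1 and Twin 2 represents a scale-dependent measure
of twin similarity. Covariances are often denoted by $s_{XY}$,
$\mathrm{Cov}_{X,Y}$, or $\mathrm{Cov}(X,Y)$, and are calculated as

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

in which $X_i$ and $Y_i$ are the scores of Twin 1 and Twin 2 in the $i$th
pair, $\bar{X}$ and $\bar{Y}$ are their means, and $n$ is the number of pairs.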
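For the twin data in Table 2.1, the covariance between MZ twins is

$$\mathrm{Cov}_{MZ} = \frac{(3-2)(2-2) + (3-2)(3-2) + \cdots + (1-2)(2-2)}{15} = \frac{12}{15} = 0.80$$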
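The MZ covariance can be reproduced with a few lines of Python. This is a sketch that assumes an n - 1 denominator; `sample_covariance` is an illustrative name rather than a function from any package mentioned in the text.

```python
# MZ scores transcribed from Table 2.1; position i holds the i-th twin pair.
mz_twin1 = [3, 3, 2, 1, 0, 2, 2, 3, 3, 2, 1, 1, 4, 2, 2, 1]
mz_twin2 = [2, 3, 1, 2, 0, 2, 2, 2, 3, 3, 1, 1, 4, 3, 1, 2]


def sample_covariance(x, y):
    """Sum of cross-products of deviations, divided by n - 1 (assumed)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    return sum(cross_products) / (n - 1)


cov_mz = sample_covariance(mz_twin1, mz_twin2)
print(round(cov_mz, 2))  # 0.8, i.e. 12/15
```

Pairing matters here: the two lists must be kept in the same pair order, because each cross-product combines the two members of one twin pair.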
The correlation coefficient is
closely related to the covariance between two sets of observations.
Correlations may be interpreted in a similar manner as covariances,
but are rescaled to give a lower bound of -1.0 and an upper bound of 1.0.
The correlation coefficient, $r$, may be calculated using the
covariance between two measures and the square root of the variance
(the standard deviation)
of each measure:

$$r = \frac{\mathrm{Cov}(X,Y)}{\sqrt{s_X^2}\,\sqrt{s_Y^2}}$$

in which $s_X^2$ and $s_Y^2$ are the variances of the two measures.
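To illustrate the rescaling, the sketch below divides the MZ covariance by the product of the two standard deviations. The helper names are hypothetical, and both the variances and the covariance are assumed to use an n - 1 denominator (the denominator cancels in the ratio, so the correlation is unaffected by that choice).

```python
from math import sqrt

# MZ scores transcribed from Table 2.1; position i holds the i-th twin pair.
mz_twin1 = [3, 3, 2, 1, 0, 2, 2, 3, 3, 2, 1, 1, 4, 2, 2, 1]
mz_twin2 = [2, 3, 1, 2, 0, 2, 2, 2, 3, 3, 1, 1, 4, 3, 1, 2]


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(sample_covariance(x, x))  # variance is the covariance of x with itself
    sd_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)


r_mz = correlation(mz_twin1, mz_twin2)
print(round(r_mz, 2))  # 0.75
```

Unlike the covariance, this value would not change if the scores were multiplied by a constant, which is what makes correlations convenient for comparing groups.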
Although variances and covariances typically define the observed
information for biometrical analyses of twin data, correlations
are useful for comparing resemblances between twins as a function of
genetic relatedness. In the simulated twin data, the MZ twin
correlation ($r = 0.75$) is greater than that of the DZ twins
($r = 0.50$). This greater similarity of MZ twins may be due to several
sources of variation (discussed in subsequent chapters), but at the
least is suggestive of a heritable basis for the trait, as increased
MZ similarity could result from the fact that MZ twins are genetically
identical, whereas DZ twins share only 1/2 of their genes on average.
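The comparison of the two zygosity groups can be sketched as follows: the code builds a 2 x 2 covariance matrix (variances on the diagonal, the twin covariance off the diagonal) for each group from Table 2.1 and derives the correlation from it. The function names are illustrative and an n - 1 denominator is assumed; this is not the procedure of any particular package.

```python
# Scores transcribed from Table 2.1; position i holds the i-th twin pair.
mz_twin1 = [3, 3, 2, 1, 0, 2, 2, 3, 3, 2, 1, 1, 4, 2, 2, 1]
mz_twin2 = [2, 3, 1, 2, 0, 2, 2, 2, 3, 3, 1, 1, 4, 3, 1, 2]
dz_twin1 = [0, 2, 1, 4, 3, 2, 2, 1, 3, 1, 1, 2, 3, 3, 2, 2]
dz_twin2 = [1, 3, 2, 3, 1, 2, 2, 3, 4, 0, 1, 1, 3, 2, 2, 2]


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_matrix(twin1, twin2):
    """2 x 2 matrix: variances on the diagonal, twin covariance elsewhere."""
    return [
        [sample_covariance(twin1, twin1), sample_covariance(twin1, twin2)],
        [sample_covariance(twin2, twin1), sample_covariance(twin2, twin2)],
    ]


def correlation_from_matrix(matrix):
    """Off-diagonal covariance rescaled by the two standard deviations."""
    return matrix[0][1] / (matrix[0][0] * matrix[1][1]) ** 0.5


cov_mz = covariance_matrix(mz_twin1, mz_twin2)
cov_dz = covariance_matrix(dz_twin1, dz_twin2)
r_mz = correlation_from_matrix(cov_mz)
r_dz = correlation_from_matrix(cov_dz)

print(round(r_mz, 2), round(r_dz, 2))  # 0.75 0.5
```

In these simulated data all four variances equal 16/15, so the difference between the groups lies entirely in the twin covariance: 12/15 for MZ pairs against 8/15 for DZ pairs.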