It often happens that data are stored at the individual subject level
rather than at the family level. Typically, each subject has a family
number and an `id' number to mark their position in the family (first
or second twin). A necessary step to analyse the covariance between
relatives is to `glue' the data from family members together so that
the family becomes the unit of measurement and covariances between
family members may be computed. In SAS this is a relatively simple
operation although care must be taken to supply labels for the
variables that do not exceed the SAS maximum length of eight
characters. The SAS script in Appendix shows the
case for twin data, and goes beyond the initial requirement by taking
the sex of the twins into account. Five groups are created: MZ male,
DZ male, MZ female, DZ female and opposite-sex DZ. The covariances
are computed and output to .dat files which contain the number of
observations (NObservations), the number of input variables (NInput),
labels, and the covariance matrices (CMatrix). These .dat files may be
used directly in Mx in a diagram, or in a script using the #include
statement.
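The `gluing' step can be illustrated outside SAS. The following is a minimal Python sketch, not the SAS script from the Appendix: the record layout, the group codes (MZM, DZF) and the function names are assumptions made for illustration, and the covariance uses the usual n - 1 divisor. It pairs individual records by family number, drops families with a missing co-twin, and computes a 2x2 covariance matrix for one group.

```python
from collections import defaultdict

# Hypothetical individual-level records: (family, id, group, score).
# 'id' is the position in the family (1 = first twin, 2 = second twin).
records = [
    (1, 1, "MZM", 2.0), (1, 2, "MZM", 3.0),
    (2, 1, "MZM", 4.0), (2, 2, "MZM", 5.0),
    (3, 1, "MZM", 6.0), (3, 2, "MZM", 10.0),
    (4, 1, "DZF", 1.0), (4, 2, "DZF", 2.0),
    (5, 1, "DZF", 3.0),  # co-twin missing, so this family is dropped
]

def pair_by_family(rows):
    """Return {group: [(twin1_score, twin2_score), ...]} for complete pairs."""
    families = defaultdict(dict)
    groups = {}
    for family, twin_id, group, score in rows:
        families[family][twin_id] = score
        groups[family] = group
    pairs = defaultdict(list)
    for family, members in sorted(families.items()):
        if 1 in members and 2 in members:
            pairs[groups[family]].append((members[1], members[2]))
    return dict(pairs)

def covariance_matrix(pairs):
    """2x2 sample covariance matrix (divisor n - 1) of twin 1 and twin 2."""
    n = len(pairs)
    means = [sum(p[i] for p in pairs) / n for i in range(2)]
    return [
        [sum((p[i] - means[i]) * (p[j] - means[j]) for p in pairs) / (n - 1)
         for j in range(2)]
        for i in range(2)
    ]

paired = pair_by_family(records)
mzm_cov = covariance_matrix(paired["MZM"])
print(mzm_cov)
```

With one variable per twin the matrix is 2x2; with several variables per twin the same pairing step applies, and each family row simply carries more columns.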
Note that the assignment of the twins as 1 or 2 is usually arbitrary for the same-sex groups, but in the opposite-sex group the male twin is always first. Strictly speaking, when there is no inherent order to the observations the variance-covariance matrix is not the best summary statistic to use. The intraclass correlation is the most appropriate summary for observations that do not have any order; it uses a joint estimate of the variance of twin 1 and twin 2, and partitions this into within-pair and between-pair components. However, the intraclass correlation is more difficult to generalize to multivariate data and to multiple classes of relatives, so we stay with covariance matrices here. Sometimes data on birth order or some other characteristic may be used to distinguish more formally between twin 1 and twin 2 within a pair, thereby giving some rationale to the ordering and to the use of covariance matrices. Should such an approach be taken, it is necessary to split the DZ opposite-sex twin group into two groups according to whether the first twin is female or male.
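To make the point about ordering concrete, here is a small Python sketch of one common estimator of the intraclass correlation for pairs, based on a one-way analysis of variance. The function name and the data are hypothetical, and other estimators exist; the sketch only shows that the statistic pools the variance of twin 1 and twin 2 and is unchanged when the twins are swapped within a pair.

```python
def intraclass_correlation(pairs):
    """One-way ANOVA intraclass correlation for pairs (group size k = 2)."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair mean square: k * sum of squared pair-mean deviations / (n - 1)
    ms_between = 2 * sum(((a + b) / 2 - grand_mean) ** 2 for a, b in pairs) / (n - 1)
    # Within-pair mean square: each pair contributes (a - b)^2 / 2, with n df
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / n
    # For k = 2 the estimator is (MSB - MSW) / (MSB + MSW)
    return (ms_between - ms_within) / (ms_between + ms_within)

# Hypothetical scores for three pairs
pairs = [(2.0, 3.0), (4.0, 5.0), (6.0, 10.0)]
# Same pairs with twin 1 and twin 2 exchanged in the first and last pair
swapped = [(3.0, 2.0), (4.0, 5.0), (10.0, 6.0)]

icc = intraclass_correlation(pairs)
icc_swapped = intraclass_correlation(swapped)
print(icc, icc_swapped)
```

The ordinary covariance between the twin 1 and twin 2 columns would generally change under such a swap, which is why the ordering matters when covariance matrices are used.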
Appendix shows a SAS macro for creating an Mx .dat file,
which fully describes the data: the variable labels, the sample size,
the means and covariances. Comments, beginning with !, indicate the
date the file was created. The resulting .dat file might look like this:

!
! Mx dat file created by SAS on 03FEB1998
!
Data NInput=4 NObservations=844
CMatrix full
1.0086 -0.0148 -0.0317 -0.0443
-0.0148 1.0169 -0.0062 0.0068
-0.0317 -0.0062 0.9342 0.0596
-0.0443 0.0068 0.0596 0.9697
Means
0.0139 -0.0729 0.0722 0.0159
Labels
T1F1 T1F2 T2F1 T2F2

As will be seen in later chapters, this file is ready for immediate use
for drawing path diagrams in the Mx GUI or in an Mx script with the
#include command.
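For readers who prepare their summary statistics outside SAS, the layout of the example file can be reproduced with a few lines of Python. This is only a sketch that mirrors the listing above, not a translation of the SAS macro: the function name `format_mx_dat` and its parameters are hypothetical, and the four-decimal formatting is an assumption taken from the example.

```python
def format_mx_dat(labels, n_obs, cov, means, comment):
    """Return the text of a .dat file laid out like the example listing."""
    if len(cov) != len(labels) or len(means) != len(labels):
        raise ValueError("labels, means and covariance matrix must agree in size")

    def row(values):
        # Four decimal places, space separated (assumed from the example)
        return " ".join(f"{v:.4f}" for v in values)

    lines = ["!", f"! {comment}", "!"]
    lines.append(f"Data NInput={len(labels)} NObservations={n_obs}")
    lines.append("CMatrix full")
    lines.extend(row(r) for r in cov)
    lines.append("Means")
    lines.append(row(means))
    lines.append("Labels")
    lines.append(" ".join(labels))
    return "\n".join(lines) + "\n"

# Values copied from the example listing
cov = [
    [1.0086, -0.0148, -0.0317, -0.0443],
    [-0.0148, 1.0169, -0.0062, 0.0068],
    [-0.0317, -0.0062, 0.9342, 0.0596],
    [-0.0443, 0.0068, 0.0596, 0.9697],
]
means = [0.0139, -0.0729, 0.0722, 0.0159]
labels = ["T1F1", "T1F2", "T2F1", "T2F2"]

text = format_mx_dat(labels, 844, cov, means, "Mx dat file created on 03FEB1998")
print(text)
```

Writing `text` to a file with a .dat extension gives a file with the same structure as the listing; whether Mx accepts other number formats or line layouts is not something this sketch addresses.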