1 SAS scripts to compute covariance matrices

This is not the place to describe the workings of SAS in detail; the thousands of pages in the manuals are quite adequate! All we aim to do here is to get the data in and get the covariance matrix and means out. SAS has a useful procedure, PROC CORR, which prints the required statistics; these can be cut and pasted into a file for use with Mx. However, as is commonly the case with computer tasks, investing a little extra initial work in automation saves labor in the long run and is less error-prone.

Data are often stored at the individual subject level rather than at the family level. Typically, each subject has a family number and an 'id' number to mark their position in the family (first or second twin). A necessary step in analysing the covariance between relatives is to 'glue' the data from family members together, so that the family becomes the unit of measurement and covariances between family members may be computed. In SAS this is a relatively simple operation, although care must be taken that the labels supplied for the variables do not exceed the SAS maximum length of eight characters. The SAS script in Appendix

shows the case for twin data, and goes beyond the initial requirement by taking the sex of the twins into account. Five groups are created: MZ male, DZ male, MZ female, DZ female and opposite-sex DZ. The covariances are computed and output to .dat files, which contain the number of observations (NObservations), the number of input variables (NInputvars), labels, and the covariance matrices (CMatrix). These .dat files may be used directly in a diagram in Mx, or in a script using the #include statement.

Note that the assignment of the twins as 1 or 2 is usually arbitrary in the same-sex groups, but in the opposite-sex group the twins are ordered by sex: the male (or female) twin is always first, and the female (or male) twin second. Strictly speaking, when there is no inherent order to the observations, the variance-covariance matrix is not the best summary statistic to use. The intraclass correlation is the most appropriate summary for observations that do not have any order; it uses a joint estimate of the variance of twin 1 and twin 2, and partitions this into within-pair and between-pair components. However, the intraclass correlation is more difficult to generalize to multivariate data and to multiple classes of relatives, so we stay with covariance matrices here.

Sometimes data on birth order or some other characteristic may be used to distinguish more formally between twin 1 and twin 2 within a pair, thereby giving a rationale for the ordering and for the use of covariance matrices. Should such an approach be taken, it is necessary to split the opposite-sex DZ group into two groups according to whether the first twin is female or male. Appendix

shows a SAS macro for creating an Mx .dat file, which fully describes the data: the variable labels, the sample size, the means and the covariances. Comments, which begin with !, indicate the date the file was created. The resulting .dat file might look like this:

!
! Mx dat file created by SAS on 03FEB1998
!
Data NInputvars=4 NObservations=844
CMatrix Full
       1.0086       -0.0148       -0.0317       -0.0443 
      -0.0148        1.0169       -0.0062        0.0068 
      -0.0317       -0.0062        0.9342        0.0596 
      -0.0443        0.0068        0.0596        0.9697  
Means
       0.0139       -0.0729        0.0722        0.0159  
Labels T1F1 T1F2 T2F1 T2F2
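
The layout above is plain text, so it can in principle be produced by any tool. As a rough illustration only (written in Python rather than SAS, and not the macro from the appendix), the sketch below formats a file with the same keywords and column widths as the listing. The function name format_mx_dat, its parameters and the example numbers are all assumptions made for this sketch; the eight-character check mirrors the label limit mentioned earlier.

```python
def format_mx_dat(labels, n_obs, cov, means, comment="Mx dat file, illustrative sketch"):
    """Return the text of an Mx-style .dat file laid out like the listing above.

    Hypothetical helper: only the keywords (Data, CMatrix Full, Means, Labels)
    and the 13-character, 4-decimal number format are taken from the listing.
    """
    k = len(labels)
    if any(len(label) > 8 for label in labels):
        raise ValueError("labels must not exceed eight characters")
    if len(means) != k or len(cov) != k or any(len(row) != k for row in cov):
        raise ValueError("cov must be k x k and means of length k")

    lines = ["!", "! " + comment, "!"]          # comment lines start with !
    lines.append(f"Data NInputvars={k} NObservations={n_obs}")
    lines.append("CMatrix Full")
    for row in cov:                              # one matrix row per line
        lines.append("".join(f"{x:13.4f}" for x in row))
    lines.append("Means")
    lines.append("".join(f"{m:13.4f}" for m in means))
    lines.append("Labels " + " ".join(labels))
    return "\n".join(lines) + "\n"


# Made-up two-variable example, not real data.
example_text = format_mx_dat(
    labels=["T1F1", "T2F1"],
    n_obs=3,
    cov=[[4.0, 7.0], [7.0, 13.0]],
    means=[3.0, 5.0],
)
print(example_text)
```

Writing the returned string to a file would give something that follows the same layout as the listing; whether Mx accepts every detail of it has not been checked here.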

As will be seen in later chapters, this file is ready for immediate use for drawing path diagrams in the Mx GUI or in an Mx script with the #include command.
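
To make the 'gluing' step described earlier more concrete, here is a minimal sketch of the idea in Python. It is not the SAS script from the appendix: the record layout, the field names (fam, id, zyg, sex, f1, f2), the group codes (MZM, DZM, MZF, DZF, DZO) and the choice to put the male twin first in opposite-sex pairs are all assumptions made for illustration. Individual records are joined into one row per family, each pair is assigned to one of the five groups, and a covariance matrix (with an n - 1 denominator) is computed for one group.

```python
from collections import defaultdict

# Hypothetical individual-level records, one per twin (made-up values).
RECORDS = [
    {"fam": 1, "id": 1, "zyg": "MZ", "sex": "M", "f1": 1.0, "f2": 0.5},
    {"fam": 1, "id": 2, "zyg": "MZ", "sex": "M", "f1": 2.0, "f2": 0.1},
    {"fam": 2, "id": 1, "zyg": "MZ", "sex": "M", "f1": 3.0, "f2": 0.7},
    {"fam": 2, "id": 2, "zyg": "MZ", "sex": "M", "f1": 4.0, "f2": 0.9},
    {"fam": 3, "id": 1, "zyg": "MZ", "sex": "M", "f1": 5.0, "f2": 0.2},
    {"fam": 3, "id": 2, "zyg": "MZ", "sex": "M", "f1": 9.0, "f2": 0.4},
    {"fam": 4, "id": 1, "zyg": "DZ", "sex": "F", "f1": 10.0, "f2": 0.3},
    {"fam": 4, "id": 2, "zyg": "DZ", "sex": "M", "f1": 20.0, "f2": 0.6},
    {"fam": 5, "id": 1, "zyg": "DZ", "sex": "F", "f1": 7.0, "f2": 0.8},  # co-twin missing
]


def group_of(t1, t2):
    """Return one of five group codes: MZM, DZM, MZF, DZF or DZO."""
    if t1["sex"] != t2["sex"]:
        return "DZO"
    return t1["zyg"] + t1["sex"]


def glue_pairs(records, variables):
    """Join twin records on family number, giving one wide row per pair."""
    by_family = defaultdict(dict)
    for rec in records:
        by_family[rec["fam"]][rec["id"]] = rec
    pairs = []
    for fam, members in sorted(by_family.items()):
        if 1 not in members or 2 not in members:
            continue  # this sketch simply drops incomplete pairs
        t1, t2 = members[1], members[2]
        # Assumption: in opposite-sex pairs the male twin is placed first.
        if t1["sex"] != t2["sex"] and t1["sex"] == "F":
            t1, t2 = t2, t1
        row = {"fam": fam, "group": group_of(t1, t2)}
        for v in variables:
            row["T1" + v.upper()] = t1[v]
            row["T2" + v.upper()] = t2[v]
        pairs.append(row)
    return pairs


def covariance_matrix(rows, labels):
    """Return (n, means, covariance matrix) using an n - 1 denominator."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two pairs")
    means = [sum(r[lab] for r in rows) / n for lab in labels]
    cov = [
        [sum((r[a] - ma) * (r[b] - mb) for r in rows) / (n - 1)
         for b, mb in zip(labels, means)]
        for a, ma in zip(labels, means)
    ]
    return n, means, cov


PAIRS = glue_pairs(RECORDS, ["f1", "f2"])
LABELS = ["T1F1", "T1F2", "T2F1", "T2F2"]
mzm = [p for p in PAIRS if p["group"] == "MZM"]
n_obs, means, cov = covariance_matrix(mzm, LABELS)
print(n_obs, means[0], means[2], cov[0][2])
```

Here the family, not the individual, is the unit of observation: each row of PAIRS holds both twins' scores, so the off-diagonal blocks of the matrix are the cross-twin covariances. Dropping incomplete pairs is a simplification of this sketch, not a recommendation.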

Jeff Lessem 2002-03-21