
2 Using SAS or SPSS to Summarize Data

The statistical packages SAS and SPSS are probably the most widely used tools for storing data collected in twin studies. In some cases, relational databases such as Oracle, DB2, Paradox and Ingres may be used to store data collected from relatives, because these offer powerful ways to keep data consistent according to normal form []. Normal form is a way of storing data that avoids duplicating information, which is very important for avoiding inconsistencies in the data. The general strategy may then be to use SAS or SPSS to extract the data from the database, do preliminary data cleaning, compute scale scores and transform them as necessary, and finally dump the data in a format suitable for analysis with Mx. Here we discuss the advantages and disadvantages of this approach, and illustrate it with sample SAS and SPSS scripts.

By creating intermediate files for Mx to read, we violate an elementary database principle: keep data in one place and one place only. This principle arises from the observation that almost as soon as two copies of the data exist, they become inconsistent, and updating them takes more than double the effort, because both sets must be updated and any inconsistencies between them must be resolved. For that reason, it is best to treat the database as the master copy and to make updates to that dataset and that dataset only. Data analysis then involves creating the intermediate data files from the master with the same SAS or SPSS script.

This procedure has some very important advantages. First, we know that the intermediate file will not be updated by anyone else during our analysis, which is especially important in a multi-user environment. We want models to be compared on the same data, not on data that have changed from one analysis to the next! Second, extracting the data from the database may take non-trivial computation time, and this step does not have to be repeated for every analysis.
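The extract, clean, score and dump steps can be sketched outside SAS or SPSS as well. The Python sketch below is purely illustrative and rests on several assumptions that are not part of this manual: an in-memory SQLite database stands in for the master database, and the table names (`pair`, `person`), item columns, 0 to 4 item coding, the out-of-range code 9, the `log(1 + x)` transform and the `.` missing-value code are all hypothetical. Zygosity is stored once per pair rather than on each twin's record, which is the kind of duplication normal form avoids. The script joins the two twins of each pair into one record, sets out-of-range codes to missing, sums a scale score, and returns one whitespace-delimited line per pair, split by zygosity because twin models usually fit MZ and DZ pairs as separate groups.

```python
import math
import sqlite3

MISSING = "."  # assumed missing-value code; it must be declared to Mx separately


def build_demo_db():
    """In-memory stand-in for the master database (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE pair (pair_id INTEGER PRIMARY KEY, zygosity TEXT NOT NULL);
        CREATE TABLE person (
            pair_id INTEGER NOT NULL REFERENCES pair(pair_id),
            twin_no INTEGER NOT NULL,
            item1 INTEGER, item2 INTEGER, item3 INTEGER,
            PRIMARY KEY (pair_id, twin_no));
        INSERT INTO pair VALUES (1, 'MZ'), (2, 'DZ'), (3, 'MZ');
        INSERT INTO person VALUES
            (1, 1, 1, 2, 3), (1, 2, 2, 2, 2),
            (2, 1, 0, 0, 1), (2, 2, 4, 9, 4),
            (3, 1, 3, 3, 3), (3, 2, NULL, 1, 1);
    """)
    return con


# Self-join: one output row per twin pair, twin 1's items then twin 2's items.
QUERY = """
    SELECT p.pair_id, p.zygosity,
           a.item1, a.item2, a.item3,
           b.item1, b.item2, b.item3
    FROM pair AS p
    JOIN person AS a ON a.pair_id = p.pair_id AND a.twin_no = 1
    JOIN person AS b ON b.pair_id = p.pair_id AND b.twin_no = 2
    ORDER BY p.pair_id
"""


def clean(item):
    """Preliminary cleaning: items outside the assumed 0..4 range become missing."""
    return item if item is not None and 0 <= item <= 4 else None


def scale_score(items):
    """Sum score (missing if any item is missing), then a log(1 + x) transform."""
    cleaned = [clean(i) for i in items]
    if any(i is None for i in cleaned):
        return None
    return math.log1p(sum(cleaned))


def fmt(value):
    """Format a score for the rectangular file, using the missing-value code."""
    return MISSING if value is None else f"{value:.4f}"


def export_lines(con):
    """Return {zygosity: [lines]}, one 'pair_id score1 score2' line per pair."""
    groups = {"MZ": [], "DZ": []}
    for row in con.execute(QUERY):
        pair_id, zygosity = row[0], row[1]
        twin1, twin2 = scale_score(row[2:5]), scale_score(row[5:8])
        groups[zygosity].append(f"{pair_id} {fmt(twin1)} {fmt(twin2)}")
    return groups


def write_files(groups, prefix="twins"):
    """Dump each zygosity group to its own plain-text file, e.g. twins_mz.dat."""
    for zygosity, lines in groups.items():
        with open(f"{prefix}_{zygosity.lower()}.dat", "w") as fh:
            fh.write("\n".join(lines) + "\n")
```

Calling `write_files(export_lines(build_demo_db()))` writes `twins_mz.dat` and `twins_dz.dat`. In keeping with the principle above, corrections would go into the master tables and the script would simply be rerun, rather than anyone editing the `.dat` files by hand.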

Jeff Lessem 2002-03-21