The statistical packages SAS and SPSS are probably the most
widely used tools for storing data collected in twin studies. In some
cases, relational databases such as Oracle, DB2, Paradox and Ingres may
be used to store data collected from relatives, because they offer
powerful ways to keep data consistent according to normal
form []. Normal form is a way of storing data that avoids duplicating
information. This is very important, because duplicated values tend to
drift apart and make the data inconsistent. The general strategy may
then be to use SAS or SPSS to extract the data from the database, do
preliminary data cleaning, compute scale scores and transform them as
necessary, and finally write the data out in a format suitable for
analysis with Mx. Here we discuss the advantages and disadvantages of
this approach and illustrate it with sample SAS and SPSS scripts.
By creating intermediate files for Mx to read, we are violating an
elementary database principle: keep data in one place and one place
only. This principle arises from the observation that almost as soon
as there are two copies of the data they become inconsistent, and
updating them takes more than double the effort, because both copies
must be updated and any inconsistencies between them resolved. For
that reason, it is best to treat the database as the master copy and
to make updates to that dataset and that dataset only. Data analysis
then involves creating the intermediate data files by running the same
SAS or SPSS script. This procedure has two very important advantages.
First, we know that the intermediate file is not going to be updated
by anyone else during our analysis; this is especially important in a
multi-user environment. We want models to be compared on the same
data, not on data that have changed from one analysis to the next!
Second, extracting the data from the database may take non-trivial
computation time, and this does not have to be repeated for every
analysis.
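Both advantages can be sketched in a few lines of Python. The function `make_snapshot` and the callable `slow_extract` below are hypothetical names; `slow_extract` merely stands in for the SAS or SPSS extraction step. The snapshot is written once and reused by later model fits, so the extraction cost is paid once and every model sees identical data. Returning a SHA-256 digest of the file is an added assumption, not part of the procedure described above: it gives each analysis a cheap way to record exactly which snapshot it was fitted to.

```python
import hashlib
import os
import tempfile


def make_snapshot(extract, path):
    """Write extract()'s lines to `path` once; later calls reuse the file.

    `extract` is a hypothetical callable standing in for the (possibly slow)
    database extraction; it returns an iterable of text lines.
    Returns the SHA-256 digest of the snapshot (an assumed extra, so each
    model fit can record which data it used).
    """
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            for line in extract():
                f.write(line + "\n")
        os.replace(tmp, path)  # snapshot appears complete, never half-written
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


calls = []


def slow_extract():
    calls.append(1)  # count how often the master database is queried
    return ["1 2.0000 2.0000", "2 3.0000 2.0000"]


path = os.path.join(tempfile.mkdtemp(), "twins.dat")
h1 = make_snapshot(slow_extract, path)  # first model: extracts and writes
h2 = make_snapshot(slow_extract, path)  # second model: reuses the snapshot
print(len(calls), h1 == h2)
```

Because updates are made only to the master database, picking up a correction means deleting the snapshot and rerunning the extraction, rather than editing the intermediate file by hand.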
Jeff Lessem
2002-03-21