
4 Preparing Raw Data

Almost by definition, raw data does not need to be prepared for analysis. However, computer programs rarely communicate with each other without some translation of data format, and getting data out of datasets maintained in popular statistical packages such as SAS or SPSS and into Mx is no exception. In this section we briefly describe SAS and SPSS scripts that output data into a file suitable for Mx to read.

Mx has two main ways to read individual scores. The first, and most straightforward, is `rectangular' format: one case per line, with variables separated by one or more spaces. A case is a collection of possibly correlated observations, such as several variables assessed on an individual, on both members of a twin pair, or on a whole family. Because family members correlate, the whole family must be treated as a single `case'. Separate cases are assumed to be uncorrelated, an assumption that matters for the statistical analysis.

Certain new methods available in programs such as Sudaan, SAS proc mixed, and Stata make it possible to account for some correlation between different cases, usually when data are grouped, e.g., subjects in the same school. These methods can prove useful for running standard statistical analyses at the individual level (multiple regression, survival analysis) while taking into account the covariation between family members. However, they do not help with preparing data for modeling genetic and environmental factors, which is the primary objective here.

The default code that Mx recognises as indicating missing data is a dot `.', the same as in SAS and SPSS. A sample SAS script to produce rectangular data is shown in the Appendix. Mx's missing command can be used to declare a different string as the missing value. Note that this is a string and not a numeric value: 1.0 and 1.00 are considered different, so a data field must match the declared string character for character to be treated as missing.

The second main format for raw data that Mx accepts is variable length, or `vl'.
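As a rough illustration of the rectangular layout described above, the following Python sketch turns one-record-per-individual twin data into one line per pair, with fields separated by single spaces and a dot for missing values. It is not the SAS or SPSS script referred to in the text. The record layout (family id, twin number, list of values), the function names, and the choice to fill an absent co-twin with missing codes are all assumptions made for this example.

```python
MISSING = "."  # Mx's default missing-data code, as stated in the text


def format_case(values, missing=MISSING):
    """Return one rectangular line: a single case, fields separated by spaces."""
    return " ".join(missing if v is None else str(v) for v in values)


def pairs_to_rectangular(records, n_vars):
    """Collapse individual records into one line per family (the 'case').

    records: iterable of (family_id, twin_no, values), twin_no being 1 or 2.
    This layout is hypothetical; adapt it to the dataset at hand.
    """
    families = {}
    for family_id, twin_no, values in records:
        families.setdefault(family_id, {})[twin_no] = list(values)

    lines = []
    for family_id in sorted(families):
        row = []
        for twin_no in (1, 2):
            # An absent co-twin is written as a run of missing codes (assumption).
            row.extend(families[family_id].get(twin_no, [None] * n_vars))
        lines.append(format_case(row))
    return lines


def is_missing(field, missing=MISSING):
    """Compare as strings, so '1.0' and '1.00' are different codes."""
    return field == missing


records = [
    (1, 1, [21.5, 3]),
    (1, 2, [22.0, None]),   # second variable not observed for this twin
    (2, 1, [19.25, 4]),     # co-twin absent from the dataset
]
lines = pairs_to_rectangular(records, n_vars=2)
for line in lines:
    print(line)
```

Because the missing code is compared as a string, a custom code chosen for the output file has to be written in exactly one textual form throughout; `is_missing("-1.00", missing="-1.0")` is false in this sketch for the same reason the text gives for 1.0 and 1.00.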


Jeff Lessem 2002-03-21