5 Development of Statistical Methods



Underlying all of the later developments of the biometrical-genetical, path-analytic and factor-analytic research programs has been a concern for the statistical problems of estimation and hypothesis testing. It is one thing to develop models; it is quite another to attach the most efficient and reliable numerical values to the effects specified in a model, and to decide whether a particular model gives an adequate account of the empirical data. All three traditions that we have identified as relevant to our work rely heavily on the statistical concept of likelihood, introduced by Ronald Fisher as a basis for developing methods of parameter estimation and hypothesis testing. The maximum likelihood approach to estimation in human quantitative genetics was introduced in a landmark paper by Jinks and Fulker (1970), who were the first to apply the theoretical and statistical methods of biometrical genetics to human behavioral data. Essential elements of their understanding were that:

complex models for human variation could be simplified under the assumption of polygenic inheritance
the goodness-of-fit of a model should be tested before waxing lyrical about the substantive importance of parameter estimates
the most precise estimates of parameters should be obtained
possibilities exist for specifying and analyzing gene action and genotype × environment interaction

It was the confluence of these notions in a systematic series of models and methods of data analysis that was mainly responsible for breaking the intellectual gridlock into which human behavioral genetics had driven itself by the end of the 1960s. Essentially the same statistical concern was found among those who had followed the path-analytic and factor-analytic approaches. Rao, Morton, and Yee (1974) used an approach close to maximum likelihood to estimate the parameters of path models for the correlations between relatives, and earlier work on the analysis of covariance structures by Karl Jöreskog had provided some of the first workable computer algorithms for applying the method of maximum likelihood to parameter estimation and hypothesis testing in factor analysis. Under Jöreskog's influence, it became possible to specify and test specific hypotheses about factor rotation. Subsequently, in collaboration with Dag Sörbom, the analysis of covariance structures was elaborated into the flexible model for Linear Structural Relations (LISREL) and its associated computer algorithms, which over two decades have passed through a series of increasingly general versions.
The attempts to bring genetic methods to bear on psychological variables naturally led to a concern with how the psychometrician's interest in multiple variables could be reconciled with the geneticist's methods for separating genetic and environmental effects. For example, in the late 1960s several investigators (Vandenberg, 1965; Loehlin and Vandenberg, 1968; Bock and Vandenberg, 1968) began to ask whether genes or environment were mainly responsible for the general ability factor underlying correlated measures of cognitive ability. The approaches they suggested, however, were relatively crude generalizations of the classical methods of univariate twin data analysis, which were then being superseded by the biometrical and path-analytic methods.
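Jöreskog's maximum likelihood approach to covariance structures, referred to above, can be summarized by its fit function. The display below is a standard textbook statement added for orientation rather than a formula given in this section; the symbols S (observed covariance matrix of p variables from N subjects), Σ(θ) (covariance matrix implied by a model with t free parameters θ) and the group index g are our own notation.

$$F_{ML}(\theta) = \ln\lvert\Sigma(\theta)\rvert + \operatorname{tr}\!\left[S\,\Sigma(\theta)^{-1}\right] - \ln\lvert S\rvert - p$$

F_ML is zero only when Σ(θ) reproduces S exactly. If the model is correct, (N − 1)F_ML evaluated at the minimizing estimate θ̂ is asymptotically distributed as χ² with p(p+1)/2 − t degrees of freedom, which is what turns parameter estimation into a formal goodness-of-fit test. One common multi-group form, of the kind needed when MZ and DZ twin groups are analyzed jointly, sums the group contributions, assuming the same p variables are measured in each of the G groups:

$$\chi^2 = \sum_{g=1}^{G} (N_g - 1)\,F_{ML,g}(\hat\theta), \qquad df = G\,\frac{p(p+1)}{2} - t$$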
There was clearly a need to integrate the model-fitting approach of biometrical genetics with the factor model, which was still the conceptual framework of much multivariate analysis in psychology. In discussion with the late Owen White, it became clear that Jöreskog's analysis of covariance structures provided the necessary statistical formulation. In 1977, Martin and Eaves reanalyzed twin data on Thurstone's Primary Mental Abilities using their own FORTRAN program for a multi-group extension of Jöreskog's model to twin data. For the first time, they used the model-fitting strategy of biometrical genetics to test hypotheses, however simple, about the genetic and environmental causes of covariation between multiple variables. The subsequent wide dissemination of a multi-group version of LISREL (LISREL III) generated a rash of demonstrations that what Martin and Eaves had achieved somewhat laboriously with their own program could be done more easily with LISREL (Boomsma and Molenaar, 1986; Cantor, 1983; Fulker et al., 1983; Martin et al., 1982; McArdle et al., 1980). After teaching several workshops and applying LISREL to everyday research problems in the analysis of twin and family data, we discovered that it too had its limitations and was quite cumbersome to use in several applications. This led to the development of Mx, which began in 1990 and has continued throughout this decade. Initially devised as a combination of a matrix algebra interpreter and a numerical optimization package, Mx has tremendously simplified the specification of both simple and complex genetic models.
In the 1980s there were many significant new departures in the specification of multivariate genetic models for family resemblance. The main emphasis was on extending path models, such as those of Cloninger et al. (1979a,b), to the multivariate case (Neale and Fulker, 1984; Vogler, 1985). Much of this work is described clearly and in detail by Fulker (1988).
Many of the models described could not be implemented with the methods readily available when the first edition of this book was written. Furthermore, several of the more difficult models were not addressed in the first edition because suitable data were lacking. Since then, many of the problems of specifying complex models have been solved using Mx, and this edition presents some of these developments. In addition, several research groups have now gathered data on samples large and diverse enough to exploit most of the theoretical developments now in hand. The collection of large volumes of data in a rich variety of twin studies from around the world in the last ten years, coupled with the rocketing growth in the power of microcomputers, offers an unprecedented opportunity. What were once ground-breaking methods, available only to the few who knew enough about statistics and computers to write their own programs, can now be placed in the hands of teachers and researchers alike.


Jeff Lessem 2002-03-21