

5 Using PRELIS with Ordinal Data

Here we give a PRELIS script that reads the DOB variable and just two items from a long list of psychiatric diagnoses (coded 1 or 0 in these data): DEPLN4 for the first twin and DEPLN4T2 for the second.
Diagnoses and age MZ twins: VARIABLES ARE:                                     
 DEPLN4 DEPLN2 DEPLN1 DEPLB4 DEPLB2 DEPLB1 GADLN6 GADLN1                       
 GADLB6 GADLB1 GAD88B GAD88N PANN PANB PHON PHOB ETOHN 
 ETOHB ANON ANOB BULN BULB DEPLN4T2 DEPLN2T2 DEPLN1T2 
 DEPLB4T2 DEPLB2T2 DEPLB1T2 GADLN6T2 GADLN1T2 GADLB6T2 
 GADLB1T2 GAD88BT2 GAD88NT2 PANNT2 PANBT2 PHONT2 PHOBT2 
 ETOHNT2 ETOHBT2 ANONT2 ANOBT2 BULNT2 BULBT2/
 FORMAT IN FULL IS:                                                             
 (2X, F8.2,F1.0, 43(1X,F1.0))                                                   
                                                                               
Diagnoses and age MZ twins
DA NI=3 NO=0 
LA; DOB DEPLN4 DEPLN4T2
RA FI=DIAGMZ.DAT FO
(2X, F8.2,F1.0, 43x,F1.0)
OR DEPLN4-DEPLN4T2
OU MA=PM SM=DEPLN4MZ.COR SA=DEPLN4MZ.ASY PA
                                                                               
Diagnoses and age DZ twins
DA NI=3 NO=0 
LA; DOB DEPLN4 DEPLN4T2
RA FI=DIAGDZ.DAT FO
(2X, F8.2,F1.0, 43x,F1.0)
OR DEPLN4-DEPLN4T2
OU MA=PM SM=DEPLN4DZ.COR SA=DEPLN4DZ.ASY PA
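The FORTRAN formats in these scripts select variables purely by column position. In the full layout, (2X, F8.2, F1.0, 43(1X,F1.0)), each diagnosis after DEPLN4 takes two columns (a blank plus one digit). The 21 remaining first-twin diagnoses therefore occupy 42 columns, and DEPLN4T2 is preceded by one further blank, which is why 43X lands on DEPLN4T2. The Python sketch below illustrates this. It handles only the nX and Fw.d edit descriptors and one level of repeat groups, and ignores FORTRAN's blank-handling rules. The names parse_format and read_record are hypothetical and are not part of PRELIS.

```python
import re

def parse_format(fmt):
    """Turn a simple FORTRAN format such as "(2X, F8.2, F1.0, 43X, F1.0)" into
    a list of ("skip", width) and ("read", width, decimals) items.
    Only nX and Fw.d descriptors and one level of repeat groups are handled."""
    body = fmt.strip()[1:-1]  # drop the outer parentheses
    # Expand repeat groups, e.g. 43(1X,F1.0) -> 1X,F1.0,1X,F1.0,... (43 times)
    body = re.sub(r"(\d+)\(([^()]*)\)",
                  lambda m: ",".join([m.group(2)] * int(m.group(1))), body)
    items = []
    for tok in body.split(","):
        tok = tok.strip().upper()
        if not tok:
            continue
        if tok.endswith("X"):            # nX: skip n columns
            items.append(("skip", int(tok[:-1])))
        elif tok.startswith("F"):        # Fw.d: read a w-column number
            w, d = tok[1:].split(".")
            items.append(("read", int(w), int(d)))
        else:
            raise ValueError(f"unsupported edit descriptor: {tok}")
    return items

def read_record(line, items):
    """Apply parsed format items to one fixed-width data line."""
    values, col = [], 0
    for item in items:
        if item[0] == "skip":
            col += item[1]
        else:
            _, w, d = item
            field = line[col:col + w].strip()
            col += w
            # An explicit decimal point overrides the implied d decimals.
            values.append(float(field) if "." in field else int(field) / 10 ** d)
    return values

if __name__ == "__main__":
    # Synthetic record in the full layout: a made-up DOB value of 51.25 and
    # 44 diagnoses, with only DEPLN4 (1st) and DEPLN4T2 (23rd) coded 1.
    diag = [0] * 44
    diag[0] = diag[22] = 1
    line = "  " + f"{51.25:8.2f}" + str(diag[0]) + "".join(" " + str(v) for v in diag[1:])
    print(read_record(line, parse_format("(2X, F8.2,F1.0, 43x,F1.0)")))
    # -> [51.25, 1.0, 1.0]
```

If the skip count were off by even one column, the second diagnosis read would come from a blank or from a neighbouring 0, which is an easy way to check a hand-written format against a known record.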
Note that, as before, we have used the FORTRAN format to control which variables are read. One key difference from the continuous case is the use of MA=PM, which requests a matrix of polychoric, polyserial and product moment correlations. The program uses a product moment correlation when both variables are continuous, a polyserial (or biserial) correlation when one is ordinal and the other continuous, and a polychoric (or tetrachoric) correlation when both are ordinal.

Notice that we have `stacked' two scripts in one file: the first reads and computes statistics from the MZ data file (FI=DIAGMZ.DAT), and the second does the same for the DZ data. Running the file produces four output files, DEPLN4MZ.COR, DEPLN4MZ.ASY, DEPLN4DZ.COR and DEPLN4DZ.ASY, which can be read into Mx using the PMatrix and ACov commands. The SM keyword saves the correlation matrix, and SA saves the asymptotic weight matrix. In fact, PRELIS saves the weight matrix multiplied by the sample size, which is what Mx expects to receive when the ACov command is used. However, PRELIS writes this file in a binary format, which must be converted to ASCII before Mx can read it; the utility bin2asc, supplied with PRELIS, can be used for this purpose. The PA keyword requests that the asymptotic weight matrix also be printed in the output.

In the PRELIS output there are summary statistics (means, standard deviations and histograms) for the continuous variables, and frequency distributions with bar graphs for the ordinal variables. To give the user some guide to the origin of the statistics describing the covariance between variables, PRELIS prints the means and standard deviations of the continuous variables separately for each category of each ordinal variable, and contingency tables for each pair of ordinal variables. Towards the end of the output there is a table printed in the following format:
                                       TEST OF MODEL                           
                     CORRELATION  CHI-SQU.  D.F. P-VALUE                       
                     ___________  ________  ____ _______                       
                                                                               
  DEPLN4 VS.      DOB  -.233 (PS)   5.067     1     .024                       
DEPLN4T2 VS.      DOB   .010 (PS)   6.703     1     .010
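For two binary diagnoses such as DEPLN4 and DEPLN4T2, MA=PM yields a tetrachoric correlation. The Python sketch below shows the idea. It is a minimal two-step version and not PRELIS's code. First, a threshold on each latent liability is set from the marginal proportions. Then the correlation is chosen so that a standard bivariate normal reproduces the observed proportion of pairs in which both items are scored 1. With only two categories per item, the three parameters (two thresholds and one correlation) reproduce the 2×2 table exactly, so no degrees of freedom remain for a test of fit. The numerical integration (Simpson's rule) and the root finding (bisection) are simple choices of ours, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution (cdf, pdf, inv_cdf)

def upper_orthant(h, k, rho, steps=2000):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho,
    by Simpson's rule on phi(x) * P(Y > k | X = x) over x > h."""
    s = math.sqrt(1.0 - rho * rho)
    a, b = h, max(h, 0.0) + 8.0          # density beyond b is negligible
    width = (b - a) / steps               # steps must be even for Simpson's rule
    def f(x):
        return STD.pdf(x) * (1.0 - STD.cdf((k - rho * x) / s))
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * width)
    return total * width / 3.0

def tetrachoric(n00, n01, n10, n11):
    """Two-step tetrachoric correlation for a 2x2 table of 0/1 items.
    n01 counts pairs with the first item 0 and the second item 1, etc.
    Every margin must be non-empty (inv_cdf fails at 0 or 1)."""
    n = n00 + n01 + n10 + n11
    # Step 1: thresholds from the margins; an item scores 1 when its
    # latent liability exceeds its threshold.
    h = STD.inv_cdf((n00 + n01) / n)
    k = STD.inv_cdf((n00 + n10) / n)
    target = n11 / n
    # Step 2: the (1,1) probability increases with rho, so bisect on rho
    # until the model reproduces the observed proportion.
    lo, hi = -0.999, 0.999
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As a check with hypothetical counts, `tetrachoric(400, 200, 200, 400)` returns approximately 0.5. With both margins at one half, the thresholds are zero, and Sheppard's formula gives $P(X>0, Y>0) = 1/4 + \arcsin(\rho)/(2\pi)$, which equals the observed 1/3 exactly when $\rho = 0.5$.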
There are two quite different chi-squared tests printed in the output. The first, under TEST OF MODEL, is a test of the goodness of fit of the bivariate normal distribution model to the data. For two ordinal variables with $r$ and $c$ categories respectively, there are $rc - r - c$ df, as described in expression 2.5 above. Likewise there will be $2r-3$ df for the continuous by ordinal statistics, as described in expression 2.6. If the $p$-value reported by PRELIS is low (e.g. $<.05$), then concern arises about whether the bivariate normal distribution model is appropriate for these data. For a polyserial correlation (a correlation between an ordinal and a continuous variable), it may simply be that the continuous variable is not normally distributed, or that the association between the variables does not follow a bivariate normal distribution. For polychoric correlations no univariate test of normality is involved, so failure of the model would imply that the latent liabilities do not jointly follow a bivariate normal distribution.

Remember, however, that the effective significance level of these tests is often not the reported $p$-value, because we are performing multiple tests. If the tests were independent, then with $n$ such tests the $\alpha$ significance level would not be the reported $p$-value but $1-(1-p)^n$, the probability that at least one of the $n$ tests gives a $p$-value this small when all the null hypotheses are true. Therefore, when a large number of tests has been performed, concern should arise only if $p$ is very small. In our case the tests are not independent (for example, the correlation of A and B is not independent of the correlation of A and C), so the attenuation of the $\alpha$ level of significance is not as extreme as the $1-(1-p)^n$ formula predicts. The amount of attenuation will be application specific, but would often be closer to $1-(1-p)^n$ than simply to $p$.

The second chi-squared statistic printed by PRELIS (not shown in the sample of output above) tests whether the correlation is significantly different from zero.
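The degrees-of-freedom counts and the multiple-testing adjustment described above are simple arithmetic, and a small Python sketch makes them concrete. It also converts a 1-df chi-squared statistic to a $p$-value. It relies on the fact that a 1-df chi-squared variable is the square of a standard normal one, so only this special case is covered; general df would need the incomplete gamma function. The function names are hypothetical.

```python
import math

def df_polychoric(r, c):
    """df of the bivariate-normal test for an r x c table of two ordinal variables."""
    return r * c - r - c

def df_polyserial(r):
    """df quoted in the text for an r-category ordinal variable with a continuous one."""
    return 2 * r - 3

def p_chisq_1df(x):
    """Upper-tail p-value of a 1-df chi-squared statistic. A 1-df chi-squared
    variable is Z**2 for standard normal Z, so P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def familywise_alpha(p, n):
    """Chance that at least one of n independent tests gives a p-value of p or
    less when every null hypothesis is true."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    # The two TEST OF MODEL lines in the sample output (1 df each).
    for stat in (5.067, 6.703):
        print(stat, round(p_chisq_1df(stat), 3))
    # Ten independent tests, each at a nominal p of .01.
    print(round(familywise_alpha(0.01, 10), 4))
```

The two statistics in the sample output give $p$-values that round to .024 and .010, matching the P-VALUE column. A binary ordinal variable ($r = 2$) paired with DOB gives $2r - 3 = 1$ df, as printed. Ten independent tests at $p = .01$ carry a familywise level of about .096.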
The test of whether a correlation differs from zero can also be carried out in Mx. A similar result should be obtained by supplying the summary statistics to Mx and performing a chi-squared difference test (see Chapter [*]) between a model in which the correlation is a free parameter and one in which it is fixed at zero. The use of weight matrices as input to Mx is described elsewhere in this book. Here we have described the generation of a weight matrix for a correlation matrix, but it is also possible to use weight matrices for covariance matrices[*]. Both are part of the asymptotically distribution free (ADF) methods pioneered by Browne (1984). It is not yet clear whether maximum likelihood or ADF methods are generally better for coping with data that are not multinormally distributed; further simulation studies are required. The ADF methods require more numerical effort and become cumbersome to use with large numbers of variables, because the size of the weight matrix increases rapidly with the number of variables. The number of elements on and below the diagonal of a symmetric $k \times k$ matrix is the triangular number $k(k+1)/2$, which for a covariance matrix of $k$ variables is the number of distinct statistics. The weight matrix has one row and one column per statistic, so the number of distinct elements in it is a triangular number of a triangular number, or

\begin{displaymath}
\frac{k^4+2k^3+3k^2+2k}{8}\end{displaymath}

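In the case of correlation matrices, only the $k(k-1)/2$ correlations below the diagonal are modelled. The number of elements is therefore somewhat smaller, but it still increases with the fourth power of $k$ (quadratically in the number of correlations):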

\begin{displaymath}\frac{k^4-2k^3+3k^2-2k}{8}\end{displaymath}
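The two counts can be checked numerically. The Python sketch below computes each as a triangular number of a triangular number and confirms that it equals the closed form above for $k = 1, \dots, 100$. It then prints the sizes for a few values of $k$, which shows how quickly full weight matrices become unwieldy. The function names are hypothetical.

```python
def tri(m):
    """Number of elements on and below the diagonal of a symmetric m x m matrix."""
    return m * (m + 1) // 2

def weight_elements_cov(k):
    """Distinct elements of the weight matrix for a k x k covariance matrix:
    k(k+1)/2 variances and covariances, so a triangular number of that."""
    return tri(tri(k))

def weight_elements_cor(k):
    """Same for a k x k correlation matrix, whose k(k-1)/2 off-diagonal
    correlations are the statistics being modelled."""
    return tri(k * (k - 1) // 2)

# Check against the closed forms (k^4 + 2k^3 + 3k^2 + 2k)/8 and (k^4 - 2k^3 + 3k^2 - 2k)/8.
for k in range(1, 101):
    assert 8 * weight_elements_cov(k) == k**4 + 2 * k**3 + 3 * k**2 + 2 * k
    assert 8 * weight_elements_cor(k) == k**4 - 2 * k**3 + 3 * k**2 - 2 * k

for k in (10, 20, 50):
    print(k, weight_elements_cor(k), weight_elements_cov(k))
# 10 1035 1540
# 20 18145 22155
# 50 750925 813450
```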

As a compromise when the number of variables is large, Jöreskog and Sörbom suggest the use of diagonal weights, i.e. just the variances of the correlations and not their covariances. However, tests of significance are likely to be inaccurate with this method and estimates of anything other than the full or true model would be biased.
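Diagonal weighting can be illustrated with the weighted least-squares discrepancy $F = (s-\rho)'W(s-\rho)$. Here $s$ holds the observed correlations and $\rho$ the model's values. $W$ is the inverse of their full asymptotic covariance matrix (full weights) or the inverse of just its diagonal (diagonal weights). The Python sketch below uses made-up numbers and hypothetical function names. It only shows how the two weightings differ, omits any scaling by sample size, and is not the code used by Mx or PRELIS.

```python
def solve(a, b):
    """Solve a x = b for a small nonsingular matrix a (list of lists) by
    Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wls_full(s, rho, acov):
    """(s - rho)' W (s - rho) with W the inverse of the full asymptotic covariance matrix."""
    d = [si - ri for si, ri in zip(s, rho)]
    return sum(di * xi for di, xi in zip(d, solve(acov, d)))

def wls_diagonal(s, rho, acov):
    """Diagonal weights: only the asymptotic variances are used, covariances ignored."""
    return sum((si - ri) ** 2 / acov[i][i] for i, (si, ri) in enumerate(zip(s, rho)))

# Made-up numbers: two observed correlations s, model values rho, and an
# asymptotic covariance matrix whose off-diagonal term links the two errors.
s, rho = [0.5, 0.3], [0.4, 0.2]
acov = [[0.01, 0.005], [0.005, 0.01]]
print(wls_full(s, rho, acov), wls_diagonal(s, rho, acov))
```

Here the full weights give about 1.33 and the diagonal weights 2.0. Ignoring the covariance between the two sampling errors changes the value of the discrepancy function. This is one way to see why tests of significance based on diagonal weights are likely to be inaccurate.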
Jeff Lessem 2002-03-21