

5 Path Models for Linear Regression

\begin{figure}
\centerline{\psfig{figure=pathf2.eps,width=5in,clip=t}}
\caption{Figure 5.2: Regression path models with manifest variables.}
\end{figure}

In this section we attempt to clarify the conventions, the assumptions, and the tracing rules of path analysis by applying them to regression models. The path diagram in Figure 5.2a represents a linear regression model, such as might be used to predict systolic blood pressure [SBP], $Y_1$, from sodium intake, $X_1$. The model asserts that high sodium intake is a cause, or direct effect, of high blood pressure (i.e., sodium intake $\rightarrow$ blood pressure), but that blood pressure is also influenced by other, unmeasured (`residual') factors. The regression equation represented in Figure 5.2a is
\begin{displaymath}
Y_1 = a_1 + b_{11} X_1 + E_1,
\end{displaymath} (26)

where $a_1$ is a constant intercept term, $b_{11}$ the regression or `structural' coefficient, and $E_1$ the residual error term or disturbance term, which is uncorrelated with $X_1$ [Cov($X_1$,$E_1$) = 0]. This is indicated by the absence of a double-headed arrow between $X_1$ and $E_1$ and of any indirect common cause between them. The double-headed arrow from $X_1$ to itself represents the variance of this variable, Var($X_1$) = $s_{11}$; the variance of $E_1$ is Var($E_1$) = $z_{11}$. In this example SBP is the dependent variable and sodium intake is the independent variable. We can extend the model by adding more independent variables, more dependent variables, or both. The path diagram in Figure 5.2b represents a multiple regression model, such as might be used if we were trying to predict SBP ($Y_1$) from sodium intake ($X_1$), exercise ($X_2$), and body mass index [BMI] ($X_3$), allowing once again for the influence of other residual factors ($E_1$) on blood pressure. The double-headed arrows between the three independent variables indicate that correlations are allowed between sodium intake and exercise ($s_{21}$), sodium intake and BMI ($s_{31}$), and BMI and exercise ($s_{32}$). For example, a negative covariance between exercise and sodium intake might arise if the health-conscious exercised more and ingested less sodium; a positive covariance between sodium intake and BMI could occur if obese individuals ate more (and therefore ingested more sodium); and a negative covariance between BMI and exercise could exist if overweight people were less inclined to exercise. In this case the regression equation is
\begin{displaymath}
Y_1 = a_1 + b_{11} X_1 + b_{12} X_2 +b_{13} X_3 + E_1.
\end{displaymath} (27)
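As a numerical check on a multiple regression of this form, the following sketch simulates data from the model and recovers the structural coefficients by ordinary least squares. All parameter values here are hypothetical, chosen purely for illustration; they are not estimates from any real blood-pressure data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural parameters (illustration only).
a1, b11, b12, b13 = 120.0, 2.0, -1.5, 0.8

# Correlated predictors: sodium intake (X1), exercise (X2), BMI (X3).
S = np.array([[ 1.0, -0.3,  0.4],
              [-0.3,  1.0, -0.2],
              [ 0.4, -0.2,  1.0]])   # covariance matrix of the X's
X = rng.multivariate_normal(np.zeros(3), S, size=n)
E1 = rng.normal(0.0, 1.0, size=n)    # residual, uncorrelated with the X's

# Y1 = a1 + b11*X1 + b12*X2 + b13*X3 + E1
Y1 = a1 + X @ np.array([b11, b12, b13]) + E1

# Ordinary least squares recovers the structural coefficients.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, Y1, rcond=None)
print(coef)  # close to [120.0, 2.0, -1.5, 0.8]
```

With a large sample, the fitted coefficients approach the generating values even though the predictors are mutually correlated, because the residual is uncorrelated with each of them.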

Note that the estimated values for $a_1$, $b_{11}$ and $E_1$ will not usually be the same as in equation 5.1 due to the inclusion of additional independent variables in the multiple regression equation 5.2. Similarly, the only difference between Figures 5.2a and 5.2b is that we have multiple independent or predictor variables in Figure 5.2b. Figure 5.2c represents a multivariate regression model, where we now have two dependent variables (blood pressure, $Y_1$, and a measure of coronary artery disease [CAD], $Y_2$), as well as the same set of independent variables (case 1). The model postulates that there are direct influences of sodium intake and exercise on blood pressure, and of exercise and BMI on CAD, but no direct influence of sodium intake on CAD, nor of BMI on blood pressure. Because the $X_2$ variable, exercise, causes both blood pressure, $Y_1$, and coronary artery disease, $Y_2$, it is termed a common cause of these dependent variables. The regression equations are

\begin{displaymath}
Y_1 = a_1 + b_{11} X_1 + b_{12} X_2 + E_1
\end{displaymath}

and
\begin{displaymath}
Y_2 = a_2 + b_{22} X_2 + b_{23} X_3 + E_2.
\end{displaymath} (28)

Here $a_1$ and $E_1$ are the intercept term and error term, respectively, and $b_{11}$ and $b_{12}$ the regression coefficients for predicting blood pressure, and $a_2$, $E_2$, $b_{22}$, and $b_{23}$ the corresponding coefficients for predicting coronary artery disease. We can rewrite equation 5.3 using matrices (see Chapter 4 on matrix algebra),

\begin{displaymath}
\left(\begin{array}{c} Y_1 \\ Y_2 \end{array} \right) =
\left(\begin{array}{c} a_1 \\ a_2 \end{array} \right) +
\left(\begin{array}{ccc} b_{11} & b_{12} & 0 \\
                         0 & b_{22} & b_{23} \end{array} \right)
\left(\begin{array}{c} X_1 \\ X_2 \\ X_3 \end{array} \right) +
\left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right)
\left(\begin{array}{c} E_1 \\ E_2 \end{array} \right)
\end{displaymath}

or, using matrix notation,

\begin{displaymath}
\bf y = \bf a + \bf B \bf x + \bf I \bf e,
\end{displaymath}

where y, a, x, and e are column vectors, B is a matrix of regression coefficients, and I is an identity matrix. Note that each variable in the path diagram which has an arrow pointing to it appears exactly once on the left side of the matrix expression. Figure 5.2d differs from Figure 5.2c only by the addition of a causal path ($f_{12}$) from blood pressure to coronary artery disease, implying the hypothesis that high blood pressure increases CAD (case 2). The presence of this path also provides a link between $Y_2$ and $X_1$ ( $Y_2 \leftarrow Y_1 \leftarrow X_1$); a process of this kind, with one or more intervening variables, is typically called an indirect effect (of $X_1$ on $Y_2$). Thus we see that dependent variables can be influenced by other dependent variables, as well as by independent variables. Figure 5.2e adds a further causal path from CAD to blood pressure ($f_{21}$), thus creating a `feedback loop' (hereafter designated $\Longleftrightarrow$) between CAD and blood pressure. If both $f$ parameters are positive, the interpretation of the model would be that high SBP increases CAD and that increased CAD in turn increases SBP. Such reciprocal causation of variables requires special treatment and is discussed further in Chapters 8 and [*]. Figure 5.2e implies the structural equations

\begin{displaymath}
Y_1 = a_1 + f_{12} Y_2 + b_{11} X_1 + b_{12} X_2 + E_1
\end{displaymath}

and
\begin{displaymath}
Y_2 = a_2 + f_{21} Y_1 + b_{22} X_2 + b_{23} X_3 + E_2.
\end{displaymath} (29)

In matrix form, we may write these equations as

\begin{displaymath}
\left(\begin{array}{c} Y_1 \\ Y_2 \end{array} \right) =
\left(\begin{array}{c} a_1 \\ a_2 \end{array} \right) +
\left(\begin{array}{cc} 0 & f_{12} \\ f_{21} & 0 \end{array} \right)
\left(\begin{array}{c} Y_1 \\ Y_2 \end{array} \right) +
\left(\begin{array}{ccc} b_{11} & b_{12} & 0 \\
                         0 & b_{22} & b_{23} \end{array} \right)
\left(\begin{array}{c} X_1 \\ X_2 \\ X_3 \end{array} \right) +
\left(\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right)
\left(\begin{array}{c} E_1 \\ E_2 \end{array} \right)
\end{displaymath}

i.e.,

\begin{displaymath}
\bf y = \bf a + \bf F \bf y + \bf B \bf x + \bf I \bf e
\end{displaymath}
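Because y appears on both sides of this equation, the feedback model can be solved for y by rearranging to $(\bf I - \bf F)\bf y = \bf a + \bf B \bf x + \bf e$, so that $\bf y = (\bf I - \bf F)^{-1}(\bf a + \bf B \bf x + \bf e)$ whenever $\bf I - \bf F$ is nonsingular. The sketch below illustrates this reduced-form solution with entirely hypothetical parameter values; only the zero pattern of F and B follows Figure 5.2e.

```python
import numpy as np

# Hypothetical parameter values for the feedback model of Figure 5.2e.
a = np.array([1.0, 0.5])
F = np.array([[0.0, 0.2],          # f12: CAD -> SBP
              [0.3, 0.0]])         # f21: SBP -> CAD
B = np.array([[0.7, -0.4, 0.0],    # SBP depends on X1 and X2
              [0.0, -0.2, 0.5]])   # CAD depends on X2 and X3
x = np.array([1.0, 2.0, 3.0])
e = np.array([0.1, -0.1])

# Reduced form: y = (I - F)^{-1} (a + B x + e), valid when I - F is nonsingular.
I = np.eye(2)
y = np.linalg.solve(I - F, a + B @ x + e)

# The solution satisfies the original structural equations y = a + F y + B x + e.
assert np.allclose(y, a + F @ y + B @ x + e)
print(y)
```

The same inverse, $(\bf I - \bf F)^{-1}$, reappears when deriving expected covariances for nonrecursive models, which is why reciprocal causation requires the special treatment noted above.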

Now that some examples of regression models have been described both as path diagrams and as structural equations, we can apply the tracing rules of path analysis to derive the expected variances and covariances under the models. The regression models presented in this chapter all involve unstandardized variables. We illustrate the derivation of expected variances and covariances by applying the tracing rules for unstandardized variables to Figures 5.2a, 5.2b and 5.2c. As an exercise, the reader may wish to trace some of the other paths. In the case of Figure 5.2a, to derive the expected covariance between $X_1$ and $Y_1$, we need only trace the path:

\begin{eqnarray*}
\mbox{(i)} & & X_1
\stackrel{s_{11}}{\longleftrightarrow } X_1
\stackrel{b_{11}}{\longrightarrow } Y_1
\end{eqnarray*}



yielding an expected covariance of ($s_{11} b_{11}$). Two paths contribute to the expected variance of $Y_1$,

\begin{eqnarray*}
\mbox{(i)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{11}}{\longleftrightarrow } X_1 \stackrel{b_{11}}{\longrightarrow } Y_1 \\
\mbox{(ii)} & & Y_1 \stackrel{1}{\longleftarrow } E_1
\stackrel{z_{11}}{\longleftrightarrow } E_1 \stackrel{1}{\longrightarrow } Y_1;
\end{eqnarray*}



yielding an expected variance of $Y_1$ of ( $b_{11}^{2}s_{11} + z_{11}$). In the case of Figure 5.2b, to derive the expected covariance of $X_1$ and $Y_1$, we can trace paths:

\begin{eqnarray*}
\mbox{(i)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1 \stackrel{s_{11}}{\longleftrightarrow } X_1, \\
\mbox{(ii)} & & Y_1 \stackrel{b_{12}}{\longleftarrow } X_2 \stackrel{s_{21}}{\longleftrightarrow } X_1, \\
\mbox{(iii)} & & Y_1 \stackrel{b_{13}}{\longleftarrow } X_3 \stackrel{s_{31}}{\longleftrightarrow } X_1,
\end{eqnarray*}



to obtain an expected covariance of ($b_{11} s_{11} + b_{12} s_{21} + b_{13} s_{31}$). To derive the expected variance of $Y_1$, we can trace the paths:

\begin{eqnarray*}
\mbox{(i)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{11}}{\longleftrightarrow } X_1 \stackrel{b_{11}}{\longrightarrow } Y_1, \\
\mbox{(ii)} & & Y_1 \stackrel{b_{12}}{\longleftarrow } X_2
\stackrel{s_{22}}{\longleftrightarrow } X_2 \stackrel{b_{12}}{\longrightarrow } Y_1, \\
\mbox{(iii)} & & Y_1 \stackrel{b_{13}}{\longleftarrow } X_3
\stackrel{s_{33}}{\longleftrightarrow } X_3 \stackrel{b_{13}}{\longrightarrow } Y_1, \\
\mbox{(iv)} & & Y_1 \stackrel{1}{\longleftarrow } E_1
\stackrel{z_{11}}{\longleftrightarrow } E_1 \stackrel{1}{\longrightarrow } Y_1, \\
\mbox{(v)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{21}}{\longleftrightarrow } X_2 \stackrel{b_{12}}{\longrightarrow } Y_1, \\
\mbox{(vi)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{31}}{\longleftrightarrow } X_3 \stackrel{b_{13}}{\longrightarrow } Y_1, \\
\mbox{(vii)} & & Y_1 \stackrel{b_{13}}{\longleftarrow } X_3
\stackrel{s_{32}}{\longleftrightarrow } X_2 \stackrel{b_{12}}{\longrightarrow } Y_1,
\end{eqnarray*}



where each chain through a covariance between two distinct independent variables may be traced in either direction and therefore contributes twice, yielding a total expected variance of ($b_{11}^{2} s_{11} + b_{12}^{2} s_{22} + b_{13}^{2} s_{33} + 2 b_{11} b_{12} s_{21} + 2 b_{11} b_{13} s_{31} + 2 b_{12} b_{13} s_{32} + z_{11}$). In the case of Figure 5.2c, we may derive the expected covariance of $Y_1$ and $Y_2$ as the sum of

\begin{eqnarray*}
\mbox{(i)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{21}}{\longleftrightarrow } X_2 \stackrel{b_{22}}{\longrightarrow } Y_2, \\
\mbox{(ii)} & & Y_1 \stackrel{b_{11}}{\longleftarrow } X_1
\stackrel{s_{31}}{\longleftrightarrow } X_3 \stackrel{b_{23}}{\longrightarrow } Y_2, \\
\mbox{(iii)} & & Y_1 \stackrel{b_{12}}{\longleftarrow } X_2
\stackrel{s_{22}}{\longleftrightarrow } X_2 \stackrel{b_{22}}{\longrightarrow } Y_2, \\
\mbox{(iv)} & & Y_1 \stackrel{b_{12}}{\longleftarrow } X_2
\stackrel{s_{32}}{\longleftrightarrow } X_3 \stackrel{b_{23}}{\longrightarrow } Y_2,
\end{eqnarray*}



giving [$b_{11}(s_{21} b_{22} + s_{31} b_{23}) + b_{12}(s_{22} b_{22} + s_{32} b_{23})$] for the expected covariance. This expectation, and the preceding ones, can be derived equally well (and arguably more easily) by simple matrix algebra. For example, the expected covariance matrix (${\bf\Sigma}$) of $Y_1$ and $Y_2$ under the model of Figure 5.2c is given as

\begin{displaymath}
\bf {\Sigma} = \bf B \bf S \bf B' + \bf Z,
\end{displaymath}


\begin{displaymath}
= \left(\begin{array}{ccc} b_{11} & b_{12} & 0 \\
                           0 & b_{22} & b_{23} \end{array}\right)
\left(\begin{array}{ccc} s_{11} & s_{21} & s_{31} \\
                         s_{21} & s_{22} & s_{32} \\
                         s_{31} & s_{32} & s_{33} \end{array}\right)
\left(\begin{array}{cc} b_{11} & 0 \\
                        b_{12} & b_{22} \\
                        0 & b_{23} \end{array}\right) +
\left(\begin{array}{cc} z_{11} & 0 \\ 0 & z_{22} \end{array}\right)
\end{displaymath}

in which the elements of B are the paths from the X variables (columns) to the Y variables (rows); the elements of $\bf S$ are the covariances between the independent variables; and the elements of $\bf Z$ are the residual error variances.
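The agreement between the tracing rules and the matrix expression ${\bf\Sigma} = \bf B \bf S \bf B' + \bf Z$ can be verified numerically. In this sketch all numeric values are hypothetical; only the zero pattern of B matches the paths of Figure 5.2c.

```python
import numpy as np

# Hypothetical parameter values; B's zero pattern follows Figure 5.2c.
b11, b12, b22, b23 = 0.7, -0.4, -0.2, 0.5
B = np.array([[b11, b12, 0.0],
              [0.0, b22, b23]])
S = np.array([[ 2.0, -0.3,  0.4],   # s11, s21, s31
              [-0.3,  1.5, -0.2],   # s21, s22, s32
              [ 0.4, -0.2,  3.0]])  # s31, s32, s33
Z = np.diag([1.0, 0.8])             # residual variances z11, z22

# Expected covariance matrix of Y1 and Y2: Sigma = B S B' + Z.
Sigma = B @ S @ B.T + Z

# The off-diagonal element reproduces the tracing-rule expectation
# b11*(s21*b22 + s31*b23) + b12*(s22*b22 + s32*b23).
s21, s31, s22, s32 = S[1, 0], S[2, 0], S[1, 1], S[2, 1]
cov_traced = b11 * (s21 * b22 + s31 * b23) + b12 * (s22 * b22 + s32 * b23)
assert np.isclose(Sigma[0, 1], cov_traced)
print(Sigma)
```

Each element of $\bf B \bf S \bf B'$ sums the products of path coefficients and covariances over all tracings between the corresponding pair of dependent variables, which is exactly what the tracing rules enumerate path by path.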
Jeff Lessem 2002-03-21