Structural Equation Models
Structural equation modeling (SEM) is a general method for modeling systems of effects among three or more variables. Structural equation models can vary greatly in complexity. At its base, SEM is an extension of linear regression (or, linear regression is a special case of SEM) in which a number of regression equations are solved simultaneously. This process allows for the explicit modeling of many quantities not typically a part of linear regression, including the covariances (or absence thereof) among predictors, the residual variances of the endogenous (predicted) variables, and measurement error in both the exogenous and endogenous variables.
COMPONENTS OF A STRUCTURAL EQUATION MODEL
A full structural equation model consists of a measurement part and a structural part. It is the measurement part that allows for the modeling of measurement error. As in factor analysis, the modeler typically assumes that some latent construct is measured by its influence on one or more (usually at least three, for model identification purposes) observed variables. A latent construct is some unmeasured, and perhaps unmeasurable, variable of substantive interest. In a traditional measurement model, the latent construct is established by its effects on the observed indicator variables; for example, self-esteem as a construct might be measured by several items on a questionnaire covering a range of related content. Each item is modeled as being fully determined by two quantities: the self-esteem latent construct and an item-specific residual, or disturbance, usually considered uncorrelated with the residuals of the other items and other variables in the model. Thus, the observed covariances among the items are fully determined by their common cause, the latent construct; alternatively, the latent construct is defined by that portion of the variance of the indicators that is in common.
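As a minimal sketch of the single-construct measurement model just described, the following Python fragment builds the covariance matrix that such a model implies for three indicators. The loadings, construct variance, and residual variances are made-up illustrative values rather than estimates from any data set, and the function name is hypothetical.

```python
def implied_covariance(loadings, construct_variance, residual_variances):
    """Covariance matrix implied by a one-construct measurement model.

    Each indicator is modeled as loading * construct + residual, with the
    residuals uncorrelated with the construct and with one another.
    """
    n = len(loadings)
    sigma = [[loadings[i] * loadings[j] * construct_variance for j in range(n)]
             for i in range(n)]
    for i in range(n):
        # A residual contributes only to its own indicator's variance.
        sigma[i][i] += residual_variances[i]
    return sigma


# Illustrative (assumed) values for three questionnaire items.
loadings = [0.8, 0.7, 0.6]
residual_variances = [0.36, 0.51, 0.64]
sigma = implied_covariance(loadings, 1.0, residual_variances)

for row in sigma:
    print([round(value, 2) for value in row])
```

Every off-diagonal entry is the product of two loadings and the construct variance, which is the sense in which the covariances among the items are determined by their common cause.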
MEASUREMENT MODELS
Such a model, with one or more latent constructs, each with its own indicators, can stand by itself as a confirmatory factor analysis (CFA). CFA is a special case of SEM. In CFA, the measurement models for the constructs of interest are estimated, and the covariances among the constructs are freely estimated; that is, there are no hypothesized constraints on relations among the latent constructs. Importantly, these covariances are estimated with a correction for measurement error: because each latent variable represents only the variance that its observed indicators have in common, the indicator-specific residual variances (error) do not enter the estimated relations among constructs. Many social science researchers conduct a CFA on their data before moving to a full structural equation model.
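To illustrate why correcting for measurement error matters, the sketch below compares the correlation between two latent constructs with the correlation that the same model implies between simple sum scores of their indicators. It assumes unit-variance constructs and uncorrelated residuals; all numeric values and the function name are hypothetical.

```python
import math


def composite_correlation(loadings_a, residuals_a, loadings_b, residuals_b,
                          latent_correlation):
    """Correlation between two sum scores implied by a two-construct CFA.

    Assumes both constructs have unit variance and that all residuals are
    uncorrelated with each other and with the constructs.
    """
    sum_a, sum_b = sum(loadings_a), sum(loadings_b)
    covariance = sum_a * sum_b * latent_correlation
    variance_a = sum_a ** 2 + sum(residuals_a)
    variance_b = sum_b ** 2 + sum(residuals_b)
    return covariance / math.sqrt(variance_a * variance_b)


# Illustrative (assumed) values: two constructs correlated 0.50,
# each measured by three indicators.
loadings = [0.8, 0.7, 0.6]
residuals = [0.36, 0.51, 0.64]
observed = composite_correlation(loadings, residuals, loadings, residuals, 0.50)
print(f"latent correlation 0.50, sum-score correlation {observed:.3f}")
```

Because the residual variances inflate the variance of each sum score without adding to the covariance between them, the sum-score correlation is smaller than the latent correlation.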
A CFA is distinct from an exploratory factor analysis (EFA) in that, for CFA, the latent constructs are conceptualized a priori, as are the patterns of relations between constructs and indicators. EFA is statistically very similar to CFA but has a different purpose. EFA might be used if the researcher does not have an a priori model or if a CFA fits the data poorly. In EFA, all observed indicators are modeled as having been caused by all latent constructs (of a number specified by the researcher, often based on empirical aspects of the data), with only certain identifying restrictions. In contrast, most CFAs in the social sciences model each indicator as being caused by exactly one latent construct (though there is no statistical necessity to do this), resulting in a clearer definition of the latent construct. Latent constructs in EFA can be more difficult to describe, as the description is determined by the researcher’s interpretation of the pattern of loadings (regression coefficients) relating each indicator to the construct.
STRUCTURAL MODELS
The second part of a full structural equation model is the structural part: the model of the relations among the latent constructs. Each construct can be modeled as having effects on several other constructs, as being caused by several other constructs, or both. Thus, each endogenous, or downstream, construct is the outcome variable in a multiple regression equation and may well be a predictor variable in one or more multiple regression equations for other outcomes. These relations are typically represented pictorially in a diagram with directional arrows representing modeled effects, and curved, bidirectional arrows representing covariances among the exogenous variables (as in path analysis; indeed, path analysis is a special case of SEM in which there is no measurement model, for the constructs of interest are the measured variables). In most cases, all the exogenous variables are modeled with all possible correlations among them represented. Omitting such a correlation would in effect be a hypothesis that the correlation equals zero, which is rarely plausible for exogenous variables. Observed variables (indicators) are usually diagrammed as labeled rectangles, latent variables as ovals.
As in the measurement part, residual variance is explicitly modeled. In all but rare cases, each endogenous variable has an exogenous disturbance associated with it: a latent variable that is the “cause” (actually the pooled causes) of all variance not determined by the regression relations.
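The measurement and structural parts are often written together in matrix form. The notation below follows the widely used LISREL-style convention; the symbols are an assumption here, since the text above does not fix a notation.

```latex
\begin{aligned}
\eta &= B\eta + \Gamma\xi + \zeta, \\
y &= \Lambda_y \eta + \varepsilon, \\
x &= \Lambda_x \xi + \delta .
\end{aligned}
```

Here \eta and \xi are the endogenous and exogenous latent constructs, B and \Gamma hold the structural coefficients, and \zeta holds the disturbances of the endogenous constructs. The matrices \Lambda_y and \Lambda_x hold the loadings, and \varepsilon and \delta are the indicator-specific residuals. Covariances among the exogenous constructs are collected in \Phi = \mathrm{Cov}(\xi).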
MODEL ESTIMATION
When the full model has been specified, the next step is to estimate the coefficients for each covariance, effect, and loading (the measurement coefficients). Each effect and loading estimate is interpretable as a partial regression coefficient: the expected change in the specified endogenous (downstream) variable per unit change in the specified upstream variable, holding constant the other variables that have effects leading to that endogenous variable. Certain hypotheses involving the path coefficients can be tested as in regression, such as the null hypothesis that the path coefficient equals zero, which is tested by the ratio of the coefficient to its standard error.
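The two ideas in this paragraph can be sketched for the simplest case of one outcome with two correlated predictors. The first function uses the standard closed-form expression for standardized partial regression coefficients; the second forms the ratio of a coefficient to its standard error and refers it to a standard normal distribution. The correlations, estimate, and standard error are hypothetical values, and the function names are illustrative.

```python
import math


def standardized_paths(r_y1, r_y2, r_12):
    """Standardized partial regression coefficients of Y on X1 and X2."""
    denominator = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denominator
    beta_2 = (r_y2 - r_y1 * r_12) / denominator
    return beta_1, beta_2


def wald_test(estimate, standard_error):
    """Ratio of a coefficient to its standard error, with a two-sided
    p value from the standard normal distribution."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Illustrative (assumed) correlations among two predictors and one outcome.
beta_1, beta_2 = standardized_paths(r_y1=0.60, r_y2=0.50, r_12=0.50)
print(f"path from X1: {beta_1:.3f}, path from X2: {beta_2:.3f}")

# Hypothetical unstandardized estimate and standard error from a fitted model.
z, p_value = wald_test(estimate=0.30, standard_error=0.12)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Each path is smaller than the corresponding raw correlation because part of that correlation is carried by the other, correlated predictor.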
Structural equation modeling software routinely calculates a number of indices of fit of the model. A fitted model allows for the calculation, from the various path coefficients and estimated variances, covariances, and residual covariances, of a model-implied variance/covariance matrix of the original variables—that is, a covariance matrix that is consistent with the fitted model. Broadly speaking, the fit of the model is an assessment of how well the model, with its estimated coefficients, implies a covariance matrix that matches the original matrix from the data. If there is no significant discrepancy, as measured by a chi-squared statistic, then it may be concluded that the path model is consistent with the data. Note that this does not necessarily indicate that the model is an accurate depiction of causation in reality, but only that it is not inconsistent with reality as indicated by the covariance matrix.
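As a hedged illustration of how a sample and a model-implied covariance matrix are compared, the sketch below evaluates the maximum likelihood discrepancy function, one common choice of estimator (the text above does not name one), and scales it to a chi-squared statistic. The three-variable path model, its values, and the sample size are invented for illustration. The small matrix routine is written out only to keep the example dependency-free; a linear algebra library would normally be used.

```python
import math


def inverse_and_logdet(matrix):
    """Gauss-Jordan inverse and log-determinant of a positive definite matrix."""
    n = len(matrix)
    # Augment the matrix with an identity block: [matrix | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(n):
        pivot = aug[col][col]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        logdet += math.log(pivot)  # the determinant is the product of pivots
        aug[col] = [v / pivot for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug], logdet


def ml_discrepancy(sample_cov, implied_cov):
    """F = log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = len(sample_cov)
    implied_inv, logdet_implied = inverse_and_logdet(implied_cov)
    _, logdet_sample = inverse_and_logdet(sample_cov)
    trace = sum(sample_cov[i][j] * implied_inv[j][i]
                for i in range(p) for j in range(p))
    return logdet_implied + trace - logdet_sample - p


# Illustrative (assumed) sample correlations for X, M, Y from N = 201 cases.
sample_cov = [[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]]

# Path model X -> M -> Y with no direct X -> Y path. The paths 0.5 and 0.4
# imply a correlation of 0.5 * 0.4 = 0.2 between X and Y.
implied_cov = [[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]]

n_cases = 201
n_free_parameters = 5  # two paths, one variance, two residual variances
p = len(sample_cov)
degrees_of_freedom = p * (p + 1) // 2 - n_free_parameters

f_ml = ml_discrepancy(sample_cov, implied_cov)
chi2 = (n_cases - 1) * f_ml
print(f"F = {f_ml:.4f}, chi-squared = {chi2:.2f}, df = {degrees_of_freedom}")
```

The discrepancy is zero when the implied matrix reproduces the sample matrix exactly and grows as the two diverge. Some software multiplies by N rather than N - 1.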
There has been, however, much debate over the utility of the chi-squared statistic. It is widely agreed that, assuming an adequate sample size, a nonsignificant chi-squared statistic results in a failure to reject the model. However, the chi-squared statistic can be sensitive to small deviations between the observed and model-implied covariance matrices that are of no practical importance to the researcher, especially when the sample size is large. As a result, numerous statistics have been developed to assess approximate or close fit. Among the more prominent of these are the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA).
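The three indices named above can all be computed from chi-squared statistics and degrees of freedom. The sketch below uses their commonly cited formulas, with the comparative fit index and Tucker-Lewis index defined relative to a baseline (independence) model. The input values are hypothetical, and software packages differ in details such as using N or N - 1 in the root mean square error of approximation.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    model_excess = max(chi2_model - df_model, 0.0)
    baseline_excess = max(chi2_baseline - df_baseline, model_excess)
    if baseline_excess == 0.0:
        return 1.0
    return 1.0 - model_excess / baseline_excess


def tli(chi2_model, df_model, chi2_baseline, df_baseline):
    """Tucker-Lewis index (also called the non-normed fit index)."""
    baseline_ratio = chi2_baseline / df_baseline
    model_ratio = chi2_model / df_model
    return (baseline_ratio - model_ratio) / (baseline_ratio - 1.0)


# Hypothetical results for a fitted model and its baseline model, N = 301.
fit = {
    "RMSEA": rmsea(85.0, 40, 301),
    "CFI": cfi(85.0, 40, 900.0, 55),
    "TLI": tli(85.0, 40, 900.0, 55),
}
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

All three indices work from the amount by which the chi-squared statistic exceeds its degrees of freedom, rather than from its statistical significance.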
HYPOTHESIS TESTING
The test of model fit is one of the primary results of estimating a model. Other hypotheses of interest in SEM frequently involve constraints on the structural coefficients: for example, that two coefficients are equal to each other, or that three coefficients are all equal to zero. These can readily be tested in structural equation modeling software by the estimation of nested models. In this situation, the fit of a full model (without the constraints) is compared with that of a restricted model (with the constraints applied in the estimation process). Two such models are nested if the restricted model can be created strictly by imposing constraints on the full model. If the full model fits the data well, then the difference between the chi-squared statistics for the two models is itself distributed as a chi-squared statistic, with degrees of freedom equal to the number of constraints applied. A significant chi-squared difference indicates that the restricted model fits significantly less well than the full model.
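The nested-model comparison described above can be sketched as follows. The chi-squared values and degrees of freedom are hypothetical. The upper-tail probability is computed with a series for the regularized incomplete gamma function so that the example needs only the standard library; a statistics package would normally supply this function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of
    freedom, via the series for the regularized lower incomplete gamma
    function. Adequate for moderate x; it loses precision far in the tail."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


def chi2_difference_test(chi2_full, df_full, chi2_restricted, df_restricted):
    """Chi-squared difference test for two nested models."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full  # number of constraints imposed
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit results: the restricted model fixes three coefficients
# to zero, so it gains three degrees of freedom.
delta_chi2, delta_df, p_value = chi2_difference_test(
    chi2_full=52.3, df_full=40, chi2_restricted=61.9, df_restricted=43)
print(f"difference = {delta_chi2:.1f} on {delta_df} df, p = {p_value:.3f}")
```

A small p value here would suggest that the three constraints significantly worsen fit, so the restricted model would be rejected in favor of the full model.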
SEE ALSO Factor Analysis; Hypothesis and Hypothesis Testing; Hypothesis, Nested; Least Squares, Ordinary; Linear Regression; Linear Systems; Methods, Quantitative; Nonlinear Regression; Nonlinear Systems; Regression; Regression Analysis; Statistics in the Social Sciences
Patrick S. Malone