Least Squares, Two-Stage
Two-stage least squares (2SLS or TSLS) is an alternative to the usual linear regression technique (ordinary least squares, or OLS), used when the right-hand side variables in the regression are correlated with the error term. 2SLS uses additional information to compute asymptotically unbiased coefficients (i.e., approximately unbiased in large samples), in contrast to the OLS coefficients, which are biased even in large samples.
In the regression model with k explanatory variables and n observations
$$y_i = b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik} + e_i, \qquad i = 1, \dots, n$$
OLS will produce biased coefficients if any of the x variables is correlated with the error e. Such correlation may occur if any of the x variables is measured with error, if relevant variables are left out of the specification, or if any x variables are endogenous (determined in part by y).
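The bias described above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not part of the original entry: the data-generating design (a shared shock u entering both x and e), the true coefficient of 2, and the function names are all hypothetical choices made for illustration. Under this design Cov(x, e) = 1 and Var(x) = 2, so the OLS slope should settle near 2.5 rather than 2, even with many observations.

```python
import random

def simulate_endogenous(n, b=2.0, seed=0):
    """Draw (x, y, z) where x is correlated with the error e.

    Hypothetical design: z is an exogenous driver of x, and u is a
    shock shared by x and e, so Cov(x, e) = Var(u) = 1.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        x = z + u      # x depends on the shared shock u
        e = u          # the error term is the shared shock
        xs.append(x)
        ys.append(b * x + e)
        zs.append(z)
    return xs, ys, zs

def ols_slope(xs, ys):
    """No-intercept OLS slope: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs, ys, zs = simulate_endogenous(20000)
b_ols = ols_slope(xs, ys)
print(f"true b = 2.0, OLS estimate = {b_ols:.3f}")
```

Increasing n tightens the estimate around its (wrong) limit; it does not move it toward the true coefficient.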
Suppose the investigator has a list of q (q ≥ k) instrumental variables, z1, …, zq, where to qualify as an instrument each z must be correlated with one or more x variables and must be uncorrelated with the error term. In the first-stage regressions, each x is regressed on the instruments and the fitted value x̂ is computed. By construction, x̂ will be a proxy for x that is uncorrelated with the error term e. In this way, x̂ is purged of the correlation that made x unsuitable for use in the regression. In the second-stage regression, y is regressed on the set of fitted x̂'s. The estimated coefficients from this second stage are the two-stage least squares estimates of b. Appropriate adjustments are made to compute standard errors and other statistics associated with the regression, as the statistics reported directly in the second-stage regression are not valid.
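The two stages can be sketched for the simplest case of one explanatory variable and one instrument, with no intercept. The simulated data, the true coefficient of 2, and the function names below are assumptions made for illustration. The sketch also shows one common form of the standard-error adjustment: residuals are computed from the original x rather than from the fitted x̂.

```python
import random

def slope_no_intercept(regressor, outcome):
    """No-intercept least-squares slope: sum(r*o) / sum(r*r)."""
    num = sum(r * o for r, o in zip(regressor, outcome))
    den = sum(r * r for r in regressor)
    return num / den

def two_stage(xs, ys, zs):
    """Literal two-stage estimate for one regressor and one instrument."""
    # Stage 1: regress x on z and form the fitted values x_hat.
    pi_hat = slope_no_intercept(zs, xs)
    x_hat = [pi_hat * z for z in zs]
    # Stage 2: regress y on x_hat.
    b = slope_no_intercept(x_hat, ys)
    # Adjustment: residuals use the original x, not x_hat.
    resid = [y - b * x for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / (len(ys) - 1)
    se = (sigma2 / sum(xh * xh for xh in x_hat)) ** 0.5
    return b, se

# Hypothetical data: x is endogenous through the shared shock u.
rng = random.Random(1)
xs, ys, zs = [], [], []
for _ in range(20000):
    z = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    x = z + u
    xs.append(x)
    ys.append(2.0 * x + u)
    zs.append(z)

b_2sls, se = two_stage(xs, ys, zs)
print(f"2SLS estimate = {b_2sls:.3f}, standard error = {se:.4f}")
```

Because z is independent of u in this design, the estimate should land close to the true value of 2.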
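In the simple case of one explanatory variable and one instrument, the formulas for the OLS and 2SLS coefficients are given by, respectively,

$$\hat{b}_{\mathrm{OLS}} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad \hat{b}_{\mathrm{2SLS}} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i}$$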
Note that modern software computes 2SLS coefficients directly, rather than actually computing two stages of regression.
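A short derivation suggests why no literal second regression is needed in the simple case. It assumes the no-intercept model with one explanatory variable x and one instrument z; the symbol \hat{\pi} for the first-stage slope is introduced here for illustration only.

```latex
\begin{aligned}
\hat{x}_i &= \hat{\pi} z_i, \qquad
\hat{\pi} = \frac{\sum_i z_i x_i}{\sum_i z_i^2} \\[4pt]
\hat{b}_{\mathrm{2SLS}}
  &= \frac{\sum_i \hat{x}_i y_i}{\sum_i \hat{x}_i^2}
   = \frac{\hat{\pi} \sum_i z_i y_i}{\hat{\pi}^2 \sum_i z_i^2}
   = \frac{\sum_i z_i y_i}{\hat{\pi} \sum_i z_i^2}
   = \frac{\sum_i z_i y_i}{\sum_i z_i x_i}
\end{aligned}
```

The first-stage slope cancels, leaving a ratio of two cross-products that can be computed in a single pass over the data.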
The difficult task in using 2SLS is to specify a list of instruments known a priori to be uncorrelated with the error term. In a linear simultaneous equation model, all exogenous variables (those variables not determined in the model) are candidates to serve as instruments. In particular, a variable may serve both as an explanatory variable and as an instrument (x and z) if it is uncorrelated with the error. Identification requires that the number of instruments, q, be equal to or greater than the number of right-hand side variables, k.
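The counting requirement can be expressed as a simple check. This is a sketch only: the function name and the variable lists are hypothetical. It counts instruments against right-hand side variables, which is a necessary condition; it does not verify that the instruments are actually correlated with the explanatory variables.

```python
def order_condition(num_instruments, num_regressors):
    """Classify a specification by comparing q (instruments) with k (regressors)."""
    if num_instruments < num_regressors:
        return "underidentified"
    if num_instruments == num_regressors:
        return "exactly identified"
    return "overidentified"

# Hypothetical specification: price is endogenous, income is exogenous.
regressors = ["price", "income"]
# income serves as its own instrument; the other two are outside instruments.
instruments = ["income", "rainfall", "input_cost"]

status = order_condition(len(instruments), len(regressors))
print(status)
```

Here q = 3 and k = 2, so the counting requirement is met with one instrument to spare.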
2SLS is an instrumental variable (IV) estimator and the terms 2SLS and IV are often used interchangeably. Estimators closely related to 2SLS include the generalized method of moments (GMM) for nonlinear estimation, three-stage least squares (3SLS) for estimation of systems of equations, and limited-information maximum likelihood (LIML).
2SLS is a well-established technique, particularly in economics. Difficulties due to weak instruments, where the correlation between x and z is very low, constitute an ongoing area of investigation.
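One widely used diagnostic for weak instruments is the F-statistic from the first-stage regression. The sketch below computes it for a single instrument in a no-intercept first stage, using simulated data; the first-stage slopes of 1.0 and 0.02, the sample size, and the function names are assumptions chosen for illustration. A commonly cited rule of thumb treats a first-stage F below roughly 10 as a warning sign, though that threshold is a convention rather than a guarantee.

```python
import random

def first_stage_f(xs, zs):
    """F-statistic for H0: pi = 0 in the first stage x = pi * z + v."""
    n = len(xs)
    szz = sum(z * z for z in zs)
    pi_hat = sum(z * x for z, x in zip(zs, xs)) / szz
    ssr = sum((x - pi_hat * z) ** 2 for x, z in zip(xs, zs))
    s2 = ssr / (n - 1)  # residual variance with one estimated slope
    return pi_hat ** 2 * szz / s2

def draw_first_stage(n, pi, seed):
    """Simulate x = pi * z + v with standard normal z and v."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [pi * z + rng.gauss(0.0, 1.0) for z in zs]
    return xs, zs

f_strong = first_stage_f(*draw_first_stage(500, 1.0, seed=0))
f_weak = first_stage_f(*draw_first_stage(500, 0.02, seed=0))
print(f"strong instrument F = {f_strong:.1f}")
print(f"weak instrument F   = {f_weak:.1f}")
```

With one instrument, this F-statistic is the square of the t-statistic on the first-stage slope.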
SEE ALSO Error-correction Mechanisms; Least Squares, Ordinary; Least Squares, Three-Stage; Properties of Estimators (Asymptotic and Exact); Regression
Richard Startz