Hausman Tests
Hausman tests (Hausman 1978) are tests for econometric model misspecification based on a comparison of two different estimators of the model parameters. The estimators compared should have the properties that (1) under the null hypothesis of correct model specification both estimators are consistent for the “true parameters” of the model (those corresponding to the data generating process), whereas (2) under misspecification (the alternative hypothesis) the estimators should have differing probability limits. The former property ensures that the size of the test can be controlled asymptotically, and the latter property gives the test its power. Heuristically, the key idea is that when the model is correctly specified, the compared estimators will be close to one another, but when the model is misspecified, the compared estimators will be far apart.
A Hausman statistic is constructed as a function of the difference between the two estimators. The sampling distribution of the Hausman statistic determines how big a difference is too big to be compatible with the null hypothesis of correct specification. One performs a Hausman test by comparing the Hausman statistic to a critical value obtained from its sampling distribution, and rejecting the null hypothesis of correct specification if the Hausman statistic exceeds its critical value. The large sample distribution of the Hausman statistic is straightforward to derive; a high-level analysis appears below. This distribution simplifies usefully when one of the compared estimators is efficient under the null, as originally proposed by Jerry Hausman (1978).
Two examples originally considered by Hausman help illustrate the ideas. First, consider estimating the coefficients of a single equation, say the first, of a system of linear simultaneous equations. Provided (among other things) that the system of equations is correctly specified, it is a standard result that both the two-stage least squares (2SLS) and the three-stage least squares (3SLS) estimators of the parameters of this equation are consistent. Further, under standard assumptions, the 3SLS estimator is asymptotically efficient; in particular, it is efficient relative to the 2SLS estimator. The difference between 2SLS and 3SLS will tend to be small in this situation. On the other hand, if one of the equations of the system is misspecified, then 3SLS is generally an inconsistent estimator for the parameters of every equation of the system. If the first equation is not misspecified, then 2SLS remains consistent. The difference between 2SLS and 3SLS may be large in this situation. Thus, by comparing the 2SLS and 3SLS estimators for one or more equations of a system of linear simultaneous equations, one can gain insight into the question of whether some equations of that system may be misspecified.
Another example treated by Hausman involves comparison of two different estimators for the parameters of a panel data regression model. Specifically, it is well known that both the “random effects” and the “fixed effects” panel estimators are consistent under the assumption that the model is correctly specified and that (among other things) the regressors are independent of the “individual-specific effects” (the “random effects” assumption). In this case, the random effects estimator is also asymptotically efficient. The difference between the random effects and the fixed effects estimators will thus tend to be small. On the other hand, if the random effects assumption fails but the model is otherwise correctly specified, then the fixed effects estimator remains consistent, but the random effects estimator is inconsistent. The difference between the random effects and the fixed effects estimators may therefore be large. A comparison of the random and fixed effects estimators can thus shed light on the correctness of the random effects assumption.
The first application of this approach appears to be that of James Durbin (1954), who proposed a test for “errors in variables” in a linear regression, based on a comparison of ordinary least squares (OLS) and instrumental variables (IV) estimators. Under correct specification (no errors in variables), OLS is consistent and efficient, whereas IV is consistent but inefficient. Under misspecification, OLS is inconsistent but IV remains consistent. De-Min Wu (1973) also considered tests based on a comparison of OLS and IV estimators, describing applications to linear simultaneous equations (OLS vs. 2SLS) and dynamic panel models. Alice Nakamura and Masao Nakamura (1981) discuss the relations among the test statistics of Durbin (1954), Wu (1973), and Hausman (1978).
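Durbin's OLS-versus-IV comparison can be illustrated with a small simulation. The sketch below is our own hypothetical example (not from the original papers): a scalar regression with measurement error in the regressor, where OLS is inconsistent (attenuated toward zero) while IV, using an instrument correlated with the true regressor but not with the errors, remains consistent. The Hausman statistic uses the efficient-estimator form, in which the variance of the estimator difference is the difference of the variances (OLS being efficient under the null of no measurement error).

```python
import numpy as np

# Hypothetical errors-in-variables simulation (illustration only).
rng = np.random.default_rng(0)
n, beta = 2000, 1.0

x_star = rng.normal(size=n)           # true (unobserved) regressor
x = x_star + rng.normal(size=n)       # observed regressor, with measurement error
z = x_star + rng.normal(size=n)       # instrument: correlated with x*, not with errors
y = beta * x_star + rng.normal(size=n)

# OLS slope and its conventional variance estimate
b_ols = x @ y / (x @ x)
u_ols = y - b_ols * x
var_ols = (u_ols @ u_ols / (n - 1)) / (x @ x)

# IV slope b = (z'y)/(z'x) and its conventional variance estimate
b_iv = z @ y / (z @ x)
u_iv = y - b_iv * x
var_iv = (u_iv @ u_iv / (n - 1)) * (z @ z) / (z @ x) ** 2

# Efficient-case Hausman statistic: avar of the difference is the
# difference of the avars, since OLS is efficient under the null.
H = (b_ols - b_iv) ** 2 / (var_iv - var_ols)
print(f"OLS: {b_ols:.3f}  IV: {b_iv:.3f}  Hausman H: {H:.1f}")
```

With substantial measurement error, as here, the OLS estimate is pulled well below the true slope while IV stays near it, and the statistic far exceeds the chi-squared(1) critical value of 3.84.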
Although Hausman’s initial motivation and examples concerned linear models and the orthogonality assumptions (e.g., independence) that are typically central to identification in econometric models, he particularly emphasizes the generality and unifying nature of the estimator comparison approach. Hausman’s formal results apply not just to linear models, but to maximum likelihood methods generally. As Halbert White (1994, chap. 10.3) discusses, Hausman’s approach further extends to quasi-maximum likelihood methods. This leads to useful specification tests based on two estimators such that (1) under partially correct specification both are consistent, but neither is necessarily efficient; and (2) under misspecification neither is necessarily consistent—it suffices merely that the estimators have differing probability limits.
Hausman’s unifying approach extends even more broadly. As a straightforward illustration, we give a result establishing the large-sample properties of the Hausman statistic based on a comparison of two asymptotically linear estimators. A wide variety of econometric estimators, including quasi-maximum likelihood, method of moments, and empirical likelihood estimators are asymptotically linear.
ASSUMPTION A.1 (ASYMPTOTIC LINEARITY)
Suppose each of two estimators, say θ̂1n and θ̂2n, is a random q × 1 vector such that, for finite q × 1 nonstochastic vectors θ*1 and θ*2, θ̂jn − θ*j converges in probability to zero, j = 1, 2. Suppose further that for each j = 1, 2 there exists a q × q nonstochastic matrix H*j, finite and nonsingular, and a random q × 1 vector s*jn such that

√n (θ̂jn − θ*j) = H*j⁻¹ √n s*jn + oP(1).

All limits here are taken as the sample size n tends to infinity. Our next assumption ensures the joint asymptotic normality of √n s*1n and √n s*2n.
ASSUMPTION A.2 (JOINT ASYMPTOTIC NORMALITY)
For s*jn as in Assumption A.1, j = 1, 2, suppose that

√n (s*1n′, s*2n′)′ →d N(0, J*),

where J* is a finite 2q × 2q nonstochastic matrix having q × q diagonal blocks J*11 and J*22 and q × q off-diagonal blocks J*12 and J*21 = J*12′.
Hausman statistics are based on the difference θ̂1n − θ̂2n. If Assumptions A.1 and A.2 hold and if, as can usually be arranged, the assumption of correct model specification ensures that θ*1 = θ*2, then it follows straightforwardly that

√n (θ̂1n − θ̂2n) →d N(0, V*),

where

V* = H*1⁻¹ J*11 H*1⁻¹′ − H*1⁻¹ J*12 H*2⁻¹′ − H*2⁻¹ J*21 H*1⁻¹′ + H*2⁻¹ J*22 H*2⁻¹′.
Under mild conditions, it is typically straightforward to obtain a covariance matrix estimator consistent for V*.
Let Vn be such an estimator. If V* is nonsingular, then an asymptotic chi-squared statistic is delivered by the analog of the Wald statistic,
n (θ̂1n − θ̂2n)′ Vn⁻¹ (θ̂1n − θ̂2n).
Nevertheless, a common occurrence in applications is that V* fails to be nonsingular, due to the singularity of J*. A straightforward remedy for this is to consider a subvector of θ̂1n − θ̂2n or, more generally, a linear combination S (θ̂1n − θ̂2n), where S is a known finite nonstochastic k × q matrix, k ≤ q, such that S V* S′ is nonsingular. A Hausman statistic can then be computed as the quadratic form
Hn ≡ n (θ̂1n − θ̂2n)′ S′ [S Vn S′]⁻¹ S (θ̂1n − θ̂2n).
In testing partially correct specification, S can also play a useful role by selecting or combining only those coefficients consistently estimated under partially correct specification.
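The quadratic form above is straightforward to compute. The following is a minimal sketch (the function name and the toy numbers are our own, purely for illustration); it takes the two estimates, a consistent estimate Vn of the asymptotic covariance matrix of √n times their difference, and the selection matrix S:

```python
import numpy as np

def hausman_statistic(theta1, theta2, V, S, n):
    """Hausman quadratic form Hn = n d'S'[S V S']^{-1} S d, d = theta1 - theta2.

    V estimates the asymptotic covariance matrix of sqrt(n) * d;
    S (k x q) selects or combines the coefficients to be compared.
    """
    d = np.asarray(theta1, dtype=float) - np.asarray(theta2, dtype=float)
    Sd = S @ d
    return float(n * Sd @ np.linalg.solve(S @ V @ S.T, Sd))

# Toy example: compare only the first of two coefficients (k = 1).
theta1, theta2 = np.array([0.6, 1.1]), np.array([0.5, 1.3])
V = np.eye(2)                 # stand-in covariance estimate
S = np.array([[1.0, 0.0]])
Hn = hausman_statistic(theta1, theta2, V, S, n=100)
print(Hn)  # compare with a chi-squared(k) critical value, e.g. 3.84 for k = 1
```

Using `np.linalg.solve` rather than forming the inverse of S V S′ explicitly is the standard numerically stable way to evaluate such quadratic forms.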
We can now formally state the large-sample properties of this Hausman statistic.
PROPOSITION 1
Suppose Assumptions A.1 and A.2 hold and that S is a known finite nonstochastic k × q matrix of full row rank. (i) If the model is correctly specified, and if correct model specification implies that θ*1 − θ*2 = 0 and that Vn − V* converges in probability to zero, where S V* S′ is nonsingular, then Hn →d χ²(k). (ii) If the model is misspecified, and if model misspecification implies that θ*1 − θ*2 ≠ 0 and that Vn − W* converges in probability to zero, where W* is a finite, nonstochastic q × q matrix such that S W* S′ is nonsingular, then for any sequence {cn} with cn = o(n), P[Hn > cn] → 1.
The respective proofs of (i) and (ii) follow those of Theorems 8.6 and 8.16 of White (1994).
Part (i) establishes that under the null hypothesis of correct specification, the Hausman statistic is distributed asymptotically as chi-squared with k degrees of freedom, delivering convenient asymptotic critical values. Part (ii) describes the behavior of the Hausman statistic under the global alternative of model misspecification. This establishes the consistency of the test (power approaching one) for sequences of critical values going to infinity with n (but more slowly than n), thus driving the probability of Type I error to zero.
An important caveat for part (ii) is the necessity of the requirement that misspecification entails θ*1 − θ*2 ≠ 0. As Alberto Holly (1982) has pointed out, this can fail for particular types of misspecification in combination with particular choices of compared estimators. The Hausman statistic just given is thus not guaranteed to yield a test consistent against arbitrary model misspecification. Nevertheless, a clever modification of the Hausman statistic proposed by Herman Bierens (1988) gives a variant of the Hausman test that does have this consistency property (see also Bierens 1990).
As mentioned above, the asymptotic distribution simplifies usefully when one of the compared estimators is efficient under correct specification. Specifically, the asymptotic covariance matrix of the estimator difference simplifies to the difference of the asymptotic covariance matrices. With asymptotic linearity, this arises because typically when one of the compared estimators (say θ̂2n) is asymptotically efficient, then H*1⁻¹ J*12 H*2⁻¹′ = H*2⁻¹ J*22 H*2⁻¹′ and H*2⁻¹ J*21 H*1⁻¹′ = H*2⁻¹ J*22 H*2⁻¹′ (see Bates and White 1993). Substitution into the expression for V* above then gives the more convenient expression

V* = H*1⁻¹ J*11 H*1⁻¹′ − H*2⁻¹ J*22 H*2⁻¹′.

The first term is the asymptotic covariance matrix of θ̂1n; the second term is that of θ̂2n. A direct benefit of this expression is that it suggests simpler forms for the covariance estimator Vn.
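In this efficient case, the covariance of the difference can thus be estimated simply by differencing the two estimated covariance matrices. A small numeric sketch (toy matrices of our own, not from any real application):

```python
import numpy as np

# Toy illustration of the efficient-case Hausman form:
# V* = avar(theta1) - avar(theta2) when theta2 is efficient under the null.
n = 100
d = np.array([0.1, 0.1])      # difference of the two estimates
V1 = np.diag([2.0, 2.0])      # estimated avar of the inefficient estimator
V2 = np.diag([1.0, 1.0])      # estimated avar of the efficient estimator

V_diff = V1 - V2              # may fail to be positive definite in finite
                              # samples; practical implementations guard
                              # against this (e.g., with a generalized inverse)
H = float(n * d @ np.linalg.solve(V_diff, d))
print(H)  # compare with a chi-squared(2) critical value under the null
```

The caveat in the comment is the practical price of the simplified form: although V1 − V2 is positive semidefinite asymptotically, its estimate need not be in any finite sample.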
The specific form given above for the Hausman statistic is only one of several possible forms. Hausman describes a convenient version for linear regression applications that involves testing whether certain transformations of the original regressors have zero coefficients. Russell Davidson and James MacKinnon (1993) discuss further convenient versions of the Hausman test based on “double-length” regressions. Applying the results of White (1994, chap. 9) yields further variants of the Hausman statistic that have convenient computation properties.
The focus here has been on parametric versions of the Hausman test. The unifying principle of estimator comparison for specification testing also extends to comparing parametric and nonparametric estimators and to comparing nonparametric estimators. White and Yongmiao Hong (1999) give some relevant theory and examples for these cases.
SEE ALSO Fixed Effects Regression; Random Effects Regression; Specification Error
BIBLIOGRAPHY
Bates, Charles, and Halbert White. 1993. Determination of Estimators with Minimum Asymptotic Covariance Matrices. Econometric Theory 9: 633–648.
Bierens, Herman. 1988. A Consistent Hausman-Type Model Specification Test. Vrije Universiteit Faculteit der Economische Wetenschappen Research Memorandum 1987–2 (rev.).
Bierens, Herman. 1990. A Consistent Conditional Moment Test of Functional Form. Econometrica 58: 1443–1458.
Davidson, Russell, and James MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Durbin, James. 1954. Errors in Variables. Review of the International Statistical Institute 22: 23–32.
Hausman, Jerry. 1978. Specification Tests in Econometrics. Econometrica 46: 1251–1272.
Holly, Alberto. 1982. A Remark on Hausman’s Specification Test. Econometrica 50: 749–759.
Nakamura, Alice, and Masao Nakamura. 1981. On the Relationships among Several Specification Error Tests Presented by Durbin, Wu, and Hausman. Econometrica 49: 1582–1588.
White, Halbert. 1994. Estimation, Inference, and Specification Analysis. New York: Cambridge University Press.
White, Halbert, and Yongmiao Hong. 1999. M-Testing Using Finite and Infinite Dimensional Parameter Estimators. In Cointegration, Causality, and Forecasting: A Festschrift in Honor of Clive W. J. Granger, eds. Robert Engle and Halbert White, 326–345. Oxford: Oxford University Press.
Wu, De-Min. 1973. Alternative Tests of Independence Between Stochastic Regressors and Disturbances. Econometrica 41: 733–750.
Jerry A. Hausman
Halbert White