Specification
The term specification is used in economics to denote the choice of a model in the context of empirical modeling. Unfortunately, the use of the term since the late 1950s (Theil 1957, Leamer 1990) is confusing because different types of models are conflated; the crucial confusion is between a statistical model (a set of probabilistic assumptions) and a structural (substantive) model. The distinction between these two types of models is important because they raise very different issues with respect to the premises of inference in empirical modeling (see Spanos 2006a).
THEORY AND STRUCTURAL MODELS
It is widely recognized that most stochastic phenomena (the ones exhibiting chance regularity patterns; see Spanos 1999) are commonly influenced by a very large number of contributing factors, which explains why theories are often dominated by ceteris paribus clauses. The idea behind a theory is that in explaining the behavior of a variable, say yk, one demarcates the segment of reality to be modeled by selecting the primary influencing factors xk, cognizant of the fact that there might be numerous other potentially relevant factors ξk (observable and unobservable) that jointly determine the behavior of yk via a theory model:

yk = h*(xk, ξk), k ∈ N,   (1)

where h*(.) represents the true behavioral relationship for yk. The guiding principle in selecting the variables in xk is to ensure that they collectively account for the systematic behavior of yk, and that the unaccounted factors ξk represent nonessential disturbing influences which can have only a nonsystematic effect on yk. This reasoning transforms (1) into a structural model of the form:

yk = h(xk; φ) + ε(xk, ξk), k ∈ N.   (2)

By definition the error term process is:

ε(xk, ξk) = yk − h(xk; φ), k ∈ N,   (3)

and represents all unmodeled influences, intended to be a nongeneric white-noise (nonsystematic) stochastic process; that is, {ε(xk, ξk), k ∈ N} has (i) mean zero, (ii) constant variance, (iii) no correlation over k, and (iv) orthogonality to h(xk; φ), for all possible values (xk, ξk) ∈ Rx × Rξ. Note that (iv) aims to demarcate a "near isolation" condition for the phenomenon of interest.
In summary, a structural model provides an “idealized” substantive description of the phenomenon of interest, in the form of a “nearly isolated” mathematical system (Spanos 1995). The specification of a structural model comprises several choices: (a) the demarcation of the segment of the phenomenon of interest to be captured, (b) the important aspects of the phenomenon to be measured, and (c) the extent to which the inferences based on the structural model are germane to the phenomenon of interest.
The kinds of errors one can probe for in the context of a structural model concern the choices (a)-(c), which include the form of h(xk; φ) and the circumstances that render the error term ε(xk, ξk) potentially systematic, such as the omission of relevant factors, say wk, in ξk that might have a systematic effect on the behavior of yk (see Spanos 2006b for further discussion).
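To make the omitted-factors point concrete, the following sketch shows how leaving a relevant factor wk (correlated with the included xk) in the error renders that error systematic. All functional forms and numbers here are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical setup: true relation y = 1 + 2*x + 1.5*w + noise, where the
# relevant factor w is correlated with the included regressor x.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(0.0, 1.0, n)
w = 0.8 * x + rng.normal(0.0, 1.0, n)    # omitted factor, Cov(x, w) = 0.8
y = 1.0 + 2.0 * x + 1.5 * w + rng.normal(0.0, 1.0, n)

# Model y on x alone; the omitted term 1.5*w is swept into the error.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The "structural" error of the x-only model is 1.5*w + noise: it is
# clearly correlated with x, i.e. systematic, violating assumption (iv).
err = y - (1.0 + 2.0 * x)
corr_x_err = np.corrcoef(x, err)[0, 1]   # far from zero

# Consequence: the fitted slope is biased (about 2 + 1.5*0.8 = 3.2).
```

This is a simulation of the generic omitted-variable phenomenon, not of any specific model in the text: once the error is systematic, assumptions (i)-(iv) fail and inferences based on the structural model are unreliable.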
The problem with (3) is that assumptions (i)-(iv) of the structural error are nontestable, because their assessment would involve verifying them for all possible values (xk, ξk) ∈ Rx × Rξ. To render them testable one needs to embed this structural model into a statistical model with a generic error term; a crucial move that often passes unnoticed. Not surprisingly, the nature of the embedding itself depends crucially on whether the data Z := (z1, z2, …, zn) are the result of an experiment or are nonexperimental (observational) in nature.
STATISTICAL MODELS WITH EXPERIMENTAL DATA
In the case where one can perform experiments, "experimental design" techniques might allow one to operationalize the "near isolation" condition, including the ceteris paribus clauses, and ensure that the error term is no longer a function of (xk, ξk) but takes the generic form:

εk ~ IID(0, σ²), k ∈ N,   (4)

where "IID" stands for Independent and Identically Distributed. For instance, randomization and blocking are often used to "neutralize" the phenomenon from the potential effects of ξk by ensuring that these uncontrolled factors cancel each other out (Fisher 1935). As a direct result of the experimental "control," via (4) the structural model (2) is essentially transformed into a statistical model:

yk = h(xk; φ) + εk, εk ~ IID(0, σ²), k ∈ N.   (5)
The statistical error terms in (5) are qualitatively very different from the structural errors in (2) because they no longer depend on (xk, ξk) ∈ Rx × Rξ. The most important aspect of embedding the structural model (2) into the statistical model (5) is that, in contrast to (i)-(iv) for {ε(xk, ξk), k ∈ N}, the probabilistic assumptions εk ~ IID(0, σ²) concerning the generic statistical error term are rendered testable. That is, (4) has operationalized the "near isolation" condition, and the statistical model has been created as a result of the experimental design and control.
A crucial consequence of (4) is that the informational universe of discourse for the statistical model (5) has been demarcated to the probabilistic information relating to the observables Zk := (yk, Xk), as described by the joint distribution D(Z1, Z2, …, ZT; φ). A statistical model can be viewed as a parameterization of the presumed probabilistic structure of the process {Zt, t ∈ T} (Spanos 1986, 1999). This probabilistic structure is chosen so as to render the observed data Z := (z1, z2, …, zn) a truly typical realization thereof. This introduces into empirical modeling a probabilistic perspective which treats the data as realizations of generic stochastic processes devoid of any substantive information.
In contrast to a structural model, once Zt is chosen by some theory or theories, a statistical model relies exclusively on the statistical information in D(Z1, Z2, …, ZT; φ) that "reflects" the chance regularity patterns exhibited by the data. Hence, a statistical model acquires a life of its own in the sense that it constitutes a self-contained generic generating mechanism defined exclusively in terms of the observables Zk := (yk, Xk). For example, in the case where h(xt; φ) = β0 + β1′xt, the error assumptions εt ~ NIID(0, σ²), where "NIID" stands for Normal IID, give rise to the Gauss Linear model given in table 1 (Spanos 1986).
In summary, a statistical model constitutes an "idealized" probabilistic description of a stochastic process {Zt, t ∈ T} giving rise to data Z, in the form of an internally consistent set of probabilistic assumptions chosen to ensure that these data constitute a "truly typical realization" of {Zt, t ∈ T}. Specification for the statistical model refers to choosing a parameterization and an associated complete set of testable probabilistic assumptions constituting the premises of inference. Specification error denotes departures from assumptions [1]-[5] of table 1.
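The testability point can be illustrated with a small simulation. Under an experimental-style design the error is generic IID noise, so crude residual diagnostics can assess the error assumptions directly; the functional form and all parameter values below are hypothetical:

```python
import numpy as np

# Hypothetical data generated so that eps ~ NIID(0, sigma^2) holds by
# construction, mimicking the outcome of experimental control via (4).
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(-1.0, 1.0, n)
eps = rng.normal(0.0, 0.5, n)               # generic NIID error
y = 0.5 + 1.2 * x + eps                     # made-up h(x; phi) = 0.5 + 1.2*x

# Fit by OLS and inspect the residuals as estimates of the errors.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta

# Zero mean and orthogonality to x hold mechanically for OLS residuals
# (with an intercept); the serial-correlation check is a genuine
# diagnostic of the independence assumption.
mean_u = u.mean()
lag1 = np.corrcoef(u[:-1], u[1:])[0, 1]     # close to 0 under IID errors
```

In practice one would use formal mis-specification tests (for normality, homoskedasticity, independence, t-invariance) rather than these crude moment checks, but the logic is the same: unlike the structural assumptions (i)-(iv), the generic error assumptions can be confronted with the data.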
STATISTICAL MODELS WITH OBSERVATIONAL DATA
This is the case where the observed data on (yk, xk) are the result of an ongoing data generating process, undisturbed by any experimental control or intervention. In this case the route followed in (4), in order to render the statistical error term (a) free of (xk, ξk) and (b) nonsystematic in a statistical sense, is no longer feasible. However, as shown in Spanos (1986), sequential conditioning provides a general way to transform an arbitrary stochastic process {Zt, t ∈ T} into a martingale difference process, a modern form of a white-noise process. This provides the key to an alternative approach to specifying statistical models in the case of nonexperimental data, by replacing the controls and interventions with the choice of the relevant conditioning information set Dt that would render the error term nonsystematic: a martingale difference. The technical aspects of specifying statistical models using sequential conditioning are beyond the scope of the present discussion (Spanos 2006b), but an example of how one can specify a statistical model as a reduction from D(Z1, …, ZT; φ) can shed some light on its practical aspects.

Table 1. The Gauss Linear (GL) Model

yt = β0 + β1′xt + ut, t ∈ N,
[1] Normality: yt ~ N(.,.),
[2] Linearity: E(yt) = β0 + β1′xt, linear in (β0, β1),
[3] Homoskedasticity: Var(yt) = σ², free of xt,
[4] Independence: {yt, t ∈ N} is an independent process,
[5] t-invariance: θ := (β0, β1, σ²) does not vary with t.
The Normal/Linear Regression (LR) model results from a reduction of D(Z1, …, ZT; φ) by assuming that {Zt, t ∈ T} is a NIID process. These reduction assumptions ensure that the appropriate conditioning information set is Dt = {Xt = xt}, giving rise to a statistical error term:

(ut | Xt = xt) ~ NIID(0, σ²), where ut = yt − E(yt | Xt = xt), t ∈ T.   (6)
This is analogous to (4) in the case of experimental data, but now the error term has been operationalized by a judicious choice of the conditioning information set Dt = {Xt = xt}. The complete specification of the Linear Regression model is similar to that of the Gauss Linear model (table 1), but instead of D(yt; θ), the underlying distribution is D(yt|Xt; θ): assumptions [1]-[5] now pertain to the probabilistic structure of {(yt|Xt = xt), t ∈ T} (Spanos 1986). In this sense, D(yt|Xt; θ) brings to the table the statistical information which supplements, and can be used to assess the appropriateness of, the substantive subject matter information carried by the structural model.
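A small simulation can illustrate the reduction idea: if (yt, Xt) is bivariate normal NIID, the conditional mean E(yt|Xt = xt) is linear and the regression parameters are functions of the moments of the joint distribution, with the implied error orthogonal to xt by construction. The moment values below are hypothetical:

```python
import numpy as np

# Hypothetical joint distribution for (y, x): bivariate normal NIID.
rng = np.random.default_rng(2)
n = 200_000
mean = np.array([1.0, 2.0])             # (E[y], E[x])
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])            # Var(y)=2, Cov(y,x)=0.6, Var(x)=1
y, x = rng.multivariate_normal(mean, cov, n).T

# Regression parameters implied by the reduction:
# beta1 = Cov(y, x)/Var(x), beta0 = E[y] - beta1*E[x].
beta1 = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
beta0 = y.mean() - beta1 * x.mean()

# The statistical error u = y - E(y|x) is, by construction, uncorrelated
# with the conditioning variable x.
u = y - (beta0 + beta1 * x)
corr_x_u = np.corrcoef(x, u)[0, 1]      # close to 0
```

Here the estimates should be close to the population values β1 = 0.6/1.0 = 0.6 and β0 = 1.0 − 0.6·2.0 = −0.2, illustrating that nothing beyond the joint distribution of the observables is needed to define the statistical model.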
CONFRONTING SUBSTANTIVE WITH STATISTICAL INFORMATION
An important aspect of embedding a structural model into a statistical model is to ensure (whenever possible) that the former can be viewed as a reparameterization/restriction of the latter. The structural model is then tested against the benchmark provided by a statistically adequate model. Identification refers to being able to define φ uniquely in terms of θ. Often θ has more parameters than φ, and the embedding enables one to test the validity of the additional restrictions, known as overidentifying restrictions (Spanos 1990).
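The idea of testing a structural model as a restriction on a statistical model can be sketched as follows. The linear specification, the particular restriction θ1 + θ2 = 1, and all numbers are hypothetical, chosen only to show the mechanics of a restriction test against an (assumed statistically adequate) unrestricted model:

```python
import numpy as np

# Hypothetical statistical model: y = th0 + th1*x1 + th2*x2 + eps.
# Suppose substantive theory implies the restriction th1 + th2 = 1.
rng = np.random.default_rng(3)
n = 2_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.3 + 0.7 * x1 + 0.3 * x2 + rng.normal(0.0, 1.0, n)  # restriction holds

# Unrestricted (statistical) fit.
X = np.column_stack([np.ones(n), x1, x2])
theta = np.linalg.lstsq(X, y, rcond=None)[0]
rss_u = np.sum((y - X @ theta) ** 2)

# Restricted (structural) fit: substitute th2 = 1 - th1, so that
# y - x2 = th0 + th1*(x1 - x2).
Xr = np.column_stack([np.ones(n), x1 - x2])
g = np.linalg.lstsq(Xr, y - x2, rcond=None)[0]
rss_r = np.sum((y - x2 - Xr @ g) ** 2)

# F-statistic for the single restriction; under the restriction
# F ~ F(1, n - 3), so small values do not reject the structural model.
F = (rss_r - rss_u) / (rss_u / (n - 3))
```

The same logic extends to overidentifying restrictions, where the structural parameterization φ implies several such constraints on the statistical parameters θ; the test is meaningful only when the unrestricted statistical model is itself statistically adequate.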
SEE ALSO Specification Tests
BIBLIOGRAPHY
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh, U.K.: Oliver and Boyd.
Leamer, Edward E. 1990. Specification Problems in Econometrics. In The New Palgrave: A Dictionary of Economics, eds. John Eatwell, Murray Milgate, and Peter Newman, 238–245. New York: Norton.
Spanos, Aris. 1986. Statistical Foundations of Econometric Modelling. Cambridge, U.K.: Cambridge University Press.
Spanos, Aris. 1990. The Simultaneous Equations Model Revisited: Statistical Adequacy and Identification. Journal of Econometrics 44: 87–108.
Spanos, Aris. 1995. On Theory Testing in Econometrics: Modeling with Nonexperimental Data. Journal of Econometrics 67: 189–226.
Spanos, Aris. 1999. Probability Theory and Statistical Inference: Econometric Modeling with Observational Data. Cambridge, U.K.: Cambridge University Press.
Spanos, Aris. 2006a. Econometrics in Retrospect and Prospect. In New Palgrave Handbook of Econometrics, vol. 1, eds. Terence C. Mills and Kerry Patterson, 3–58. London: Macmillan.
Spanos, Aris. 2006b. Revisiting the Omitted Variables Argument: Substantive vs. Statistical Adequacy. Journal of Economic Methodology 13: 179–218.
Theil, Henri. 1957. Specification Errors and the Estimation of Economic Relationships. Review of the International Statistical Institute 25: 41–51.
Aris Spanos