Cross-Section Analysis
Empirical analysis is concerned with the establishment of quantitative or qualitative relations between observable variables. From a temporal point of view two kinds of data are used in empirical analysis—cross-section data and time series data. Cross-section data are observations on variables at a point of time, whereas time series data are observations covering several time periods. Sometimes the two kinds of data are used together to overcome some specific difficulties.
The unit of observation in a cross section is generally, although not necessarily, elementary, such as a firm or a consumer; the unit of observation in a time series is generally an aggregate. The empirical association between the type of data and the level of aggregation contributes to differences in the results obtained with the two kinds of data.
While cross-section analysis is used in many of the social sciences, this article focuses on applications of cross-section analysis in economics. [For other applications, see Survey analysis.]
Some of the more important areas of economics in which cross-section data have been used are the estimation of Engel curves (Liviatan 1964; Prais & Houthakker 1955), the estimation of consumption functions (Friedman 1957), the estimation of production functions (Bronfenbrenner & Douglas 1939), and the estimation of investment functions (Kuh 1963).
Reasons for using cross-section data. There are several reasons why cross-section data are often superior to time series data for estimating economic relations.
(1) Cross-section data contain large variations in some variables whose variations over time are only moderate and often subject to trend. For instance, there is a much larger variation in income among consumers at a particular point of time than in average per capita income over time. Likewise, there are wide variations in productive capacity and sales among firms in a cross section, but only relatively small variations in these variables in a time series.
(2) The size of a cross-section sample of data can usually be increased enough to make sampling variance relatively negligible (Kuh 1963).
(3) Multicollinearity among variables in a cross section is often less acute than among corresponding variables in a time series.
(4) The problem of interdependent disturbances, which frequently arises in the analysis of time series because of trends in time series data, usually does not arise in the analysis of cross-section data, simply because the order of the observations has no meaning.
(5) In some cases cross-section data are more reliable; and the variables that can be measured in a cross section often correspond more closely to the variables defined and studied in economic theory. For instance, cross-section data on consumer budgets furnish a precise account of consumption by commodity, and such data can be more useful than time series data for estimating demand functions. On the other hand, some variables are subject to larger errors in a cross section, although the errors are of a different nature than those encountered in a time series. For example, the discrepancy between observed income and permanent income is larger in a cross section than in an aggregate time series (Friedman 1957). This problem is discussed below.
(6) The distinction between the individual economic unit and the market, which is basic in economic analysis, leads to different classifications of variables at the microeconomic and macroeconomic levels of analysis. For instance, prices can be assumed to be given (exogenous) for a consumer, but they should be considered as dependent (endogenous) variables when markets are analyzed. Cross-section analysis usually deals with the behavior of microeconomic units; therefore it can often proceed on the basis of a simpler economic model than time series analysis of macroeconomic data.
It should be noted that cross-section analyses had been undertaken long before the statistical advantages of cross-section data were recognized; and in some cases, such as the study of Engel curves, cross-section analyses also preceded the rationale that was to be provided for them by economic theory (Staehle 1934–1935; Stigler 1954).
The scope of cross-section analysis. The explanatory variables appearing in economic relations can be divided into three groups: (1) variables that vary in a cross section and over time; (2) variables that are stable in a cross section and vary over time; (3) variables that vary in a cross section and are stable over time.
It is the existence of the first group that makes cross-section analysis valuable in economics. For instance, income is an important variable in consumption functions, and its impact on consumption can be learned from cross-section analysis. If income did not vary in a cross section, cross-section analysis of consumption functions would be impossible; if income did not vary over time, the results of cross-section analysis would be of little interest.
The existence of the second group implies that cross-section analysis by itself is not sufficient to explain the variations of many economic variables. For example, variations in consumption resulting from changes in income can be determined by cross-section analysis, since income varies in a cross section; but variations in consumption resulting from changes in prices cannot be determined by cross-section analysis, since prices generally do not vary in a cross section. Thus, to obtain complete economic relations, variables in the second group must be included in the relations, and the coefficients of these variables must be estimated by time series analysis.
The variables in the third group may be very important in explaining variations of economic variables in a cross section. For instance, age and sex and their interaction may explain more of the variations in consumption of certain commodities among individuals than does income. Yet they may be relatively unimportant variables to consider in most economic decisions or predictions, since neither the average age nor the sex composition of the population varies much over time.
The variables in the third group are largely non-economic, that is, they are not endogenous variables in current economic theory. Among the important variables in this group are those measuring the uncertainty faced by decision-making units, particularly firms. According to the theory of the firm facing no uncertainty, the amount of a good supplied by a firm and the amounts of productive factors demanded by a firm depend on the price of the good and on the prices of the productive factors.
Given input and output prices, the firm should choose the input-output configuration that maximizes its profits. But in many empirical analyses of cross-section data, it is found that prices explain only a small proportion of variations in input demands and output supplies among firms. The unexplained variations may be attributable to differences in the degree of uncertainty among firms or to differences in the response to uncertainty among firms. Firms may be certain of the prices that will prevail when they execute their decisions, but they may be uncertain of the quantity of output that can be sold or the amounts of inputs that can be purchased at those prices.
Given uncertainty of this kind, a firm may deviate from its profit maximizing input-output configuration, so as to avoid partially the costs that would be incurred if plans cannot be realized, e.g., the costs of undesired inventory accumulations. In such cases a firm might consider all input-output configurations that yield profits greater than a pre-assigned fraction, say 95 per cent, of maximum profits and select one of these configurations for final execution. The range of acceptable configurations may be very large. Hence, firms may differ considerably in their decisions, the differences resulting from variations in uncertainty and in the response to uncertainty among firms.
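The breadth of the set of near-optimal configurations can be illustrated numerically. The following is a minimal sketch, not taken from the article: it assumes a hypothetical one-input firm with output x**0.5, an output price of 1.0, and an input price of 0.5, and it lists the input levels on a grid whose profit is at least 95 per cent of the maximum.

```python
def profit(x, price=1.0, wage=0.5, alpha=0.5):
    """Profit of a hypothetical one-input firm whose output is x**alpha."""
    return price * x ** alpha - wage * x

# Grid of candidate input levels (range and step are arbitrary assumptions).
grid = [i / 100 for i in range(1, 401)]  # 0.01, 0.02, ..., 4.00
profits = [profit(x) for x in grid]
max_profit = max(profits)
best_input = grid[profits.index(max_profit)]

# Input levels whose profit is at least 95 per cent of maximum profit.
threshold = 0.95 * max_profit
acceptable = [x for x, p in zip(grid, profits) if p >= threshold]

print(f"profit-maximizing input: {best_input:.2f}")
print(f"acceptable inputs run from {acceptable[0]:.2f} to {acceptable[-1]:.2f}")
```

Under these assumed parameters the profit-maximizing input is 1.00, while every input between roughly 0.6 and 1.5 yields at least 95 per cent of maximum profit, so two firms facing identical prices could differ in input use by a wide margin.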
From these considerations it is evident that economic variables may well explain only a small fraction of the variations of a dependent variable in a cross-section relation and that other variables that appear or should appear in the relation will perform the major explanatory role. Thus, there is no a priori requirement that cross-section relations produce high degrees of explanation or that, if they do, their explanatory powers result from the economic variables included (Grunfeld & Griliches 1960). Yet when all individuals are taken together, the noneconomic variables generally offset each other, and economic variables, such as prices, may turn out to be the important explanatory variables.
The problem of multiperiod relations. In general, the time horizon of economic decisions extends beyond a single period. Consequently, “true” economic relations contain variables of several periods; but in cross-section analysis one can usually observe only the current variables, and the problem of inferring a complete relation from such partial information arises. To illustrate, according to the permanent income hypothesis the consumption function is a relation between permanent consumption and permanent income. While permanent consumption may be approximately equal to observed consumption, permanent income is an average of incomes of several periods. Hence, consumption in any given year will depend, apart from errors, on a stream of income which, except for its current component, is unobserved. Similarly, in deciding whether to invest in durable assets, a firm takes into account the profits that the assets will yield not only in the year of the investment but also in future years. Thus, in order to ascertain empirically the determinants of business investment, it is necessary to take into account firms’ expectations of the future, which are unobservable.
The nature of the problem can also be seen by expressing an economic relation as
$$f(x_0, x_1, \ldots, x_t, \ldots) = 0,$$
where $x_t$ is the vector of variables from period $t$ that enter the relation. The problem discussed above exists when observations are available only for a particular $x_t$ (as is the case in cross-section analysis) and when $f$ is not separable with respect to that $x_t$. In such cases the estimated cross-section relation is subject to bias resulting from the omission of the unobserved variables. This is all that can be said at this level of generality.
More specific conclusions can be arrived at when the underlying theory is more specific with respect to the economic relation to be estimated and with respect to the relationships between the observed and the unobserved variables. For instance, in the case of the permanent income hypothesis the stream of income over time in the true relation is replaced by one unobserved variable, permanent income. For purposes of estimation it is assumed that observed income measures permanent income with an unsystematic error. With specifications of the properties of the error term, the estimation problem reduces to a regression problem with errors in the variables, and the appropriate statistical methods are applied [see Linear hypotheses, article on regression].
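The consequence of such an error for a least-squares estimate can be written out in the simplest case. The notation below is an assumption introduced here, not the article's: $c$ is consumption, $y_p$ permanent income, $y$ observed income, $u$ the measurement error, and $k$ the coefficient of interest. The error $u$ is taken to be uncorrelated with both $y_p$ and the disturbance $\varepsilon$. Under those assumptions the standard errors-in-variables result is that the least-squares slope of $c$ on $y$ understates $k$:

```latex
\begin{aligned}
c &= k\,y_p + \varepsilon, \qquad y = y_p + u,\\
\operatorname{plim}\hat{b}
  &= \frac{\operatorname{cov}(c, y)}{\operatorname{var}(y)}
   = k\,\frac{\sigma_p^2}{\sigma_p^2 + \sigma_u^2},
\end{aligned}
```

where $\sigma_p^2$ and $\sigma_u^2$ denote the variances of $y_p$ and $u$. The larger the error variance relative to the variance of permanent income, the stronger the downward bias.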
It should be noted that if the measurement errors are unsystematic, then aggregating the observations will reduce the measurement errors. Thus, while measurement errors are a serious problem in estimating most multiperiod relations with cross-section data, they usually do not present much of a problem in estimating such relations with aggregate time series data.
In studying multiperiod relations, considerable information can be gained by taking repeated observations on microeconomic units over time. Liviatan (1963) used such information to perform a rich variety of tests of the permanent income hypothesis.
The study of multiperiod relations is further complicated by the existence of uncertainty in the decisions of microeconomic units. Consumers and firms do not have complete information on the future values of the variables exogenous to them. In such cases observed values of variables cannot be identified with expectations, and the utilization of repeated observations does not solve this problem. Assumptions must be made regarding the formation of expectations and regarding the behavior of individuals under uncertainty. Some of the consequences of uncertainty were noted in the preceding section.
Estimation. An initial step in estimating cross-section relations is choosing the explanatory variables to be included. Returning to the classifications noted above, variables in the second group obviously cannot be included, since they do not vary in a cross section. Variables in the first group should be included for two reasons. First, their variations in the cross section may contribute to explaining the variations in the dependent variable of the relation. Second, since they vary over time, cross-section estimates of their coefficients will be useful in making intertemporal forecasts with the estimated relation. Variables in the third group, which are specific to the cross section, may be of little interest for forecasting intertemporal changes in the dependent variables. However, if they are correlated with variables in the first group, they should also be included in the relation, to avoid bias in the estimates of the coefficients of the variables in the first group. For example, the size of the family is included as a variable in cross-section studies of Engel curves, even though it may have little to contribute in making intertemporal forecasts (Liviatan 1963; Prais & Houthakker 1955).
Sometimes the variables in the third group are not quantifiable but their attributes can be specified. They are then introduced into the analysis by grouping individuals according to the attributes and estimating separate relations for each group. For example, in studying consumption functions one of the variables in the third group might be “place of residence.” Individuals in a cross-section sample might then be grouped according to geographical areas. After fitting the relation to each group, one may test the hypothesis that individuals in all groups behave in the same manner, i.e., that the same relation holds for all the groups and that the so-called group effects are insignificant. Covariance analysis provides the statistical framework for testing the equality of intercepts and the equalities of some or all of the slopes in the group relations. In such analysis, within-group variations of the variables (deviations of observations in the group from the group mean) are utilized, so at least two observations per group are required.
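The use of within-group variations can be sketched in a few lines. The example below is illustrative only. It simulates hypothetical data for five regions whose intercepts (group effects) are correlated with mean income, then compares a pooled least-squares slope with a common slope computed from deviations around each group's own means. It does not carry out the covariance-analysis tests of equal intercepts or slopes, and every number in it is an assumption.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs (intercept included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_slope(groups):
    """Common slope estimated from deviations around each group's means."""
    dx, dy = [], []
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        dx += [x - mx for x in xs]
        dy += [y - my for y in ys]
    return slope(dx, dy)

# Simulated data: all parameter values are hypothetical.
rng = random.Random(0)
TRUE_SLOPE = 0.6
groups = []
for j in range(5):
    mean_income = 10 + 3 * j   # richer regions ...
    group_effect = 2 * j       # ... also have higher intercepts
    xs = [mean_income + rng.gauss(0, 2) for _ in range(200)]
    ys = [group_effect + TRUE_SLOPE * x + rng.gauss(0, 1) for x in xs]
    groups.append((xs, ys))

pooled_x = [x for xs, _ in groups for x in xs]
pooled_y = [y for _, ys in groups for y in ys]
pooled = slope(pooled_x, pooled_y)
within = within_slope(groups)

print(f"pooled slope (ignores group effects): {pooled:.3f}")
print(f"within-group slope:                   {within:.3f}")
```

Because the simulated group effects rise with mean income, the pooled slope overstates the assumed coefficient of 0.6, whereas the within-group slope is not contaminated by the differences in intercepts.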
It may happen that the appropriate groups are identical to the units of the observations. For example, managerial ability is not quantifiable but must be allowed for in a cross-section study of production functions. However, the unit of observation may be the firm, and managerial ability probably differs for each firm. In that case, covariance analysis is impossible with just one cross section, since there is only one observation per group. If managerial ability is to be handled by the use of covariance analysis, repeated observations over time must be made on each firm. Since this calls for a combination of time series and cross-section data, some of the variables in the second group must also be included in the relation. If some of these variables are not directly quantifiable, their effects may be allowed for by introducing different intercepts and slopes for the various years in the sample (Mundlak 1963).
Analysis that is based merely on within-group variations of the variables ignores the between-group variations (deviations of the group means from the mean of all the observations), which are often much larger. The between-group variations can be utilized if the explanatory variables in the estimated relations are not correlated with the group effects. For example, if income is uncorrelated with “place of residence,” then the mean consumptions of the groups can be regressed on the mean incomes of the groups. This is particularly desirable when the variables are subject to unsystematic measurement errors (as is the case when testing the permanent income hypothesis), because averaging observations for each group will eliminate most of the measurement errors. [See Linear hypotheses, article on analysis of variance.]
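A small simulation may make the effect of averaging concrete. The sketch below rests on assumptions that are not in the article: 20 hypothetical groups of 500 households each, no group effects on consumption, a true coefficient of 0.9 on permanent income, and a large unsystematic error in observed income. It compares the slope obtained from individual observations with the slope obtained by regressing group mean consumption on group mean income.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs (intercept included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Simulated data: every parameter value here is hypothetical.
rng = random.Random(1)
K = 0.9                      # assumed coefficient on permanent income
GROUPS, SIZE = 20, 500
incomes, consumptions, group_means = [], [], []
for g in range(GROUPS):
    permanent = [10 + g + rng.gauss(0, 2) for _ in range(SIZE)]
    observed = [p + rng.gauss(0, 5) for p in permanent]    # error-ridden income
    consumed = [K * p + rng.gauss(0, 1) for p in permanent]
    incomes += observed
    consumptions += consumed
    group_means.append((sum(observed) / SIZE, sum(consumed) / SIZE))

micro = slope(incomes, consumptions)
grouped = slope([m[0] for m in group_means], [m[1] for m in group_means])

print(f"slope from individual observations: {micro:.3f}")
print(f"slope from group means:             {grouped:.3f}")
```

In this setup the individual-level slope is pulled well below 0.9 by the measurement error, while the group-mean slope stays close to it, since the errors largely cancel within each group of 500.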
The problem of measurement errors is also handled in cross-section analysis by the use of instrumental variables [see Simultaneous equation estimation]. For instance, in estimating the consumption function it is assumed that observed income measures permanent income with error. If this error is not serially correlated, the income of one year may be used as an instrumental variable for estimating the consumption function in another year. Note that this again calls for repeated observations over time on the incomes of the microeconomic units.
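The instrumental-variable idea can be sketched with simulated data. The example below is a minimal illustration under stated assumptions: permanent income is the same in both years, the measurement errors in the two years are independent, and the true coefficient is 0.9. All variable names and numbers are hypothetical. The estimator is the simple ratio of covariances, with second-year income serving as the instrument for first-year income.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists (divisor n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

# Simulated households: every parameter value here is hypothetical.
rng = random.Random(2)
K = 0.9
permanent = [rng.gauss(20, 5) for _ in range(5000)]
income_year1 = [p + rng.gauss(0, 4) for p in permanent]  # observed with error
income_year2 = [p + rng.gauss(0, 4) for p in permanent]  # independent error
consumption_year1 = [K * p + rng.gauss(0, 1) for p in permanent]

# Direct least squares of consumption on same-year observed income.
ols = cov(income_year1, consumption_year1) / cov(income_year1, income_year1)

# Instrumental-variable estimate: year-2 income instruments year-1 income.
iv = cov(income_year2, consumption_year1) / cov(income_year2, income_year1)

print(f"least-squares slope:          {ols:.3f}")
print(f"instrumental-variable slope:  {iv:.3f}")
```

The instrument works here only because the simulated errors of the two years are uncorrelated with each other. If they were serially correlated, as the text cautions, the ratio would no longer recover the assumed coefficient.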
When variables in the relation to be estimated are jointly determined, the explanatory variables in the relation may not be independent of the disturbance term. For instance, factor inputs may not be independent of the disturbance term in a production function (Mundlak 1963; Walters 1963). In such cases estimating the relation by direct least squares will result in biased estimates of the coefficients. Various multiequation estimation procedures are available to overcome least-squares bias. However, they depend fundamentally on restrictions that may be satisfied only over time and not in a cross section. An exception is the instrumental variables method which, for example, uses lagged (or lead) factor inputs as instrumental variables in estimating a production function.
Special problems exist in estimating dynamic cross-section relations that involve adjustment processes [see Distributed lags]. The empirical implications of many of the currently used adjustment models may be more applicable to group behavior than to individual behavior. While individuals facing uncertainty may not react instantaneously to changes in the variables on which they base their decisions, changes in their decisions may be discrete rather than continuous. It may be advantageous for them to make larger adjustments less often. However, the adjustment models employed in empirical work generally assume continuous adjustment. Since the frequency and size of adjustments may vary among individuals, continuous adjustment might be the result for the group. Here again, repeated observations over time on individuals can be utilized to surmount this difficulty.
Problems of application. The application of cross-section estimates of economic relations to intertemporal predictions of aggregates is subject to several difficulties. The cross-section estimates may depend on the values of the variables that are constant in the cross section but vary with time. Presumably this problem should be solved by a correct specification of the estimated relations, so that the variables that are stable in the cross section will not affect the estimates of the coefficients of the cross-section variables. For instance, income coefficients should be independent of prices, so that an estimated income coefficient will be applicable to periods with different prices. While such independence may nearly exist for income coefficients and prices, it may not exist in relations such as investment functions, where less regularity is the rule.
Furthermore, estimation of the coefficients of variables that are constant in a cross section but vary over time requires time series data. In estimating their coefficients by time series analysis, it is possible to use cross-section estimates of the coefficients of the variables in the first group in the time series relation. This of course can be done only if the cross-section estimates do not vary much from year to year. Income elasticities estimated from cross-section analysis are often grafted onto time series demand equations (Tobin 1950).
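The grafting step can be sketched as a constrained regression. The example below is an illustration in the spirit of that procedure, not a reproduction of any published study. It takes an income elasticity as given (here an assumed value of 0.5, standing in for a cross-section estimate), subtracts the implied income effect from the logarithm of quantity, and regresses the remainder on the logarithm of price. The time series are simulated with hypothetical trends.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs (intercept included)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
INCOME_ELASTICITY = 0.5       # assumed to come from a cross-section study
TRUE_PRICE_ELASTICITY = -0.4  # used only to generate the simulated series

# Thirty hypothetical annual observations; income and price both trend.
log_income = [0.03 * t + rng.gauss(0, 0.01) for t in range(30)]
log_price = [0.02 * t + rng.gauss(0, 0.05) for t in range(30)]
log_quantity = [
    INCOME_ELASTICITY * y + TRUE_PRICE_ELASTICITY * p + rng.gauss(0, 0.01)
    for y, p in zip(log_income, log_price)
]

# Remove the income effect using the cross-section elasticity, then
# estimate the price elasticity from the time series alone.
adjusted = [q - INCOME_ELASTICITY * y for q, y in zip(log_quantity, log_income)]
price_elasticity = slope(log_price, adjusted)

print(f"estimated price elasticity: {price_elasticity:.3f}")
```

Fixing the income coefficient in advance leaves the trending time series to identify only the price coefficient, which is why the stability of the cross-section estimate from year to year matters.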
Finally, the transformation of estimates obtained for individuals to estimates applicable to markets is somewhat problematic. Aggregation over individuals is sensitive to the distribution of the explanatory variables among individuals and may lead to aggregate relations that differ in form from the individual relations (Houthakker 1955; Tobin 1950).
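The sensitivity of aggregates to the distribution of an explanatory variable can be seen in a toy calculation. The individual relation used below, consumption equal to twice the square root of income, is a hypothetical choice made only because it is nonlinear. The two populations are likewise invented and share the same mean income.

```python
import math

def consumption(income):
    """Hypothetical nonlinear individual relation (an assumption)."""
    return 2.0 * math.sqrt(income)

def mean(values):
    return sum(values) / len(values)

# Two invented populations with identical mean income (100).
equal_incomes = [100.0, 100.0, 100.0, 100.0]
unequal_incomes = [25.0, 25.0, 175.0, 175.0]

mean_c_equal = mean([consumption(y) for y in equal_incomes])
mean_c_unequal = mean([consumption(y) for y in unequal_incomes])

print(f"mean income, both populations:  {mean(equal_incomes):.1f}")
print(f"mean consumption, equal case:   {mean_c_equal:.3f}")
print(f"mean consumption, unequal case: {mean_c_unequal:.3f}")
```

Mean consumption differs between the two populations although mean income is the same, so a relation fitted to aggregates need not have the form of the individual relation, and it shifts whenever the income distribution shifts.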
Yair Mundlak
BIBLIOGRAPHY
Bronfenbrenner, Martin; and Douglas, P. H. 1939 Cross-section Studies in the Cobb-Douglas Function. Journal of Political Economy 47:761–785.
Friedman, Milton 1957 A Theory of the Consumption Function. National Bureau of Economic Research, General Series, No. 63. Princeton Univ. Press.
Grunfeld, Yehuda; and Griliches, Z. 1960 Is Aggregation Necessarily Bad? Review of Economics and Statistics 42:1–13.
Houthakker, Hendrik S. 1955 The Pareto Distribution and the Cobb-Douglas Production Function in Activity Analysis. Review of Economic Studies 23, no. 1:27–31.
Klein, Lawrence R. 1953 Textbook of Econometrics. Evanston, Ill.: Row, Peterson. → See especially pages 211–241.
Kuh, Edwin 1963 Capital Stock Growth: A Micro-econometric Approach. Contributions to Economic Analysis, 32. Amsterdam: North-Holland Publishing.
Liviatan, Nissan 1963 Tests of the Permanent-income Hypothesis Based on a Reinterview Savings Survey. Pages 29–59 in Measurement in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld. Stanford Univ. Press. → A “Note” by Milton Friedman and a reply by Liviatan appear on pages 59–66.
Liviatan, Nissan 1964 Consumption Patterns in Israel. Jerusalem: Falk Project for Economic Research in Israel.
Michigan, University of, Survey Research Center 1954 Contributions of Survey Methods to Economics. Edited by Lawrence R. Klein. New York: Columbia Univ. Press.
Mundlak, Yair 1963 Estimation of Production and Behavioral Functions From a Combination of Cross-section and Time-series Data. Pages 138–166 in Measurement in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld. Stanford Univ. Press.
Prais, S. J.; and Houthakker, H. S. 1955 The Analysis of Family Budgets With an Application to Two British Surveys Conducted in 1937–1939 and Their Detailed Results. Cambridge Univ. Press.
Staehle, Hans 1934–1935 Annual Survey of Statistical Information: Family Budgets. Econometrica 2:349–362; 3:106–118.
Stigler, George J. 1954 The Early History of Empirical Studies of Consumer Behavior. Journal of Political Economy 62:95–113.
Tobin, James 1950 A Statistical Demand Function for Food in the U.S.A. Journal of the Royal Statistical Society Series A 113, part 2:113–141.
Walters, Alan A. 1963 Production and Cost Functions: An Econometric Survey. Econometrica 31:1–66.