Censoring, Sample
To define censoring in sample data, researchers first distinguish between latent data, which are hidden from the observer, and observed data, which are their measured counterparts. The latent data are modified between their creation and their observation by the analyst. Censoring occurs when certain values in the latent data are transformed so that their original values are masked or hidden from the observer.
A natural, familiar case is institutional censoring. Certain government statistics, such as income data on individuals and line of business data on firms, are censored to mask the identities of the persons or businesses. Thus incomes might be reported not as their original values, but only as falling in a certain bracket. Censoring also occurs naturally in the way that certain data are observed. A leading example is the observation of durations in medical statistics. Observed data on the survival of heart transplant patients after surgery, or on the length of survival after the onset of a disease, are naturally censored if the individual leaves the observation setting before the transition takes place. Thus the hospital may at some point lose contact with the heart transplant patient. The observation consists of the knowledge that the patient was still alive at the time of his or her exit from the study, but not how long he or she survived. In another familiar case, the true levels of demand for sporting and entertainment events are not revealed by ticket sales because the venue may sell out. The observed reflection of the demand is only ticket sales, limited by the capacity of the venue. Some economic phenomena, known as corner solutions, also lead to censoring when the observed counterpart to a variable of interest has a boundary value. Thus the amount of insurance an individual desires may be censored at zero. The amount of investment that a business undertakes might be recorded as zero if the assets of the business are allowed to depreciate, such that the true investment is actually negative.
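The cases above can be expressed as simple mappings from a latent value to an observed one. The following Python sketch is illustrative only: the function names, the bracket edges, and the numbers are hypothetical and do not come from any particular data source.

```python
from bisect import bisect_right


def censor_above(latent, limit):
    """Censoring from above: values beyond `limit` are recorded as `limit`.

    Example: ticket sales cannot exceed the capacity of the venue.
    """
    return min(latent, limit)


def censor_below(latent, limit=0.0):
    """Censoring from below: values under `limit` are recorded as `limit`.

    Example: negative net investment recorded as zero (a corner solution).
    """
    return max(latent, limit)


def bracket(value, edges):
    """Institutional censoring: report only the bracket that contains `value`.

    `edges` is a sorted list of bracket boundaries (hypothetical here).
    Returns (low, high); an open-ended side is reported as None.
    """
    i = bisect_right(edges, value)
    low = edges[i - 1] if i > 0 else None
    high = edges[i] if i < len(edges) else None
    return (low, high)


# Hypothetical latent values and their observed counterparts.
observed_sales = censor_above(52_000, 40_000)        # demand exceeds capacity
observed_investment = censor_below(-3.5)             # true investment is negative
observed_income = bracket(47_500, [0, 25_000, 50_000, 100_000])
```

In each case the analyst sees only the right-hand side: the capacity figure, the zero, or the bracket, and not the latent value that produced it.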
IMPLICATIONS FOR MODELING
Models of statistical phenomena usually describe relationships between, or co-movements of, variables. Censoring interferes with this sort of modeling. (See Greene 2003 and Maddala 1983 for analysis and extensions.) Suppose that the relationship occurs between the latent variables of interest. If a variable x* is expected to explain the movement of variable y, and x* is censored to reveal x, then the analyst will measure movements in y that are associated with movements in x* even though x does not change. In the opposite case, when it is y* that is censored and x that is not, the analyst will observe movement in x that should be associated with movement in y* but is not reflected in the observed y. Either case leads to a distortion in the measured relationship between a censored variable and an uncensored one. Contemporary model builders accommodate this type of distortion by building specific models for the censoring process along with the relationship of interest. Controversy arises over the many assumptions that must be made in order to make reasonable analysis feasible. Ultimately, the estimated relationship, such as that between prices and ticket sales for sporting events, is also a function of what is assumed about the underlying process whereby true underlying data are translated into observed, censored data, for example, how true demand is translated into ticket sales.
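The distortion described here can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not a method taken from the cited references: the latent relationship is linear with a slope of 1, both the regressor and the disturbance are standard normal, the dependent variable is censored from below at zero, and the slope is estimated by ordinary least squares. All names and parameter values are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least squares slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20_000, true_slope=1.0, seed=0):
    """Return (slope fitted on latent y*, slope fitted on censored y)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Latent relationship: y* = true_slope * x + disturbance.
    latent = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
    # Observed data: y* is censored from below at zero.
    observed = [max(v, 0.0) for v in latent]
    return ols_slope(xs, latent), ols_slope(xs, observed)


latent_slope, censored_slope = simulate()
print(f"slope using latent y*:  {latent_slope:.3f}")
print(f"slope using censored y: {censored_slope:.3f}")
```

Under these assumptions the slope fitted to the censored data falls well short of the slope fitted to the latent data, because movements in x that would push y* below zero produce no movement in the observed y. A different assumed censoring process or distribution would change the size of the distortion, which is why the modeling assumptions matter.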
SEE ALSO Censoring, Left and Right
BIBLIOGRAPHY
Greene, William H. 2003. Econometric Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge, U.K.: Cambridge University Press.
William Greene
