Covariance
The covariance is a measure of the magnitude of association between the scores of cases on two variables that have been measured at the interval or ratio level. It describes both the direction and the strength of the association. In the social sciences, the covariance is most commonly used in structural equation modeling of systems of linear equations of measured and unmeasured variables.
Formally, the covariance between the scores of N cases (i = 1 through N) on the variables X and Y is:
\[ \operatorname{COV}(X, Y) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N} \]
That is: subtract the mean of X from the first case’s score on X; subtract the mean of Y from the first case’s score on Y; multiply these “deviations.” Repeat this process for all of the cases, and sum the results. Divide this sum by the population size (N).
When the relationship between X and Y is being examined in a random sample of cases drawn from the population, N - 1 is usually substituted in the denominator. Most statistical software uses N - 1.
The covariance of a variable with itself (e.g., COV(X, X)) is the variance. (For a more in-depth formal treatment of the covariance, see Snedecor and Cochran 1980.)
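The definition above, including the choice of denominator and the special case COV(X, X), can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `covariance`, its `sample` flag, and the five pairs of scores are assumptions made up for the example.

```python
def covariance(x, y, sample=True):
    """Return the covariance of paired scores x and y.

    Divides by N - 1 when sample=True (the usual software default)
    and by N when sample=False (the population formula).
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of cases")
    n = len(x)
    if n < 2:
        raise ValueError("at least two cases are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from the means.
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (n - 1 if sample else n)


# Hypothetical scores for five cases (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

population_cov = covariance(x, y, sample=False)  # divides by N
sample_cov = covariance(x, y, sample=True)       # divides by N - 1
variance_x = covariance(x, x, sample=True)       # COV(X, X) is the variance of X

print(population_cov, sample_cov, variance_x)
```

For these made-up scores the sum of cross-products is 6, so the population covariance is 6/5 = 1.2 and the sample covariance is 6/4 = 1.5. The gap between the two shrinks as N grows.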
If there is a tendency for higher scores on X to co-occur with higher scores on Y, the covariance will have a positive value; if there is a tendency for higher scores on X to co-occur with lower scores on Y, the covariance will be negative. If the scores on two variables are not linearly associated, the covariance will equal zero. The unit of measurement of the covariance is the product of the units of X and Y; for example, if X were measured in dollars, and Y were measured in years, the covariance would be expressed in dollar-years. When we are working with multiple variables, the variances and covariances among all the variables are arrayed in a symmetric “variance-covariance” matrix.
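As a rough sketch of how such a matrix is arranged, the following Python builds one for three variables. The variable names and scores are hypothetical, and the helper `covariance_matrix` is an assumption of this example rather than a function from any particular statistics package.

```python
def covariance(x, y):
    """Sample covariance (N - 1 denominator) of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_matrix(variables):
    """Variance-covariance matrix as a nested dict keyed by variable name."""
    names = list(variables)
    return {
        row: {col: covariance(variables[row], variables[col]) for col in names}
        for row in names
    }


# Hypothetical scores for five cases (illustration only).
data = {
    "income_dollars": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
    "schooling_years": [2.0, 4.0, 5.0, 4.0, 5.0],
    "leisure_hours": [5.0, 4.0, 3.0, 2.0, 1.0],
}

matrix = covariance_matrix(data)
for name, row in matrix.items():
    # Diagonal cells are variances; off-diagonal cells are covariances.
    print(name, [round(value, 2) for value in row.values()])
```

In this made-up example the income-schooling cell is positive (in dollar-years) and the income-leisure cell is negative (in dollar-hours). Each off-diagonal value appears twice, mirrored across the diagonal, which is what makes the matrix symmetric.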
Consider the relationship shown in the scatter-plot, between the level of urbanization (X) and female life expectancy (Y) in nineteen African countries in the mid-1990s. Inspection suggests that the scores positively covary: on the average (but not in all cases), the higher the urbanization, the higher the life expectancy.
The covariance for this relationship is 57.538. The positive value indicates a positive relationship. The strength of the relationship is difficult to assess because the unit of measurement of the covariance is percent-years. Because of this peculiar metric, the covariance is rarely used as a simple description. The Pearson correlation (which is .47 in this example) is preferred.
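The reason the correlation is easier to read can be sketched in Python: the Pearson correlation is the covariance divided by the product of the two standard deviations, which cancels the units. The scores below are hypothetical and are not the urbanization data discussed above; the function names are assumptions of this sketch.

```python
import math


def covariance(x, y):
    """Sample covariance (N - 1 denominator) of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance rescaled by the two standard deviations (unit-free)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical scores for five cases (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Re-express X in a unit one hundred times smaller (e.g., a proportion
# rewritten as a percentage).
x_rescaled = [100.0 * value for value in x]

cov_xy = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
r_xy = pearson_r(x, y)
r_rescaled = pearson_r(x_rescaled, y)

print(cov_xy, cov_rescaled)  # the covariance changes with the unit of X
print(r_xy, r_rescaled)      # the correlation does not
```

Changing the unit of X multiplies the covariance by one hundred but leaves the correlation untouched, which is why the correlation can be judged against a fixed range of -1 to 1.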
The covariance is the most commonly used measure of association when research involves predicting Y from X using structural equation modeling. In predictive modeling, there is often the desire to describe the relationship between Y and X in the original scales of the variables: How much Y do we get for each unit of X?
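One simple case in which the covariance answers that question directly is bivariate least-squares regression, where the slope of Y on X equals COV(X, Y) divided by the variance of X. The Python below is a sketch of that single-predictor case only, with hypothetical scores and an assumed helper name `slope_and_intercept`; it is not an implementation of structural equation modeling.

```python
def covariance(x, y):
    """Sample covariance (N - 1 denominator) of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def slope_and_intercept(x, y):
    """Least-squares line for predicting y from x, built from the covariance."""
    slope = covariance(x, y) / covariance(x, x)  # units of Y per unit of X
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Hypothetical scores for five cases (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = slope_and_intercept(x, y)
print(slope, intercept)
```

Here the covariance is 1.5 and the variance of X is 2.5, so each additional unit of X is associated with about 0.6 units of Y. Unlike the correlation, this slope stays in the original scales of the two variables.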
Some warnings: Restricted variation in either variable, non-linearity in the relationship, and non-normality in the joint distribution of X and Y can limit the validity of the covariance as an index of the strength and direction of the relationship.
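Two of these warnings can be illustrated with made-up numbers. In the Python sketch below, the first pair of variables is perfectly but non-linearly related, and the second pair is perfectly linearly related but then examined over a restricted range of X. All scores are hypothetical and chosen only to make the effects easy to see.

```python
def covariance(x, y):
    """Sample covariance (N - 1 denominator) of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Non-linearity: Y is completely determined by X (Y = X squared),
# yet the positive and negative cross-products cancel.
x_curved = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_curved = [value ** 2 for value in x_curved]
cov_curved = covariance(x_curved, y_curved)

# Restricted variation: Y = 2X exactly, observed over the full range of X
# and then only for cases with X between 4 and 6.
x_full = [float(value) for value in range(1, 11)]
y_full = [2.0 * value for value in x_full]
kept = [(a, b) for a, b in zip(x_full, y_full) if 4.0 <= a <= 6.0]
x_restricted = [a for a, _ in kept]
y_restricted = [b for _, b in kept]

cov_full = covariance(x_full, y_full)
cov_restricted = covariance(x_restricted, y_restricted)

print(cov_curved)                # zero despite a perfect (curved) relationship
print(cov_full, cov_restricted)  # same underlying relationship, much smaller covariance
```

In the first case a covariance of zero would wrongly suggest no relationship at all. In the second, the relationship between X and Y is identical in both sets of cases, but the covariance falls sharply once the spread of X is narrowed.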
SEE ALSO Standard Deviation
BIBLIOGRAPHY
Snedecor, George W., and William Gemmell Cochran. 1980. Statistical Methods. 7th ed. Ames: Iowa State Press.
Robert Hanneman
covariance
The analysis of covariance is an extension of the analysis of variance in which the variables to be tested are adjusted to take account of assumed linear relationships with other variables. See also correlation.