Principal Components
Quantitative social science often involves measurements of several variables on a number of individuals. Principal components are variates (linear combinations of the variables) that have special properties in terms of variances. Variates having large variance across the population are of special interest, in that they best distinguish among the individuals in it.
It is often possible to understand or interpret a set of variables through a few variates. Also, sometimes relationships are not exhibited in the observed variables but rather in combinations of them; principal components can reveal such relationships.
Variates are expressions such as 6X + 8Y or 2X + 3Y + 4Z. The sum and difference are special cases. For example, instead of a verbal IQ, V, and a quantitative IQ, Q, one might study their sum V + Q and their difference V - Q. If it is known that the sum is 260 and the difference is 20, one can see that V is 140 and Q is 120. If the original variables are replaced by the same number of variates, all the information in the original variables is retained; the original values could be recovered.
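The recovery of V and Q from their sum and difference can be sketched in a few lines of Python. The function names here are hypothetical, chosen only for illustration.

```python
def to_variates(v, q):
    """Replace the two scores by their sum and difference."""
    return v + q, v - q


def from_variates(total, diff):
    """Invert the transformation to recover the original scores."""
    v = (total + diff) / 2
    q = (total - diff) / 2
    return v, q


v, q = from_variates(260, 20)
print(v, q)  # 140.0 120.0
```

Because the transformation can be inverted, nothing is lost in moving from (V, Q) to (V + Q, V - Q).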
It is useful to transform to uncorrelated variates, each containing separate information. This can be done in many ways, so the transformation can be chosen to have further desirable properties. In principal components analysis, the variate having the largest possible variance is found first. In doing this, the sizes of the coefficients (multipliers of the variables) must be controlled, because multiplying every coefficient by a constant multiplies the variance by the square of that constant. For example, the variance of 6X + 8Y is 100 times that of 0.6X + 0.8Y. Only normalized variates, like the latter, whose coefficients have a sum of squares of one (here 0.36 + 0.64 = 1), are considered in defining principal components.
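The effect of coefficient size on variance can be checked numerically. The following is a minimal sketch using made-up scores for five individuals; the data and the helper names are assumptions for illustration, not part of the original discussion.

```python
import math
from statistics import pvariance

# Hypothetical scores on two variables for five individuals.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]


def variate(a, b, xs, ys):
    """Values of the linear combination a*X + b*Y for each individual."""
    return [a * xi + b * yi for xi, yi in zip(xs, ys)]


def normalize(a, b):
    """Rescale coefficients so that their squares sum to one."""
    length = math.hypot(a, b)
    return a / length, b / length


big = pvariance(variate(6, 8, x, y))
a, b = normalize(6, 8)  # approximately (0.6, 0.8)
small = pvariance(variate(a, b, x, y))
ratio = big / small  # close to 100, up to floating-point rounding
print(a, b, ratio)
```

The ratio is 100 whatever data are used, since 6X + 8Y is exactly ten times 0.6X + 0.8Y.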
The first principal component is the normalized variate having maximum variance. The second principal component is the normalized variate having maximal variance among those uncorrelated with the first. The third principal component is that having maximal variance among those uncorrelated with the first two, and so on. In short, then, the principal components are uncorrelated linear combinations of maximal variance.
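For two variables the principal components can be written in closed form, which makes the definition easy to verify. The sketch below assumes hypothetical data and uses the standard result that the coefficients of the components are the unit eigenvectors of the 2 x 2 covariance matrix; it then checks that the two components are uncorrelated and that together they carry the same total variance as the original variables.

```python
import math

# Hypothetical scores on two variables for five individuals.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]


def cov(u, v):
    """Sample covariance (divisor n - 1); cov(u, u) is the variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)

# Angle of the direction of maximum variance for a 2 x 2 covariance matrix.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)

# Normalized coefficients: each pair has a sum of squares of one.
w1 = (math.cos(theta), math.sin(theta))
w2 = (-math.sin(theta), math.cos(theta))

pc1 = [w1[0] * a + w1[1] * b for a, b in zip(x, y)]
pc2 = [w2[0] * a + w2[1] * b for a, b in zip(x, y)]

var1, var2, cov12 = cov(pc1, pc1), cov(pc2, pc2), cov(pc1, pc2)
print(w1, w2)
print(var1, var2, cov12)
```

With more than two variables the same idea applies, but the eigenvectors are normally obtained from a numerical linear algebra routine rather than from a formula.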
Principal components are computed from variances and covariances, which depend upon the units of measurement, so often the data are first converted to z-scores, each value being replaced by the number of standard deviations it lies above or below the mean for that variable. The resulting variables have variances equal to 1. Sometimes the original data are used, especially when the variables have the same units of measurement, for example when they are measurements of several dimensions of the same object. The computations that yield principal components are performed on the covariance matrix. Use of z-scores is equivalent to using the correlation matrix rather than the covariance matrix.
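The equivalence between z-scores and the correlation matrix can be illustrated directly. In this sketch the height and weight figures are invented for the example; the check is that each standardized variable has variance 1 and that the covariance of the z-scores equals the correlation of the original variables.

```python
import math

# Hypothetical measurements in different units.
height_cm = [150.0, 160.0, 165.0, 172.0, 180.0, 185.0]
weight_kg = [52.0, 58.0, 63.0, 70.0, 74.0, 85.0]


def mean(u):
    return sum(u) / len(u)


def cov(u, v):
    """Sample covariance (divisor n - 1); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def zscores(u):
    """Number of standard deviations each value lies from the mean."""
    m, s = mean(u), math.sqrt(cov(u, u))
    return [(a - m) / s for a in u]


zh, zw = zscores(height_cm), zscores(weight_kg)

corr = cov(height_cm, weight_kg) / math.sqrt(
    cov(height_cm, height_cm) * cov(weight_kg, weight_kg)
)
cov_z = cov(zh, zw)  # covariance of the z-scores
print(cov(zh, zh), cov(zw, zw), cov_z, corr)
```

For two standardized variables with nonzero correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, so the principal components are the normalized sum and difference of the z-scores.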
Principal components analysis approximates the variables using a small number of variates. In exploratory studies involving many variables, some may be dropped later if they have small coefficients in the first several principal components. To obtain a two-dimensional scatterplot of a multivariate sample, the first and second principal components are used.
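The coordinates for such a scatterplot can be sketched as follows. This example uses hypothetical data on three variables and finds the first two components by power iteration with deflation, a simple technique chosen here so that the code needs only the standard library; in practice a library eigenvalue routine would normally be used instead.

```python
import math

# Hypothetical data: six individuals measured on three variables.
data = [
    [2.0, 1.0, 5.0],
    [4.0, 3.0, 4.0],
    [6.0, 2.0, 6.0],
    [8.0, 5.0, 3.0],
    [10.0, 4.0, 7.0],
    [12.0, 6.0, 2.0],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract each variable's mean from its column."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(m, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector)."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = [dot(row, v) for row in m]
        norm = math.sqrt(dot(w, w))
        v = [a / norm for a in w]
    return dot(v, [dot(row, v) for row in m]), v


def deflate(m, eigenvalue, v):
    """Remove the component just found so the next one can be extracted."""
    p = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
            for i in range(p)]


centered = center(data)
s = covariance_matrix(centered)
l1, w1 = leading_eigenvector(s)
l2, w2 = leading_eigenvector(deflate(s, l1, w1))

# One (x, y) point per individual for a two-dimensional scatterplot.
scores = [(dot(row, w1), dot(row, w2)) for row in centered]

# Share of the total variance retained by the first two components.
share = (l1 + l2) / sum(s[i][i] for i in range(len(s)))
print(scores)
print(share)
```

The coefficient vectors w1 and w2 can also be inspected to see which variables contribute little to the leading components.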
Principal components can be inputs to another procedure, such as multiple regression or cluster analysis. A related technique, factor analysis, describes each variable as having a part that is explained by factors shared with other variables and a unique part that is not so explained; it analyzes the shared variance of the set of variables. Once a reduction to a smaller number of factors or principal components has been made, the pattern of coefficients often suggests an interpretation of them. Here caution must be exercised. Since there are many mathematically equivalent solutions, such an interpretation is not unique; it is an interpretation, not the interpretation.
Harold Hotelling (1895–1973), an American economist and statistician, developed many of the ideas of principal components.
SEE ALSO Cluster Analysis; Factor Analysis; Regression Analysis
BIBLIOGRAPHY
Hotelling, Harold. 1933. Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology 24: 417–441, 498–520.
Stanley L. Sclove