Latent Structure


A scientist is often interested in quantities that are not directly observable but can be investigated only via observable quantities that are probabilistically connected with those of real interest. Latent structure models relate to one such situation in which the observable or manifest quantities are multivariate multinomial observations, for example, answers by a subject or respondent to dichotomous or trichotomous questions. Models relating polytomous observable variables to unobservable or latent variables go back rather far; some early references are Cournot (1838), Weinberg (1902), Benini (1928), and deMeo (1934). These models typically express the multivariate distribution of the observable variables as a mixture of multivariate distributions, where the distribution of the latent variable is the mixing distribution [see Distributions, Statistical, article on Mixtures of Distributions].

Lazarsfeld (1950) first introduced the term latent structure model for those models in which the variables distributed according to any of the component multivariate distributions of the mixture are assumed to be stochastically independent. (Thus a latent structure model of a subject’s answers to 50 dichotomous questions—the latent class model of this article—assumes that subjects fall into relatively few classes, called latent classes, with the variable that relates the subject to his class being the latent variable. The distribution of this latent variable, that is, the distribution of the subjects among latent classes, is the mixing distribution. Within each class it is assumed that the responses to the 50 dichotomous questions are stochastically independent.) A basic reference for the general form of latent structure models is Anderson (1959).

The present article—restricted to the case of dichotomous questions—emphasizes the problems of identifiability and efficient statistical estimation of the parameters of latent structure models, points out difficulties with methods that have been proposed, and summarizes doubts currently held about the possibility of good estimation.

The simplest of the latent structure models and almost the only one in which the problem of parameter estimation has been carefully addressed is the latent class model. In this model, each observation in the sample is a vector x with p two-valued items or coordinates, conveniently coded by writing each either as 0 or as 1. The latent class model postulates that there is a small number m of classes, called latent classes, into which potential observations on the population can be classified such that within each class the p coordinates of the vector x are statistically independent. This is not to say that all identical observations in the sample are automatically considered as coming from the same class. Rather, associated with each class is a probability distribution on the $2^p$ possible vectors x, such that the p coordinates of x are (conditionally) independent. An observation vector x thus has a probability distribution that is a mixture of the probability distributions of x associated with each of the latent classes.

An example of the above model comes from the study (Lazarsfeld 1950) of the degree of ethnocentrism of American soldiers during World War II. Because it is not known how to measure ethnocentrism directly, a sample of soldiers was asked the following three questions: Do you believe that our European allies are much superior to us in strategy and fighting morale? Do you believe that the majority of all equipment used by all the allies comes from American lend-lease shipment? Do you believe that neither we nor our allies could win the war if we didn’t have each other’s help?

Here p = 3 and x is the vector of responses to the three questions, with Yes coded as 1 and No coded as 0. A suitable latent class model would postulate that there are two latent classes (so that m = 2), such that within each class the answers to the three questions are stochastically independent. Postulating the existence of any more than two latent classes would, as will be seen later, lead to difficulties, since the parameters of such a latent class model could not be consistently estimated. The two latent classes would probably be composed of ethnocentric and nonethnocentric soldiers, respectively. However, this need not be the case, and in fact it may happen that the two latent classes will have no reasonable interpretation, let alone the hoped-for interpretation. This phenomenon of possible noninterpretability is characteristic not only of the latent class model but also of the factor analysis and other mixture-of-distributions models.

The latent class model

Let σ denote a subset (unordered) of the integers (1, 2, …, p), possibly the null subset ∅. (Other subsets will, for concreteness, be denoted by writing their members in customary numerical order.) Let $\pi_\sigma$ denote the probability that for a randomly chosen individual each coordinate of x with index a member of σ is a 1, and define $\pi_{\emptyset} = 1$. For example, $\pi_{2,7,19}$ is the probability that the second, seventh, and nineteenth coordinates of x are all 1, forgetting about (that is, marginally with respect to) the values of the other coordinates of x.

Since the order of coordinates is immaterial for such a probability, one is justified in dealing with the $2^p$ unordered σ’s, but a specific order in naming the subset is helpful for exposition. The $\pi_\sigma$’s are notationally a more convenient set of parameters than what might be considered the $2^p$ natural parameters of the multinomial distribution of x.

A concise description of the natural parameters of the distribution of x is the following. Let $\bar\sigma$ denote that subset of the integers (1, 2, …, p) which is the complement of σ. Let $\pi_{\sigma:\bar\sigma}$ denote the probability that for a randomly chosen individual each coordinate of x with index a member of σ is a 1 and each coordinate of x with index a member of $\bar\sigma$ is a 0. The $2^p$ $\pi_{\sigma:\bar\sigma}$’s are the natural parameters of the multinomial distribution of x, since they are the probabilities of each of the $2^p$ possible observation values. For example, in the ethnocentrism case, $\pi_{12:3}$ would be the probability that the first two questions are answered Yes, while the third question is answered No. The $\pi_\sigma$’s and $\pi_{\sigma:\bar\sigma}$’s are related by a nonsingular linear transformation.
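The linear transformation between the two parameter sets can be made concrete by inclusion-exclusion. The following is a minimal sketch, not part of the original article: the function name `cell_probability` is hypothetical, and the input values are the manifest parameters of the article's later three-item numerical example, written here as fractions of 192.

```python
from fractions import Fraction
from itertools import combinations

def cell_probability(pi, ones, zeros):
    """P(items in `ones` are 1 and items in `zeros` are 0), by inclusion-exclusion.

    `pi` maps a frozenset sigma to pi_sigma, the probability that every item
    in sigma is answered 1 (marginally over all other items).
    """
    total = Fraction(0)
    for k in range(len(zeros) + 1):
        for extra in combinations(sorted(zeros), k):
            sigma = frozenset(ones) | frozenset(extra)
            total += (-1) ** k * pi[sigma]
    return total

# Manifest parameters pi_sigma for p = 3 items, numerators over 192.
raw = {(): 192, (1,): 84, (2,): 80, (3,): 60,
       (1, 2): 32, (1, 3): 27, (2, 3): 24, (1, 2, 3): 10}
pi = {frozenset(s): Fraction(v, 192) for s, v in raw.items()}

# pi_{12:3}: items 1 and 2 answered Yes, item 3 answered No.
print(cell_probability(pi, ones={1, 2}, zeros={3}))
```

Each cell probability is a signed sum of $\pi_\sigma$’s, which is why the transformation is linear and invertible.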

Let $\nu_\alpha$ be the probability that the observation vector x is a member of the αth latent class, where α = 1, 2, …, m and $\sum_\alpha \nu_\alpha = 1$. Let $\lambda_{\alpha\sigma}$ be the probability that if x is a vector chosen at random from the αth class, then each coordinate of x with index a member of σ is a 1. Clearly $\pi_\sigma = \sum_\alpha \nu_\alpha \lambda_{\alpha\sigma}$.

Let $\sigma_i$ denote the ith member of σ, with the members of σ arranged in some order, say numerical. The fundamental independence assumption of the latent class model then says that for each α

$$\lambda_{\alpha\sigma} = \prod_i \lambda_{\alpha\sigma_i}$$

for all σ. That is, the probability (conditional on x being in the αth latent class) of any given set of coordinates of x being all 1’s is the product of the probabilities of each of these coordinates being a 1. Then

$$\pi_\sigma = \sum_{\alpha=1}^{m} \nu_\alpha \prod_i \lambda_{\alpha\sigma_i}$$

for all σ. These equations are called the accounting equations of the latent class model. Thus the m(p + 1) parameters of the model are the latent parameters $\lambda_{\alpha i}$ and the $\nu_\alpha$, α = 1, …, m, i = 1, …, p. These completely determine the $2^p$ manifest parameters, the $\pi_\sigma$, via the accounting equations.
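As an illustration of how the latent parameters determine the manifest ones, the sketch below evaluates the accounting equations for every subset σ. It is only a sketch: the function name `manifest_parameters` is hypothetical, and the parameter values are those of the two-class, three-item numerical example used later in the article.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def manifest_parameters(nu, lam):
    """Return {sigma: pi_sigma} for every subset sigma of the items 1..p.

    nu[a] is the proportion of latent class a; lam[a][i-1] is the probability
    that a member of class a answers item i with a 1.
    """
    p = len(lam[0])
    pi = {}
    for size in range(p + 1):
        for sigma in combinations(range(1, p + 1), size):
            # Within a class the items are independent, so the class
            # probability of "all 1's on sigma" is a product.
            pi[sigma] = sum(
                nu_a * prod(lam_a[i - 1] for i in sigma)
                for nu_a, lam_a in zip(nu, lam)
            )
    return pi

nu = [Fraction(3, 4), Fraction(1, 4)]
lam = [[Fraction(1, 2), Fraction(1, 3), Fraction(1, 3)],
       [Fraction(1, 4), Fraction(2, 3), Fraction(1, 4)]]
pi = manifest_parameters(nu, lam)
for sigma, value in pi.items():
    print(sigma, value, float(value))
```

The empty product equals 1, so the entry for the null subset comes out as the sum of the class proportions, in agreement with the convention $\pi_{\emptyset} = 1$.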

Parameter estimation

Suppose that the number of latent classes, m, is known to the investigator. (This assumption is made because it underlies all the theoretical work on the estimation of parameters of the latent class model. In practice m is unknown, but a pragmatic approach is to assume a particular small value of m, proceed with the estimation, see how well the estimated model fits the manifest data, and alter m and begin again if the fit is poor.) Then a central statistical problem is that of estimating the parameters of the model, the ν’s and λ’s, from a random sample of n vectors x. (The typical sample in survey work is a stratified rather than a simple random sample. However, the problem of estimating latent parameters from such samples is much more complicated, and as yet has hardly been touched.)

Let $n_\sigma$ be the number of vectors in the sample with 1’s in each component whose index is a member of σ and let $p_\sigma = n_\sigma/n$. If the model were simply a multinomial model with parameters the $\pi_\sigma$’s, then the $p_\sigma$’s would be maximum likelihood estimators of the $\pi_\sigma$’s. If for each set of $2^p$ $\pi_\sigma$’s there is a unique set of latent parameters, $\nu_\alpha$’s and $\lambda_{\alpha i}$’s, α = 1, …, m, i = 1, …, p, then the ν’s and λ’s are functions of the $\pi_\sigma$’s, and evaluating these functions at the $p_\sigma$’s as arguments will yield estimators (actually consistent estimators) of the latent parameters. But the “if” in the last sentence is most critical; it ..., the identifiability condition, common to all models relating distributions of observable random variables to distributions of unobservable random variables. Consequently, most of the work on parameter estimation in latent class analysis is really a by-product of work on finding constructive procedures, that is, procedures that explicitly derive the unique latent parameters as functions of the π’s, for proving the identifiability of a latent class model associated with a given m and p. With such a constructive procedure available, one can replace the π’s by their estimates, the p’s, and use the procedure to determine estimates of the ν’s and λ’s. The following description of estimation procedures based on constructive proofs of identifiability will thus really be a description of the constructive procedure for determining the ν’s and λ’s from a subset of the π’s.

Green’s method of estimation. The earliest constructive procedure was given by Green (1951). Let $D_i$ be the m × m diagonal matrix with $\lambda_{\alpha i}$, α = 1, …, m, on the diagonal, and let L be the (p + 1) × m matrix with first row a vector of 1’s and jth row (j = 2, …, p + 1) the vector $(\lambda_{1,j-1}, \ldots, \lambda_{m,j-1})$. Let N be the m × m diagonal matrix with $\nu_\alpha$, α = 1, …, m, on the diagonal. For σ a subset of (1, 2, …, p), define $D_\sigma = \prod_{\sigma_j \in \sigma} D_{\sigma_j}$. Form the matrix $\Pi_\sigma = L N D_\sigma L'$, where the prime denotes the matrix transpose. With the rows and columns of this matrix labeled 0, 1, …, p and with $\lambda_{\alpha 0} = 1$, the (i, j)th element of this matrix is

$$\sum_{\alpha=1}^{m} \nu_\alpha \lambda_{\alpha i} \lambda_{\alpha j} \prod_{k \in \sigma} \lambda_{\alpha k}.$$

If i ≠ j and i, j ∉ σ then the (i, j)th element of this matrix is the manifest parameter $\pi_{ij\sigma}$. Otherwise the (i, j)th element of this matrix can formally be defined as a quantity called $\pi_{ij\sigma}$, where the subscript of π may have repeated elements. Since π’s with repeated subscripts are not manifest parameters and have no empirical counterpart but are merely formal constructs based on the latent parameters, they are not estimable directly from the $n_\sigma$’s. However, Green provided some rules for guessing at values of these π’s (one rule is given below) so that the matrix $\Pi_\sigma$ can be partly estimated and partly guessed at, given data.

Let D be the m × m diagonal matrix with $\sum_{k=1}^{p} \lambda_{\alpha k}$, α = 1, …, m, on the diagonal, and let $A = L N^{1/2}$. Then $\Pi = \sum_k \Pi_k = A D A'$. Under the assumptions that m ≤ p + 1, rank A = m, and all the diagonal elements of D are different and nonzero, the following procedure determines the matrices L and N of latent parameters.

Factor $\Pi_0$ as $\Pi_0 = BB'$ and Π as $\Pi = CC'$. (The matrices B and C are not unique, but any factorization will do.) Let $T = (B'B)^{-1}B'C$. A complete principal component analysis of $TT'$ will yield an orthogonal matrix Q, and it can be shown that A = BQ. Since the first row of L is a vector of 1’s, the first row of A is an estimate of the vector $(\nu_1^{1/2}, \ldots, \nu_m^{1/2})$, so that N is easily determined. The matrix L is then just $A N^{-1/2}$.

The major shortcoming of this procedure is the problem of how to guess at values of the π’s bearing repeated subscripts. No one has yet devised a rule which, when applied to a set of p’s, will yield consistent estimators of L and N. For example, Green suggests using $p_i^2 + \max_{j \ne i}(p_{ij} - p_i p_j)$ as a guess at $\pi_{ii}$. Yet in the case m = 2, p = 3 with latent parameters $\nu_1 = \nu_2 = .5$, $\lambda_{11} = .9$, $\lambda_{12} = .2$, $\lambda_{13} = .8$, $\lambda_{21} = .7$, $\lambda_{22} = .9$, $\lambda_{23} = .4$, if i = 2, $\max_{j \ne 2}(p_{2j} - p_2 p_j)$ is a consistent estimator of −.07, so that $p_{22}$ is a consistent estimator of something smaller than $\pi_2^2 = .3025$. But $\pi_{22} = \sum_\alpha \nu_\alpha \lambda_{\alpha 2}^2 = .425$, so that $p_{22}$ is not a consistent estimator of $\pi_{22}$.

Determinantal method of estimation. A matricial procedure that does not have the above shortcoming, since it involves only estimable π’s, was first suggested by Lazarsfeld and Dudman (see Lazarsfeld 1951) and independently by Koopmans (1951), developed by Anderson (1954), and extended by Gibson (1955; 1962) and Madansky (1960). For ease of exposition, the procedure will be described only for the cases treated by Anderson.

Assume that p ≥ 2m + 1. In that case, 2m + 1 different items can be selected from the p items (say, the first 2m + 1) and the following matrices of π’s involving only these items formed. Let

$$\Pi^* = \begin{pmatrix} 1 & \pi_{m+1} & \cdots & \pi_{2m} \\ \pi_1 & \pi_{1,m+1} & \cdots & \pi_{1,2m} \\ \vdots & \vdots & & \vdots \\ \pi_m & \pi_{m,m+1} & \cdots & \pi_{m,2m} \end{pmatrix}$$

and let $\tilde\Pi$ be the matrix $\Pi^*$ with the 1 replaced by $\pi_{2m+1}$ and all the π’s having the additional subscript 2m + 1. Let $\Lambda_1$ be an (m + 1) × (m + 1) matrix with the first row a vector of 1’s and the jth row (j = 2, …, m + 1) the vector $(\lambda_{1,j-1}, \ldots, \lambda_{m,j-1})$, and let $\Lambda_2$ be an (m + 1) × (m + 1) matrix with first row a vector of 1’s and the jth row (j = 2, …, m + 1) the vector $(\lambda_{1,m+j-1}, \ldots, \lambda_{m,m+j-1})$. Let N and $D_{2m+1}$ be defined as above. Then $\Pi^* = \Lambda_1 N \Lambda_2'$ and $\tilde\Pi = \Lambda_1 N D_{2m+1} \Lambda_2'$. Thus, if the diagonal elements of $D_{2m+1}$ are distinct and if $\Lambda_1$, N, and $\Lambda_2$ are of full rank, then the diagonal elements of $D_{2m+1}$ are the roots θ of the determinantal equation $|\tilde\Pi - \theta\Pi^*| = 0$.

Table 1
Parameter        Value   Asymptotic variance
ν                3/4     1115.42/n
$\lambda_{11}$   1/2     39.00/n
$\lambda_{12}$   1/3     60.89/n
$\lambda_{13}$   1/3     4.96/n
$\lambda_{21}$   1/4     303.00/n
$\lambda_{22}$   2/3     611.53/n
$\lambda_{23}$   1/4     31.00/n

If Z is the matrix of characteristic vectors corresponding to the roots $\theta_1, \ldots, \theta_m$, then the columns of $\Pi^* Z$ are proportional to the columns of $\Lambda_1$, with the constant of proportionality determined by the condition that the first row of $\Lambda_1$ is a vector of 1’s. A similar argument using the transposes of $\tilde\Pi$ and $\Pi^*$ yields $\Lambda_2$, and N is determined by

$$N = \Lambda_1^{-1} \Pi^* (\Lambda_2')^{-1}.$$

A difficulty with this procedure is that it depends critically on which 2m + 1 items are chosen from the p items, on which of these 2m + 1 is chosen to define $\tilde\Pi$, and on the allocation of the 2m items to the rows and columns defining $\Pi^*$. That is, it depends critically on the ordering of the items. There are no general rules available for an ordering of the items that will yield relatively efficient estimators of the latent parameters.
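For the two-class, three-item case used in the article's numerical example, the determinantal computation can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the general procedure: the function name `determinantal_estimates` is hypothetical; the matrices are taken here to be the 2 × 2 arrays [[1, p2], [p1, p12]] and [[p3, p23], [p13, p123]], with item 3 supplying the extra subscript (that arrangement is an assumption of the sketch); and, because m = 2, the remaining parameters are obtained by solving the accounting equations directly instead of forming characteristic vectors.

```python
from math import sqrt

def determinantal_estimates(p1, p2, p3, p12, p13, p23, p123):
    """Two-class, three-item determinantal estimates (illustrative sketch).

    No permissibility check is made: the results need not lie in [0, 1].
    """
    # Coefficients of a*theta^2 + b*theta + c = |P~ - theta * P*|, where
    # P* = [[1, p2], [p1, p12]] and P~ = [[p3, p23], [p13, p123]].
    a = p12 - p1 * p2
    b = -(p123 + p3 * p12 - p1 * p23 - p2 * p13)
    c = p3 * p123 - p13 * p23
    disc = b * b - 4 * a * c
    if a == 0 or disc <= 0:
        raise ValueError("roots are not real and distinct")
    r1 = (-b + sqrt(disc)) / (2 * a)
    r2 = (-b - sqrt(disc)) / (2 * a)
    # The roots estimate lambda_13 and lambda_23; class labels are arbitrary,
    # so the larger root is assigned to class 1 here.
    t1, t2 = max(r1, r2), min(r1, r2)
    nu1 = (p3 - t2) / (t1 - t2)          # from p3 = nu1*t1 + (1 - nu1)*t2
    lam1, lam2 = [], []
    for p_i, p_i3 in ((p1, p13), (p2, p23)):
        w = (p_i3 - t2 * p_i) / (t1 - t2)    # equals nu1 * lambda_1i
        lam1.append(w / nu1)
        lam2.append((p_i - w) / (1 - nu1))
    lam1.append(t1)
    lam2.append(t2)
    return {"nu1": nu1, "lam1": lam1, "lam2": lam2}

# Sample proportions from the n = 40 example (the p column of Table 4).
est = determinantal_estimates(0.425, 0.400, 0.325, 0.150, 0.150, 0.125, 0.050)
print(round(est["nu1"], 2), [round(v, 2) for v in est["lam1"]],
      [round(v, 2) for v in est["lam2"]])
```

Fed the population values of the $\pi_\sigma$’s instead of the sample proportions, the same function returns the true latent parameters of the example, which is the sense in which the procedure is constructive.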

The most important shortcoming of this procedure and of its extensions (which involve more of the π’s) is that there is no guarantee that when the procedure is used with a set of p’s it will produce permissible estimates of the latent parameters, that is, estimates that are real numbers between 0 and 1. In four sampling experiments with n = 1,000, m = 3, and p = 8, Anderson and Carleton (1957) found that of 2,240 determinantal equations only 33.7 per cent had all roots between 0 and 1. Madansky (1959) computed the asymptotic variance of the determinantal estimates for the case m = 2, p = 3, a case in which these estimators, if permissible, are the maximum likelihood estimators of the latent parameters, and found the results presented in Table 1, where n is the sample size. Thus, since the largest asymptotic variance is 1115.42/n, sample sizes must be at least 1,116 for the variance of the estimators of all the parameters to be less than 1.

Table 2
$\pi_{123:\emptyset}$ = 10/192
$\pi_{23:1}$ = 14/192
$\pi_{13:2}$ = 17/192
$\pi_{12:3}$ = 22/192
$\pi_{3:12}$ = 19/192
$\pi_{2:13}$ = 34/192
$\pi_{1:23}$ = 35/192
$\pi_{\emptyset:123}$ = 41/192

Table 3
Response pattern   Number observed
123:∅              2
23:1               3
13:2               4
12:3               4
3:12               4
2:13               7
1:23               7
∅:123              9

Rounding error also affects the estimates greatly. The parameters of the multinomial distribution for the above model are given in Table 2.

For a sample of size 40, if one had actually observed the expected number of respondents for each of the response patterns (rounded to the nearest integer), then the sample would have the composition shown in Table 3. Table 4 shows the $p_\sigma$’s based on these data ($\pi_\sigma$ being given for comparison). The determinantal estimates of the latent parameters are given in the third column of Table 5. (The fourth column will be discussed below.)
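The step from response-pattern counts to the sample proportions $p_\sigma$ can be sketched as follows. The counts are those of Table 3, with each pattern written as a 0/1 tuple for items 1, 2, 3; the function name `sample_proportions` is a hypothetical choice made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Table 3 counts: key (x1, x2, x3), value = number of respondents.
counts = {(1, 1, 1): 2, (0, 1, 1): 3, (1, 0, 1): 4, (1, 1, 0): 4,
          (0, 0, 1): 4, (0, 1, 0): 7, (1, 0, 0): 7, (0, 0, 0): 9}

def sample_proportions(counts):
    """Return {sigma: p_sigma}, with p_sigma = n_sigma / n for each subset sigma."""
    n = sum(counts.values())
    p = len(next(iter(counts)))
    props = {}
    for size in range(p + 1):
        for sigma in combinations(range(1, p + 1), size):
            # n_sigma counts every respondent with 1's on all items in sigma,
            # whatever the answers to the other items.
            n_sigma = sum(c for x, c in counts.items()
                          if all(x[i - 1] == 1 for i in sigma))
            props[sigma] = Fraction(n_sigma, n)
    return props

p_hat = sample_proportions(counts)
for sigma, value in p_hat.items():
    print(sigma, float(value))
```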

Partitioning method of estimation. A third estimation procedure (Madansky 1959) looks at the problem in a different light. Since the latent classes are defined as those classes within which the p components of the vector x are statistically independent, one might (at least conceptually) look at all possible assignments of the n observations into m classes and find that assignment for which the usual $\chi^2$ test statistic for independence is smallest. The estimates of the latent parameters would then just be the appropriate proportions based on this best assignment. They would always be permissible. Although for finite samples they would not be identical with minimum $\chi^2$ estimates, they would have the same asymptotic properties and thus be asymptotically equivalent to maximum likelihood estimates.
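The conceptual search over assignments can be sketched by brute force for two classes and a very small sample. Several choices below are assumptions of this sketch and are not taken from the text: the per-class Pearson statistics for mutual independence are simply added, identical observations are treated as interchangeable (so only the number of each response pattern sent to class 1 is enumerated), and the toy counts are hypothetical.

```python
from itertools import product

def independence_chi2(counts, p):
    """Pearson statistic for mutual independence of p binary items in one class."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    marg = [sum(c for x, c in counts.items() if x[i]) / n for i in range(p)]
    stat = 0.0
    for x in product((0, 1), repeat=p):
        expected = n
        for i in range(p):
            expected *= marg[i] if x[i] else 1.0 - marg[i]
        if expected > 0:   # a zero expected cell also has a zero observed count
            stat += (counts.get(x, 0) - expected) ** 2 / expected
    return stat

def best_two_class_partition(counts, p):
    """Enumerate every split of the pattern counts into two classes."""
    patterns = sorted(counts)
    best = None
    for split in product(*(range(counts[x] + 1) for x in patterns)):
        class1 = dict(zip(patterns, split))
        class2 = {x: counts[x] - k for x, k in class1.items()}
        stat = independence_chi2(class1, p) + independence_chi2(class2, p)
        if best is None or stat < best[0]:
            best = (stat, class1, class2)
    return best

def class_estimates(class_counts, n_total, p):
    """Class proportion and item proportions within one class of the split."""
    n = sum(class_counts.values())
    lam = [sum(c for x, c in class_counts.items() if x[i]) / n
           for i in range(p)] if n else None
    return n / n_total, lam

# Hypothetical toy sample of 10 respondents on 3 items.
toy = {(1, 1, 0): 3, (1, 0, 1): 2, (0, 1, 1): 2, (0, 0, 0): 3}
stat, c1, c2 = best_two_class_partition(toy, 3)
print(stat, class_estimates(c1, 10, 3), class_estimates(c2, 10, 3))
```

Because the estimates are proportions computed within each class, they necessarily lie between 0 and 1. The number of splits grows as the product of (count + 1) over the response patterns, which indicates why exhaustive enumeration quickly becomes expensive.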

Madansky (1959) introduced another measure of independence, simpler to compute than $\chi^2$, and found that the asymptotic efficiency of the estimators of the latent parameters from this procedure, in the example described above, is about .91. The obvious shortcoming of this idea is that it is too time consuming to carry out all the possible assignments, even for moderate samples on an electronic computer. In the example described above, for a sample of size 40 it took four hours of computation on the IBM 704 to enumerate and assess all the assignments into two classes. The resulting estimates are shown in the fourth column of Table 5.

Table 4
σ      $p_\sigma$   $\pi_\sigma$
1      .425         .4375
2      .400         .4167
3      .325         .3125
12     .150         .1667
13     .150         .1406
23     .125         .1250
123    .050         .0521

Table 5 – Parameter estimates for two methods*
Parameter        Value   Determinantal estimate   Partitioning estimate
$\nu_1$          .75     .23                      .58
$\lambda_{11}$   .50     .82                      .00
$\lambda_{12}$   .33     .23                      .43
$\lambda_{13}$   .33     .42                      .30
$\lambda_{21}$   .25     .30                      1.00
$\lambda_{22}$   .67     .45                      .35
$\lambda_{23}$   .25     .29                      .35
* n = 40.
Source: Madansky 1959, p. 21.

Scoring methods. Current activity on estimation procedures for the latent class model (Henry 1964) is directed toward writing computer routines using the scoring procedure described by McHugh (1956) to obtain best asymptotically normal estimates of the latent parameters. The scoring procedure will yield estimators with the same large asymptotic variances as those indicated by the above example of the maximum likelihood estimators’ asymptotic variances. Also, the scoring procedure has the same permissibility problem associated with it as did the determinantal approach described above. However, the problem can be alleviated for this procedure by using a set of consistent permissible estimators for initial values in the scoring procedure.

Albert Madansky

[See also Scaling. Directly related are the entries Distributions, Statistical, article on Mixtures of Distributions; Factor Analysis; Statistical Identifiability.]

BIBLIOGRAPHY

Anderson, T. W. 1954 On Estimation of Parameters in Latent Structure Analysis. Psychometrika 19:1-10.

Anderson, T. W. 1959 Some Scaling Models and Estimation Procedures in the Latent Class Model. Pages 9–38 in Ulf Grenander (editor), Probability and Statistics. New York: Wiley.

Anderson, T. W.; and Carleton, R. O. 1957 Sampling Theory and Sampling Experience in Latent Structure Analysis. Journal of the American Statistical Association 52:363 only.

Benini, Rodolfo 1928 Gruppi chiusi e gruppi aperti in alcuni fatti collettivi di combinazioni. International Statistical Institute, Bulletin 23, no. 2:362-383.

Cournot, A. A. 1838 Mémoire sur les applications du calcul des chances à la statistique judiciaire. Journal de mathématiques pures et appliquées 3:257-334.

deMeo, G. 1934 Su di alcuni indici atti a misurare l’attrazione matrimoniale in classificazioni dicotome. Accademia delle Scienze Fisiche e Matematiche, Naples, Rendiconto 73:62–77.

Gibson, W. A. 1955 An Extension of Anderson’s Solution for the Latent Structure Equations. Psychometrika 20:69–73.

Gibson, W. A. 1962 Extending Latent Class Solutions to Other Variables. Psychometrika 27:73-81.

Green, Bert F., Jr. 1951 A General Solution for the Latent Class Model of Latent Structure Analysis. Psychometrika 16:151-166.

Henry, Neil 1964 The Computation of Efficient Estimates in Latent Class Analysis. Unpublished manuscript, Columbia Univ., Bureau of Applied Social Research.

Koopmans, T. C. 1951 Identification Problems in Latent Structure Analysis. Cowles Commission Discussion Paper: Statistics, No. 360. Unpublished manuscript.

Lazarsfeld, Paul F. 1950 The Logical and Mathematical Foundation of Latent Structure Analysis. Pages 362–412 in Samuel A. Stouffer et al., Measurement and Prediction. Princeton Univ. Press.

Lazarsfeld, Paul F. 1951 The Use of Mathematical Models in the Measurement of Attitudes. Research Memorandum RM-455. Santa Monica (Calif.): RAND Corporation.

Lazarsfeld, Paul F. 1959 Latent Structure Analysis. Pages 476–543 in Sigmund Koch (editor), Psychology: A Study of a Science. Volume 3: Formulations of the Person and the Social Context. New York: McGraw-Hill.

McHugh, Richard B. 1956 Efficient Estimation and Local Identification in Latent Class Analysis. Psychometrika 21:331-347.

McHugh, Richard B. 1958 Note on “Efficient Estimation....” Psychometrika 23:273-274. → This is a correction to McHugh 1956.

Madansky, Albert 1959 Partitioning Methods in Latent Class Analysis. Paper P-1644. Santa Monica (Calif.): RAND Corporation.

Madansky, Albert 1960 Determinantal Methods in Latent Class Analysis. Psychometrika 25:183–198.

Weinberg, Wilhelm 1902 Beiträge zur Physiologie und Pathologie der Mehrlingsgeburten beim Menschen. Pflüger’s Archiv für die gesamte Physiologie des Menschen und der Tiere 88:346–430.