Markov Chains

A Markov chain is a chance process having the special property that one can predict its future just as accurately from a knowledge of the present state of affairs as from a knowledge of the present together with the entire past history.

The theory of social mobility illustrates the idea. Considering only eldest sons, note the status of successive generations of a particular male line in a society divided into three classes: upper, middle, and lower. If we assume the movement of a family among the three social classes is a chance process—governed by probabilistic laws rather than deterministic ones—several possibilities present themselves. An independent process represents a perfectly mobile society: the probability that a man is in a particular class depends in no way on the class of his father. At the other extreme is a society in which the probability that a man is in a particular class depends on the class of his father, that of his grandfather, that of his great-grandfather, etc.

A Markov chain is a process of intermediate complexity: the probability that a man is in a given class may depend on the class of his father, but it does not further depend on the classes of his earlier antecedents. For instance, while upper-class and middle-class fathers may have different probabilities of producing sons of a given class, an upper-class father whose own father was also upper class must have the same probability of producing a son of a given class as does an upper-class father whose father was, say, lower class. (This example will be used to illustrate each concept introduced below.)

No chance process encountered in applications is truly independent—in particular, no society is perfectly mobile. In the same way, no natural process exactly satisfies the Markov chain condition, but many come close enough to make a Markov chain model useful.

Formal definitions For an exact formulation of the Markov chain concept, introduced in 1907 by the Russian mathematician A. A. Markov, imagine a system (family, society, person, organism) that passes with each unit of time (minute, hour, generation) from one to another of the $s$ states $E_1, E_2, \ldots, E_s$. (Upper class, $E_1$; middle class, $E_2$; and lower class, $E_3$, are the states in the social mobility example.) Assume that a chance process governs the evolution of the system; the chance process is the collection of probability laws describing the way in which the system changes with time. The system is that which undergoes change; a particular analysis may involve many systems of the same kind (many families, or many societies, etc.), all obeying the same process or set of probability laws.

Since the system passes through various states in sequence, time moves in jumps, rather than continuously; hence the integers 1, 2, 3, ... provide a natural time index. Denote by $P(E_k \mid E_i, E_j)$ the conditional probability that at time $n+2$ the system is in state $E_k$, given that at times $n$ and $n+1$ it was in states $E_i$ and $E_j$ in that order; and similarly for longer or shorter conditioning sequences of states. We make the usual assumption that the conditional probabilities just defined do not depend on $n$ (that is, the conditional probabilities do not change as time passes). The process is a Markov chain if $P(E_k \mid E_i, E_j) = P(E_k \mid E_j)$, $P(E_l \mid E_i, E_j, E_k) = P(E_l \mid E_k)$, $P(E_m \mid E_i, E_j, E_k, E_l) = P(E_m \mid E_l)$, and so on.

The transition probabilities $p_{ij} = P(E_j \mid E_i)$ determine the fundamental properties of the Markov chain; they form an $s \times s$ transition matrix $P = (p_{ij})$, the basic datum of the process. (A matrix such as $P$ that has only nonnegative entries and rows summing to 1 is called a stochastic matrix.)

An independent process can be considered a special sort of Markov chain for which $p_{ij} = p_j$ (that is, the $p_{ij}$ do not depend on $i$): the rows of $P$ are all the same in this case. The second-order transition probabilities provide a second example of information contained in $P$. If the system is presently in state $E_i$, then the conditional probability that it will pass to $E_j$ and then to $E_k$ in the next two steps is $P(E_j \mid E_i)P(E_k \mid E_i, E_j) = p_{ij}p_{jk}$; hence the conditional probability that the system will occupy $E_k$ two time units later is

$$p_{ik}^{(2)} = \sum_j p_{ij} p_{jk}.$$

(Summing over the index $j$ accounts for all the possible intermediate states.) But this second-order transition probability is just the $(i,k)$th entry in $P^2$ (the matrix $P$ times itself in the sense of matrix multiplication).

Table 1

                      E1      E2      E3
E1 (upper class)     .448    .484    .068
E2 (middle class)    .054    .699    .247
E3 (lower class)     .011    .503    .486

Social mobility in England and Wales is approximately described by a Markov chain with the transition matrix shown in Table 1. (These numbers must, of course, be arrived at empirically. The problem of estimation of transition probabilities is discussed later.) For example, a middle-class man has chance .054 of producing a son (recall that only eldest sons are considered here) who enters the upper class. Now the chance that a man in the upper class has a middle-class grandson is $(.448 \times .484) + (.484 \times .699) + (.068 \times .503) = .589$, while the chance that a man in the middle class has a middle-class grandson is $(.054 \times .484) + (.699 \times .699) + (.247 \times .503) = .639$.
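These two-step figures can be checked by squaring the transition matrix. A minimal sketch in Python, assuming the numpy library (the matrix entries are those of Table 1):

```python
import numpy as np

# Transition matrix from Table 1 (rows and columns ordered:
# upper, middle, lower class).
P = np.array([
    [.448, .484, .068],  # E1: upper class
    [.054, .699, .247],  # E2: middle class
    [.011, .503, .486],  # E3: lower class
])

P2 = P @ P  # second-order transition probabilities

print(round(P2[0, 1], 3))  # upper -> middle in two generations: 0.589
print(round(P2[1, 1], 3))  # middle -> middle in two generations: 0.639
```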

Notice that these last two probabilities are different, which raises a point often misunderstood. The Markov chain definition prescribes that an upper-class man with upper-class antecedents must have the same chance of producing a middle-class son as an upper-class man with lower-class antecedents. But influence—stochastic influence, so to speak—of a man on his grandson, which the computation just made shows does exist, is entirely consistent with the definition, which requires only that the grandfather exert this influence exclusively through the intermediate generation (the father).

To describe natural phenomena by Markov chains requires idealization; the social mobility example carried through here makes this obvious. Since Markov chains allow for dependence, however, they can with satisfactory accuracy account for the evolution of diverse social, psychological, biological, and physical systems for which an independent chance process would make too crude a model. On the other hand, Markov chains have simple enough structure to be mathematically tractable.

As another example of an approximate Markov chain, consider sociological panel studies. A potential voter is asked his party preference each month for six months preceding an election; his answer places him in state $E_1$ (Republican), $E_2$ (Democrat), or $E_3$ (undecided). The voter’s progress among these states (approximately) obeys the Markov rule if, in predicting his August preference on the basis of his previous ones, one can without (essential) loss disregard them all except the most recent one, namely that for July (and similarly for the other predictions). In learning theory, Markov models have proved fruitful for describing organisms that change state from trial to trial in response to reinforcement in a learning process. (See Bibliography for applications in such areas as industrial inspection, industrial mobility, sickness and accident statistics, and economics.)

Mathematical analysis The transition matrix $P$ determines many of the characteristics of a Markov chain. We have seen that the $(i,j)$th entry of $P^2$ is the conditional probability, given that the system is in $E_i$, that it will be in $E_j$ two steps later. In the same way, the $n$th order transition probability $p_{ij}^{(n)}$—the conditional probability that the system will occupy $E_j$ after $n$ steps if it is now in $E_i$—is the $(i,j)$th entry in $P^n$. But $P$ alone does not determine the absolute probability $a_i^{(n)}$ that at time $n$ the system is in state $E_i$. For this we need $P$ and the initial probabilities $a_i^{(1)}$ (the probability that the system starts in state $E_i$).

The probability that the system occupies states $E_i$ and $E_j$ respectively at times $n$ and $n+1$ is $a_i^{(n)} p_{ij}$; adding over the states possible at time $n$, we conclude that $a_j^{(n+1)} = \sum_i a_i^{(n)} p_{ij}$, or, if $a^{(n)}$ denotes the row vector $(a_1^{(n)}, \ldots, a_s^{(n)})$, $a^{(n)}P = a^{(n+1)}$. More generally, the $m$th order transition probabilities link the absolute probabilities for time $n+m$ to those for time $n$: $a_j^{(n+m)} = \sum_i a_i^{(n)} p_{ij}^{(m)}$, or $a^{(n)}P^m = a^{(n+m)}$. Thus the absolute probabilities are completely determined, via the relation $a^{(n)} = a^{(1)}P^{n-1}$, by the transition probabilities $p_{ij}$ and the initial probabilities $a_i^{(1)}$. In other words, $P$ and $a^{(1)}$ completely specify the probability laws of the Markov chain.
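The relation $a^{(n)} = a^{(1)}P^{n-1}$ lends itself to the same sort of numerical sketch (again Python with numpy; the initial vector is hypothetical, chosen only for illustration):

```python
import numpy as np

P = np.array([[.448, .484, .068],
              [.054, .699, .247],
              [.011, .503, .486]])

# Hypothetical initial probabilities: every family line starts
# in the middle class.
a1 = np.array([0.0, 1.0, 0.0])

# Absolute probabilities at time n, via a(n) = a(1) P^(n-1).
for n in (1, 2, 5, 20):
    an = a1 @ np.linalg.matrix_power(P, n - 1)
    print(n, np.round(an, 3))
```

Already at $n = 20$ the output is close to (.067, .624, .309), the stationary vector discussed next.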

Chains with all $p_{ij} > 0$. The most important mathematical results about Markov chains concern the stability of the absolute probabilities $a_j^{(n)}$ and the behavior of the $p_{ij}^{(n)}$ for large $n$. Although in general some of the $p_{ij}$ may be zero, suppose for now that they are all strictly greater than zero. (Later this restriction will be lifted and other chains considered.) One expects that for large $n$ the probability $p_{ij}^{(n)}$ should not depend much on $i$—the effect of the initial state should wear off. This is indeed true: there exist positive numbers $p_1, \ldots, p_s$ such that $p_{ij}^{(n)}$ is close to $p_j$ for large $n$. It follows that $a_j^{(n)}$ approaches $p_j$ as $n$ becomes large, so that the absolute probabilities stabilize near the $p_j$, no matter what the initial probabilities are.

The numbers $p_i$ have several important mathematical properties, which in turn have probability interpretations. Clearly they sum to one: $\sum_i p_i = 1$. Moreover, if $n$ is large, $p_{kj}^{(n+1)}$ is near $\sum_i p_i p_{ij}$ (since $p_{kj}^{(n+1)} = \sum_i p_{ki}^{(n)} p_{ij}$ and each $p_{ki}^{(n)}$ is near $p_i$), while it is also near $p_j$. Hence $\sum_i p_i p_{ij} = p_j$, or, with $p = (p_1, \ldots, p_s)$, $pP = p$. Therefore the $p_i$ solve the system

$$\sum_i p_i = 1, \qquad \sum_i p_i p_{ij} = p_j \quad (j = 1, \ldots, s) \tag{1}$$

of $s+1$ equations in $s$ unknowns. Since it turns out that this system has but one solution, the limits $p_i$ can be found by solving it, and this is the method used in practice. The vector $p$ for the social mobility example is $(.067, .624, .309)$. Whatever a man’s class, there is probability near .309 that a descendant a great many generations later is in the lower class.
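One way to solve (1) numerically, sketched in Python with numpy: the $s$ balance equations are linearly dependent (they sum to an identity), so one of them can be replaced by the normalization constraint, leaving a square linear system:

```python
import numpy as np

P = np.array([[.448, .484, .068],
              [.054, .699, .247],
              [.011, .503, .486]])
s = P.shape[0]

# p P = p is equivalent to (P^T - I) p^T = 0; replace the last
# (redundant) equation with the normalization sum(p) = 1.
A = P.T - np.eye(s)
A[-1, :] = 1.0
b = np.zeros(s)
b[-1] = 1.0

p = np.linalg.solve(A, b)
print(np.round(p, 3))  # approximately [0.067 0.624 0.309]
```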

The quantities $p_j$ connect up with the absolute probabilities $a_j^{(n)}$. If $a^{(n)} = p$, then $a^{(n+1)} = a^{(n)}P = pP = p$, and similarly $a^{(m)} = p$ for all $m$ beyond $n$. In other words, if the system has at a given time probability exactly $p_j$ of being in state $E_j$ (for each $j$), then the same holds ever after. For this reason, the $p_j$ are called a set of stationary probabilities; since the solution to (1) is unique, they form the only such set.

Thus a system evolving according to the laws of the Markov chain has long-range probability approximately $p_j$ of being in $E_j$; and if the probability of being in $E_j$ at a particular time is exactly $p_j$, this relationship is preserved in the future. The law of large numbers gives a complementary way of interpreting these facts. Imagine a large number of systems evolving independently of one another, each according to the laws of a common Markov chain. No matter what the initial states of the many systems may be, after a long time the numbers of systems in the various states will become approximately proportional to the $p_j$. And once this state of affairs is achieved, it will persist, except for fluctuations that are proportionately small if the number of systems involved is large. In the social mobility example, the proportions in the three classes of a large number of family lines will eventually be nearly .067 : .624 : .309.

A system setting out from state $E_i$ is certain to return to $E_i$ after one or more steps; the expected number of steps until first return turns out to be $1/p_i$, a result that agrees qualitatively with intuition: the lower the probability of a state, the longer the average time between successive passages through it. There also exist formulas for the expected number of steps to reach $E_j$ for a system starting in $E_i$. In the example from mobility theory, the mean time to pass from the lower class to the upper class is 26.5 generations, while the mean for the return trip is but 5.6 generations (which should be a lesson to us all).
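The mean first-passage figures satisfy a linear system of their own: if $m_{ij}$ denotes the expected number of steps to reach $E_j$ from $E_i$, then $m_{ij} = 1 + \sum_{k \neq j} p_{ik} m_{kj}$. A sketch of the computation in Python with numpy (with the rounded entries of Table 1 the results come out near 26.4 and 5.6, the small discrepancy from 26.5 presumably reflecting rounding of the published matrix):

```python
import numpy as np

P = np.array([[.448, .484, .068],
              [.054, .699, .247],
              [.011, .503, .486]])

def mean_first_passage(P, j):
    """Expected steps to first reach state j, starting from each
    state i != j: solves m = 1 + Q m, where Q restricts P to the
    states other than j."""
    others = [i for i in range(P.shape[0]) if i != j]
    Q = P[np.ix_(others, others)]
    m = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return dict(zip(others, m))

print(round(mean_first_passage(P, 0)[2], 1))  # lower -> upper: about 26.4
print(round(mean_first_passage(P, 2)[0], 1))  # upper -> lower: about 5.6
```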

Other Markov chains. In the analysis sketched above, it was assumed that the $p_{ij}$ were all strictly greater than zero. More generally, the same results hold even if some of the $p_{ij}$ equal zero, provided the Markov chain has the property that the states all communicate (for each $i$ and $j$ there is an $n$ for which $p_{ij}^{(n)} > 0$) and provided also that there is no periodicity (no integer $m$ exceeding 1 exists such that a passage from $E_i$ back to $E_i$ is possible in $n$ steps only if $n$ is divisible by $m$). Such a chain is called ergodic. The results can also be reformulated to cover nonergodic chains.

The theory extends in still other useful ways. (a) A chance process is a Markov chain of second order if the probability distribution of future states depends only on the two most recently visited states: $P(E_l \mid E_i, E_j, E_k) = P(E_l \mid E_j, E_k)$, $P(E_m \mid E_i, E_j, E_k, E_l) = P(E_m \mid E_k, E_l)$, and so on. These processes contain ordinary Markov chains as a special case and make possible a more precise description of some phenomena; the properties derived above carry over to chains of second (and higher) order, one standard device behind this being sketched after this paragraph. (b) If, for instance, the system is a population and the state is its size, there are infinitely many possible states. The theory generalizes to cover such examples, but the mathematics becomes more difficult; for example, (1) becomes a system of infinitely many equations in infinitely many unknowns. (c) Continuous time, rather than time that goes in jumps, is appropriate for the description of some systems—for example, populations that change state (size) because of births and deaths that can occur at any instant. Such processes in principle admit an approximate description within the discrete-time theory: one observes the system periodically (every year or minute or microsecond) and ignores the states occupied at other time points. However, since such an analysis is often unnatural, an extensive theory of continuous-time processes has grown up.
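The device referred to in (a), not spelled out in the text above, is the standard reduction of a second-order chain to an ordinary one: take as the new states the pairs of consecutive old states. A sketch in Python with numpy, using hypothetical second-order probabilities chosen only to illustrate the construction:

```python
import numpy as np

# Hypothetical second-order transition probabilities on two states:
# q[i, j, k] = P(E_k | E_i, E_j).
q = np.array([[[.9, .1], [.6, .4]],
              [[.4, .6], [.1, .9]]])

# New state space: pairs (previous state, current state).
pairs = [(i, j) for i in range(2) for j in range(2)]
P = np.zeros((4, 4))
for a, (i, j) in enumerate(pairs):
    for b, (j2, k) in enumerate(pairs):
        if j2 == j:            # consecutive pairs must overlap in state j
            P[a, b] = q[i, j, k]

print(P.sum(axis=1))  # each row sums to 1: an ordinary transition matrix
```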

Statistical analysis One may believe a given process to be a Markov chain—or approximately a Markov chain—because of his knowledge of the underlying mechanism. Or he may hypothesize that it is a Markov chain, hoping to check this assumption against actual data. Under the Markov assumption, he may want to draw from actual data conclusions about the transition probabilities $p_{ij}$, which are the basic parameters of the model. In other words, statistical problems of estimation and hypothesis testing arise for Markov chains just as they do for independent processes [see Estimation; Hypothesis Testing].

Suppose one observes $n$ systems (governed by a common Markov chain), of which $n_i$ start in state $E_i$, and follows each of them through one transition. If $n_{ij}$ is the number that step from $E_i$ to $E_j$, so that $\sum_j n_{ij} = n_i$, then the log-likelihood function is $\sum_{i,j} n_{ij} \log p_{ij}$; maximizing this function subject to the $s$ constraints $\sum_j p_{ij} = 1$ gives the natural ratios $n_{ij}/n_i$ as the maximum likelihood estimators of the $p_{ij}$. The numerical matrix given for the social mobility example was obtained in this way from a sample of some 3,500 father-son pairs in England and Wales. If the $n_i$ are large, the estimators are approximately normally distributed about their mean values $p_{ij}$. By seeing how far the $n_{ij}/n_i$ differ from putative transition probabilities $p_{ij}$, one can test the null hypothesis that these $p_{ij}$ are the true parameter values. The statistic appropriate for this problem is $\sum_{i,j} (n_{ij} - n_i p_{ij})^2 / (n_i p_{ij})$; it has approximately a chi-square distribution with $s(s-1)$ degrees of freedom if the $n_i$ are large.
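A sketch of the estimation and the chi-square test in Python with numpy; the counts $n_{ij}$ below are invented for illustration (they are not the Glass–Hall data):

```python
import numpy as np

# Invented one-step transition counts n_ij, for illustration only.
n = np.array([[44.0, 50.0, 6.0],
              [27.0, 350.0, 123.0],
              [5.0, 252.0, 243.0]])
n_i = n.sum(axis=1)

# Maximum likelihood estimators p_ij = n_ij / n_i.
p_hat = n / n_i[:, None]
print(np.round(p_hat, 3))

# Chi-square statistic for testing putative probabilities p0.
p0 = np.array([[.448, .484, .068],
               [.054, .699, .247],
               [.011, .503, .486]])
expected = n_i[:, None] * p0
stat = ((n - expected) ** 2 / expected).sum()
s = n.shape[0]
print(round(stat, 2), "on", s * (s - 1), "degrees of freedom")
```

The statistic would then be referred to a chi-square table with $s(s-1) = 6$ degrees of freedom.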

Other tests are possible if the systems under observation are traced for more than one step. Suppose $n_{ijk}$ is the number of systems that start in $E_i$, step to $E_j$, and then step to $E_k$. If the process is a Markov chain, then (a) the chance that the second step carries the system from $E_j$ to $E_k$ does not depend on the initial state $E_i$, and (b) the transition probabilities for the second step are the same as those for the first.

Let a dot indicate an index that has been summed out; for example, $n_{\cdot ij} = \sum_k n_{kij}$ is the number of systems that are in states $E_i$ and $E_j$ at times 2 and 3, respectively (with the state at time 1 left unspecified). Assuming (a), one can test (b) by comparing the estimators $n_{ij\cdot}/n_{i\cdot\cdot}$ of the transition probabilities for the first step with the estimators $n_{\cdot ij}/n_{\cdot i\cdot}$ of the transition probabilities for the second step. And one can test (a) itself by comparing the ratios $n_{ijk}/n_{ij\cdot}$ (the estimated second-step transition probabilities, allowing for possible further influence of the initial state) with the pooled ratios $n_{\cdot jk}/n_{\cdot j\cdot}$: if (a) holds, $n_{ijk}/n_{ij\cdot}$ should, independently of $i$, be near $n_{\cdot jk}/n_{\cdot j\cdot}$; thus one can check on the Markov chain assumption itself.
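A sketch of the test of (a) in Python with numpy, using triple counts simulated from the Table 1 chain itself (so that the Markov property holds by construction and the compared ratios should nearly agree):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[.448, .484, .068],
              [.054, .699, .247],
              [.011, .503, .486]])

# Simulate triple counts n_ijk from the chain, with uniform starts.
n = np.zeros((3, 3, 3))
for _ in range(20000):
    i = rng.integers(3)
    j = rng.choice(3, p=P[i])
    k = rng.choice(3, p=P[j])
    n[i, j, k] += 1

# Second-step estimates n_ijk / n_ij. (conditional on initial state i) ...
cond = n / n.sum(axis=2, keepdims=True)
# ... against the pooled estimates n_.jk / n_.j.
pooled = n.sum(axis=0) / n.sum(axis=(0, 2))[:, None]

for i in range(3):
    print(np.round(cond[i] - pooled, 2))  # near zero when (a) holds
```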

This statistical analysis, based on following a large number of independently evolving systems through a small number of steps, really falls under the classical chi-square and maximum likelihood theory. There is also a statistical theory for the opposite case, following a small number of systems (perhaps just one) through many steps. Although the estimates and tests for this case have forms similar to those for the first case, the derivation of their asymptotic properties is more involved.

The statistical analysis of Markov chains may be extended in various directions. For example, a mode of analysis in which times of stay in the states are considered, together with transition probabilities, is discussed by Weiss and Zelen (1965).

Patrick Billingsley

[Other relevant material may be found in Panel Studies; Social Mobility.]

BIBLIOGRAPHY

Anderson, T. W.; and Goodman, Leo A. 1957 Statistical Inference About Markov Chains. Annals of Mathematical Statistics 28:89-110. → A detailed treatment of the problem of many short samples.

Billingsley, Patrick 1961 Statistical Methods in Markov Chains. Annals of Mathematical Statistics 32: 12-40. → A review paper covering the problem of one long sample from a Markov chain. Contains a large bibliography.

Blumen, Isadore; Kogan, Marvin; and McCarthy, Philip 1955 The Industrial Mobility of Labor as a Probability Process. Cornell Studies in Industrial and Labor Relations, Vol. 6. Ithaca, N.Y.: Cornell Univ. Press. → A detailed empirical study of mobility problems.

Feller, William 1950–1966 An Introduction to Probability Theory and Its Applications. 2 vols. New York: Wiley. → A classic text covering continuous-time processes as well as chains with infinitely many states. Excellent examples. The second edition of the first volume was published in 1957.

Glass, D. V.; and Hall, J. R. 1954 Social Mobility in Great Britain: A Study in Inter-generation Changes in Status. In D. V. Glass (editor), Social Mobility in Britain. London: Routledge. → This is the source of the original data for the social mobility example.

Goodman, Leo A. 1961 Statistical Methods for the Mover–Stayer Model. Journal of the American Statistical Association 56:841–868.

Goodman, Leo A. 1962 Statistical Methods for Analyzing Processes of Change. American Journal of Sociology 68:57-78. → This and the preceding paper treat modifications of the Markov model for mobility problems.

Jaffe, Joseph; Cassotta, Louis; and Feldstein, Stanley 1964 Markovian Model of Time Patterns of Speech. Science 144:884–886.

Kemeny, John G.; and Snell, J. Laurie 1960 Finite Markov Chains. Princeton, N.J.: Van Nostrand. → Contains both theory and examples. This is the source of the figures for the social mobility example.

Prais, S. J. 1955 Measuring Social Mobility. Journal of the Royal Statistical Society Series A 118:56-66. → This is the source of the analysis of the social mobility example.

Weiss, George H.; and Zelen, Marvin 1965 A Semi-Markov Model for Clinical Trials. Journal of Applied Probability 2:269–285.
