Central Limit Theorem
The central limit theorem (CLT) is a fundamental result from statistics. It states that the sum of a large number of independent, identically distributed (iid) random variables will tend to be distributed according to the normal distribution. A first version of the CLT was proved by the French-born mathematician Abraham de Moivre (1667–1754). He showed how the normal distribution can be used to approximate the distribution of the number of heads that will result when a coin is tossed a large number of times.
The CLT is the cornerstone of most estimation and inference of statistical models, which in turn are widely used in empirical work in the social sciences. Statistical models involve unknown population parameters that are estimated from a sample. The estimators often take the form of sample averages. According to the CLT, the estimators will therefore be approximately normally distributed for a sufficiently large sample size. This result can be used to draw inference about the population parameters. One example of a statistical model used in social sciences is the linear regression model. Here, the CLT can be used to quantify whether a chosen set of variables explains the variation in a certain response variable.
THE THEOREM
Let {x1, …, xn} be a sample of n iid random variables with mean μ and variance σ2. Consider the sum Sn = x1 + x2 + … + xn. One may easily check that the mean and standard deviation of Sn are nμ and σ√n, respectively. Normalize Sn as follows,
Zn = (Sn – nμ)/(σ√n),
such that Zn has mean zero and standard deviation 1. The CLT then states that Zn ≈ N (0,1) for n large enough. Formally, the above statement should be read as follows: For any –∞ < z < +∞, P (Zn ≤ z) → Φ(z) as n → ∞, where Φ(·) is the cumulative distribution function of the standard normal distribution.
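The convergence of Zn toward the standard normal is easy to verify by simulation. The following sketch (not part of the original entry; the uniform distribution and the sample sizes are illustrative choices) draws many realizations of Zn for n = 100 and checks that P (Zn ≤ 0) is close to Φ(0) = 0.5:

```python
import math
import random

def normalized_sum(sample_fn, mu, sigma, n):
    """Draw n iid variables and return Z_n = (S_n - n*mu) / (sigma * sqrt(n))."""
    s = sum(sample_fn() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

random.seed(0)
# Illustrative choice: x_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
draws = [normalized_sum(random.random, mu, sigma, n=100) for _ in range(20000)]

# By the CLT, P(Z_n <= 0) should be close to Phi(0) = 0.5.
frac_below_zero = sum(z <= 0 for z in draws) / len(draws)
print(frac_below_zero)
```

Replacing `random.random` with draws from any other distribution having finite mean and variance should give a similar result, which is the content of the theorem.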
A major drawback of the CLT is that it is silent about how large n should be before the quality of the approximation is good. This will depend on the distribution of the xi’s making up the sum.
APPLICATIONS
The CLT has a broad range of applications. Consider, for example, a binomial random variable Sn with parameters (n, p). This variable describes the number of heads in n tosses of a coin with probability 0 < p < 1 of heads. Its distribution is given by
P (Sn = k) = n!/(k!(n – k)!) · p^k (1 – p)^(n–k), k = 0, 1, …, n.
For n large, this distribution can be difficult to compute. Another way of representing Sn is as a sum of n iid Bernoulli random variables {x1, …, xn}. That is, Sn = x1 + x2 + … + xn, where the distribution of xi is P (xi = 1) = 1 – P (xi = 0) = p, i = 1, …, n. So we can apply the CLT to Sn, which tells us that Sn ≈ N (np, np (1 – p)) for n large enough, since μ = E[xi] = p and σ2 = Var(xi) = p (1 – p). This result was first proved by de Moivre in 1733.
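De Moivre's approximation can be checked numerically. The following sketch (an illustration added here, with n, p, and k chosen arbitrarily) compares the exact binomial probability P (Sn ≤ k) with the normal approximation N (np, np(1 – p)), using a continuity correction of one half:

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial probability P(S_n = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Illustrative values: 100 tosses of a fair coin, P(S_n <= 55).
n, p, k = 100, 0.5, 55
exact = sum(binom_pmf(n, j, p) for j in range(k + 1))
# CLT approximation with continuity correction: Phi((k + 0.5 - np) / sqrt(np(1-p))).
approx = normal_cdf((k + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(exact, approx)
```

Even at n = 100 the two numbers agree to about two decimal places, which is why the normal approximation was so valuable before exact binomial probabilities were cheap to compute.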
The most important use of the CLT is probably in drawing inference about population parameters in statistical models. Most estimators of parameters can be written as sums over the sample, and so the CLT can be used to obtain a measure of the precision of the estimator. In particular, it can be used to test hypotheses regarding the parameters. As a simple example, consider an iid sample {x1, …, xn} with unknown population mean μ and variance σ2. A simple estimator of the parameter μ is the sample average,
x̄ = (x1 + x2 + … + xn)/n.
We can now use the CLT to conclude that
x̄ ≈ N (μ, σ2/n) for n large enough.
Since the variance is unknown, it needs to be estimated. This can be done using the sample variance,
s2 = [(x1 – x̄)2 + (x2 – x̄)2 + … + (xn – x̄)2]/(n – 1).
One can now use the normal approximation for inferential purposes. For example, we can estimate the standard error of x̄ as s/√n. Also, we know that x̄ – 1.96 s/√n ≤ μ ≤ x̄ + 1.96 s/√n with approximately 95 percent probability, where 1.96 is the 97.5th percentile of the standard normal distribution; one normally refers to this interval as the 95 percent confidence interval. The CLT can furthermore be used to test specific hypotheses regarding μ.
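The whole inferential recipe, sample average, sample variance, standard error, and the 95 percent confidence interval, fits in a few lines. The sketch below (added for illustration; the exponential data and the sample size are arbitrary choices, not from the original entry) computes each quantity in turn:

```python
import math
import random

def mean_ci(sample, z=1.96):
    """Return the sample mean, the sample variance s^2 (with the 1/(n-1)
    correction), and the approximate 95 percent confidence interval for mu."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(s2 / n)  # estimated standard error of the sample average
    return xbar, s2, (xbar - z * se, xbar + z * se)

random.seed(1)
# Illustrative data: 500 iid draws from an exponential distribution with mean 2.
data = [random.expovariate(0.5) for _ in range(500)]
xbar, s2, (lo, hi) = mean_ci(data)
print(xbar, s2, (lo, hi))
```

Note that the data here are far from normal; it is the CLT that justifies applying the normal-based interval to their average.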
SEE ALSO Descriptive Statistics; Distribution, Normal; Law of Large Numbers; Variables, Random
Dennis Kristensen