Central Limit Theorem

Convergence in distribution

The central limit theorem, one of the most important results in applied probability, is a statement about the convergence of a sequence of probability measures. So, we begin this section by exploring what it should mean for a sequence of probability measures to converge to a given probability measure.

Roughly speaking, we will consider two probability measures close if they put approximately the same amount of probability mass in approximately the same places on the number line. For example, a sequence of continuous probability measures with densities f_1, f_2, \ldots converges to a continuous probability measure with density f if \lim_{n\to\infty} f_n(x) = f(x) for all x \in \mathbb{R}:

The sequence of densities f_1, f_2, \ldots converges to the density f as n\to\infty.

If the limiting probability measure is not continuous, then the situation is slightly more complicated. For example, we would like to say that the probability measure which puts a mass of \frac{1}{2}+\frac{1}{n} at \frac{1}{n} and a mass of \frac{1}{2}-\frac{1}{n} at 1 + \frac{1}{n} converges to the fair coin flip distribution as n\to\infty. This does not correspond to pointwise convergence of the probability mass functions, since we don't have convergence of probability mass function values at 0 or at 1 in this example.

The probability measures which assign mass \frac{1}{2}+\frac{1}{n} and \frac{1}{2}-\frac{1}{n} to \frac{1}{n} and 1+\frac{1}{n}, respectively (shown in sea green), converge to the Bernoulli distribution with success probability \frac{1}{2} (shown in red).

We can get around this problem by giving ourselves a little space to the left and right of any point where the limiting measure has a positive probability mass. In other words, suppose that \nu is a probability measure on \mathbb{R} with probability mass function m, and consider an interval I = (a,b). Let's call such an interval a continuity interval of \nu if m(a) and m(b) are both zero.

We will say that a sequence of probability measures \nu_1, \nu_2, \ldots converges to \nu if \nu_n(I) converges to \nu(I) for every continuity interval I of \nu.

We can combine the discrete and continuous definitions into a single definition:

Definition (Convergence of probability measures on \mathbb{R})
A sequence \nu_1, \nu_2, \ldots of probability measures on \mathbb{R} converges to a probability measure \nu on \mathbb{R} if \nu_n(I) \to \nu(I) whenever I is an interval satisfying \nu(\{a,b\}) = 0, where a and b are the endpoints of I.

Exercise
Define f_n(x) to be n when 0 \leq x \leq 1/n and 0 otherwise, and let \nu_n be the probability measure with density f_n. Show that \nu_n converges to the probability measure \nu which puts all of its mass at the origin.

Solution. Suppose I=(a,b) is a continuity interval of \nu.

If I contains the origin, then the terms of the sequence \nu_1(I), \nu_2(I), \ldots are eventually equal to 1, since all of the probability mass of \nu_n is in the interval \left[0,\frac{1}{n}\right] and eventually \left[0,\frac{1}{n}\right] \subset I.

If I does not contain the origin, then the terms of the sequence \nu_1(I), \nu_2(I), \ldots are eventually equal to 0, for the same reason.

In either case, \nu_n(I) converges to \nu(I). Therefore, \nu_n converges to \nu.
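The convergence can also be checked numerically. Here is a minimal sketch in Python (the helper name nu_n is hypothetical, not from the original): since \nu_n has density n on \left[0,\frac{1}{n}\right] and zero elsewhere, \nu_n((a,b)) is n times the length of the overlap of (a,b) with \left[0,\frac{1}{n}\right].

def nu_n(n, a, b):
    # probability that nu_n assigns to the interval (a,b): the density
    # equals n on [0, 1/n], so integrate by measuring the overlap
    overlap = min(b, 1/n) - max(a, 0)
    return n * max(overlap, 0)

for n in [1, 10, 100, 1000]:
    print(n, nu_n(n, -0.5, 0.5), nu_n(n, 0.25, 0.75))
    # the first column tends to 1 = nu((-0.5, 0.5));
    # the second tends to 0 = nu((0.25, 0.75))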

The central limit theorem

The law of large numbers tells us that the distribution \nu of the mean of many independent, identically distributed, finite-variance, mean-\mu random variables is concentrated around \mu. This is a mathematical formalization of the well-known fact that flipping a coin many times results in a heads proportion close to 1/2 with high probability, or that the average of many die rolls is very close to 3.5 with high probability.

The central limit theorem gives us precise information about how the probability mass of \nu is concentrated around its mean. Consider a sequence of independent fair coin flips X_1, X_2, \ldots, and define the sums

\begin{align*}S_n = X_1 + \cdots + X_n,\end{align*}

for n \geq 1. The probability mass functions of the S_n's can be calculated exactly, and they are graphed in the figure below, for several values of n. We see that the graph is becoming increasingly bell-shaped as n increases.

Probability mass functions of sums of Bernoulli(1/2) random variables.

If we repeat this exercise with other distributions in place of the independent coin flips, we obtain similar results. For example, the Poisson(3) distribution is a discrete distribution which assigns mass e^{-3}3^{k}/k! to each nonnegative integer k. The probability mass functions for sums of independent Poisson(3) random variables are shown in the figure below. Not only is the shape of the graph stabilizing as n increases, but we're apparently getting the same shape as in the Bernoulli example.

Probability mass functions of sums of Poisson(3) random variables.
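These pmfs can be computed exactly, because a sum of n independent Poisson(3) random variables is itself Poisson(3n). A minimal sketch in Python for reproducing figures like the one above (the plotting choices are illustrative, not from the original figure):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

for n in [1, 5, 25]:
    # S_n is Poisson(3n); plot its pmf over most of its support
    k = np.arange(int(poisson.ppf(0.999, 3*n)) + 1)
    plt.plot(k, poisson.pmf(k, 3*n), label=f"n = {n}")
plt.legend()
plt.show()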

To account for the shifting and spreading of the distribution of S_n, we normalize it: we subtract its mean and then divide by its standard deviation to obtain a random variable with mean zero and variance 1:

\begin{align*}S_n \quad\stackrel{\text{shift}}{\longrightarrow}\quad S_n - n\mu \quad \stackrel{\text{scale}}{\longrightarrow}\quad \frac{S_n - n\mu}{\sigma\sqrt{n}}\end{align*}

So, we define S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}}, which has mean 0 and variance 1. Based on the figures above, we conjecture that the distribution of S_n^* converges as n\to\infty to some distribution with a bell-shaped probability density function.
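Indeed, linearity of expectation and the additivity of variance across independent terms give

\begin{align*}\mathbb{E}[S_n^*] = \frac{\mathbb{E}[S_n] - n\mu}{\sigma\sqrt{n}} = \frac{n\mu - n\mu}{\sigma\sqrt{n}} = 0, \qquad \operatorname{Var}(S_n^*) = \frac{\operatorname{Var}(S_n)}{\sigma^2 n} = \frac{n\sigma^2}{\sigma^2 n} = 1.\end{align*}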

This conjecture turns out to be correct, with a Gaussian as the limiting distribution. The standard Gaussian distribution is denoted \mathcal{N}(0,1) and has probability density function t\mapsto \frac{1}{\sqrt{2\pi}}e^{-t^2/2}.

Theorem (Central Limit theorem)
Suppose that X_1,X_2,\ldots are independent, identically distributed random variables with mean \mu and finite standard deviation \sigma, and define the normalized sums S_n^* = (X_1 + \cdots + X_n - n\mu)/(\sigma\sqrt{n}) for n \geq 1.

For all -\infty \leq a < b \leq \infty, we have

\begin{align*}\lim_{n\to\infty} \mathbb{P}( a < S_n^* < b) = \mathbb{P}(a < Z < b),\end{align*}

where Z \sim \mathcal{N}(0,1). In other words, the sequence S_1^*, S_2^*,\ldots converges in distribution to Z.

The normal approximation is the technique of approximating the distribution of S_n^* as \mathcal{N}(0,1).
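As a quick sanity check on the theorem, we can compare a simulated distribution of S_n^* for fair coin flips with the Gaussian prediction. A minimal sketch in Python (the sample sizes are arbitrary choices):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 1000, 100000
mu, sigma = 0.5, 0.5  # mean and standard deviation of a single fair coin flip

S = rng.binomial(n, 0.5, size=trials)        # simulated values of S_n
S_star = (S - n*mu) / (sigma*np.sqrt(n))     # normalized sums

a, b = -1, 1
print(np.mean((a < S_star) & (S_star < b)))  # empirical P(a < S_n^* < b)
print(norm.cdf(b) - norm.cdf(a))             # Gaussian prediction, about 0.683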

Example
Suppose we flip a coin which has probability 60% of turning up heads n times. Use the normal approximation to estimate the value of n such that the proportion of heads is between 59% and 61% with probability approximately 99%.

Solution. We calculate the standard deviation \sigma = \sqrt{(0.4)(0.6)} and the mean \mu = 0.6 of each flip, and we use these values to rewrite the desired probability in terms of S_n^*. We find

\begin{align*}P\left( 0.59 < \frac{1}{n}S_n < 0.61\right) &= P\left( -0.01 < \frac{S_n - \mu n}{n} < 0.01\right) \\ &= P\left( -\frac{0.01\sqrt{n}}{\sqrt{0.4\cdot0.6}} < \frac{S_n - \mu n}{\sigma\sqrt{n}} <\frac{0.01\sqrt{n}}{\sqrt{0.4\cdot0.6}}\right),\end{align*}

where the last step was obtained by multiplying all three expressions in the compound inequality by \sqrt{n}/\sigma. Since S_n^* is distributed approximately like a standard normal random variable, the normal approximation tells us to look for the least n so that

\begin{align*}\int_{-a_n}^{a_n} \frac{1}{\sqrt{2\pi}}e^{-t^2/2} \, dt > 0.99,\end{align*}

where a_n = 0.01\sqrt{n}/\sqrt{0.4\cdot0.6}. By the symmetry of the Gaussian density, we may rewrite this equation as

\begin{align*}\int_{-\infty}^{a_n} \frac{1}{\sqrt{2\pi}}e^{-t^2/2} \, dt > 0.995.\end{align*}

Defining the normal CDF \Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-t^2/2} \, dt, we want to find the least integer n such that a_n exceeds \Phi^{-1}(0.995). The following code tells us that \Phi^{-1}(0.995) \approx 2.576.

from scipy.stats import norm
norm.ppf(0.995)  # inverse CDF (quantile function) of the standard normal

using Distributions
quantile(Normal(0,1), 0.995)  # the same computation in Julia

Setting this equal to a_n and solving for n gives 15,924. The exact value of n for which the probability is closest to 99% is 15,861, so we can see that the normal approximation worked pretty well in this case.
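The value 15,924 can be reproduced in a couple of lines (a quick check in Python, following the formula for a_n above):

import numpy as np
from scipy.stats import norm

sigma = np.sqrt(0.4 * 0.6)
z = norm.ppf(0.995)                      # about 2.576
n = int(np.ceil((z * sigma / 0.01)**2))  # least n with a_n > z
print(n)                                 # 15924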

Example
Consider a random variable S_n which is defined to be the sum of n independent fair coin flips. The law of such a random variable is called a binomial distribution. Let m_n:\mathbb{R} \to [0,1] be the pmf of S_n^* = (S_n - n\mu)/(\sigma\sqrt{n}) = (S_n - n/2)/(\sqrt{n}/2). Use the code block below to observe that m_n(x) appears to converge to 0 for all x \in \mathbb{R}, and explain why this does not contradict the central limit theorem.

For simplicity, you may assume that n is even.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

def binom_stickplot(n):
    """
    Return a stick plot representing the pmf
    of a sum of n independent coin flips
    """
    ν = scipy.stats.binom(n, 0.5)
    # x contains the possible normalized RV values:
    x = (np.arange(n+1) - n/2)/(np.sqrt(n)/2)
    # y contains the probabilities:
    y = [ν.pmf(k) for k in range(n+1)]
    plt.ylim(0, 1)
    # draw a vertical stick from 0 up to each probability:
    return plt.vlines(x, 0, y)

binom_stickplot(10)
using Plots, Distributions
function binom_stickplot(n)
    ν = Binomial(n, 0.5)
    # normalized values (k - n/2)/(√n/2) for k in 0:n, assuming n is even:
    sticks((-n÷2 : n÷2)/(sqrt(n)/2), [pdf(ν, k) for k in 0:n],
           label = "Binomial($n,1/2)", ylims = (0, 1))
end
binom_stickplot(1000)

Solution. Executing the cells, we see that the height of the tallest stick indeed goes to zero as the argument to binom_stickplot is increased.

This finding does not contradict the central limit theorem, since convergence in distribution is not based on the amount of probability mass at individual points but rather on the amount of probability mass assigned to intervals. In any positive-width interval, the distribution of S_n^* has many points with nonzero probability mass. Since there are many of them, they can be small individually while nevertheless totaling up to a non-small mass.
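We can check this numerically: each individual stick shrinks, but the total mass that the distribution of S_n^* places in a fixed interval such as (-1,1) stabilizes near \Phi(1) - \Phi(-1) \approx 0.683. A sketch in Python (the helper name mass_in_interval is hypothetical):

import numpy as np
import scipy.stats

def mass_in_interval(n, a=-1, b=1):
    # total probability mass that S_n^* places in (a,b)
    ν = scipy.stats.binom(n, 0.5)
    k = np.arange(n+1)
    x = (k - n/2)/(np.sqrt(n)/2)   # normalized values
    return ν.pmf(k[(a < x) & (x < b)]).sum()

for n in [10, 100, 1000, 10000]:
    print(n, mass_in_interval(n))  # approaches about 0.683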

Exercise
Suppose that the percentage of residents in favor of a particular policy is 64%. We sample n individuals uniformly at random from the population.

  • In terms of n, find an interval I centered at 0.64 such that the proportion of residents polled who are in favor of the policy is in I with probability approximately 95%.

  • How many residents must be polled for the proportion of poll participants who are in favor of the policy to be between 62% and 66% with probability at least 95%?

Solution. Let X_i be the i-th sample from the population (1 if the resident is in favor, and 0 otherwise). Then the proportion of the residents in favor of the policy is \overline{X} = \frac{X_1+X_2 + \cdots +X_n}{n}. Each X_i is a Bernoulli(0.64) random variable with \mathbb{E}[X_i] = 0.64 and \sigma(X_i)= \sqrt{0.64(1 - 0.64)} = 0.48.

We need to find \epsilon > 0 such that \mathbb{P}\left(0.64 - \epsilon \leq \overline{X} \leq 0.64 + \epsilon\right) = 0.95. Equivalently, we need to find \epsilon > 0 such that \mathbb{P}\left(\frac{-n\epsilon}{\sigma(X_1)\sqrt{n}} \leq \frac{X_1+X_2+ \cdots + X_n - 0.64n}{\sigma(X_1)\sqrt{n}} \leq \frac{n\epsilon}{\sigma(X_1)\sqrt{n}}\right) = 0.95. Now by the Central Limit Theorem, \frac{X_1+X_2+ \cdots + X_n - 0.64n}{\sigma(X_1)\sqrt{n}} is approximately \mathcal{N}(0, 1) for large n. Since \mathbb{P}(-1.96 \leq Z \leq 1.96) \approx 0.95 for Z \sim \mathcal{N}(0, 1), we look to solve

\begin{align*}\frac{n\epsilon}{\sigma(X_1)\sqrt{n}} \approx 1.96.\end{align*}

Therefore,

\begin{align*}\epsilon = 1.96 \frac{\sigma(X_1)}{\sqrt{n}} = \frac{0.9408}{\sqrt{n}},\end{align*}

and with probability 95%, the proportion of polled residents in favor of the policy will be in I = [0.64 - \epsilon, 0.64 + \epsilon].

For the second part, we want to find n such that \mathbb{P}(0.64 - 0.02 \leq \overline{X} \leq 0.64 + 0.02) \geq 0.95. From above, we find that 0.02 \geq \frac{0.9408}{\sqrt{n}} and thus n \geq \left(\frac{0.9408}{0.02}\right)^2 \approx 2212.8. Therefore at least 2,213 residents must be polled, according to the normal approximation.
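Both parts can be verified with a short computation (a minimal check in Python):

import numpy as np
from scipy.stats import norm

sigma = np.sqrt(0.64 * 0.36)             # 0.48
z = norm.ppf(0.975)                      # about 1.96
print(z * sigma)                         # about 0.9408, the numerator of epsilon

n = int(np.ceil((z * sigma / 0.02)**2))  # least n with epsilon <= 0.02
print(n)                                 # 2213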

Exercise
Suppose that X_1, X_2, \ldots is a sequence of independent, identically distributed random variables with variance 2 and mean 7. Find the limit of each of the following probabilities as n\to\infty.

  • \mathbb{P}(X_1 + \cdots + X_{n} = 7n)
  • \mathbb{P}(6.9n < X_1 + \cdots + X_{n} < 7.1n)
  • \mathbb{P}(7n < X_1 + \cdots + X_{n} < 7n + 3\sqrt{n})
  • \mathbb{P}(6n < X_1 + \cdots + X_{n} < 7n + 3\sqrt{n})

Solution. Let Z \sim \mathcal{N}(0, 1).

For each positive integer n, we have \mathbb{P}(X_1+\cdots+X_n = 7n) = \mathbb{P}\left(\frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} = 0\right). By the Central Limit Theorem (CLT),

\begin{align*}\lim\limits_{n \to \infty}\mathbb{P}\left(\frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} = 0\right) = \mathbb{P}(Z= 0) = 0.\end{align*}

We have

\begin{align*}\lim\limits_{n \to \infty}&\mathbb{P}(6.9n< X_1+\cdots +X_n<7.1n) \\ &= \lim\limits_{n \to \infty}\mathbb{P}\left(\frac{-0.1n}{\sqrt{2n}} < \frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} < \frac{0.1n}{\sqrt{2n}} \right) \\ &= \lim\limits_{n \to \infty}\mathbb{P}\left(\frac{-0.1\sqrt{n}}{\sqrt{2}}< \frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} < \frac{0.1\sqrt{n}}{\sqrt{2}} \right) \\ &= \mathbb{P}(-\infty< Z<\infty) = 1\end{align*}

by the CLT. Since \mathbb{P}(7n< X_1+\cdots +X_n<7n+3\sqrt{n}) = \mathbb{P}\left(0 < \frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} < \frac{3}{\sqrt{2}} \right) for all n \geq 1, we find that

\begin{align*}\lim\limits_{n \to \infty}\mathbb{P}(7n< X_1+\cdots +X_n<7n+3\sqrt{n}) = \mathbb{P}\left(0 < Z < \frac{3}{\sqrt{2}}\right) \approx 0.483\end{align*}

by the CLT. We have

\begin{align*}\lim\limits_{n \to \infty}&\mathbb{P}(6n< X_1+\cdots +X_n<7n+3\sqrt{n}) \\ &= \lim\limits_{n \to \infty}\mathbb{P}\left(\frac{-n}{\sqrt{2n}} < \frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} < \frac{3\sqrt{n}}{\sqrt{2n}} \right) \\ &= \lim\limits_{n \to \infty}\mathbb{P}\left(\frac{-\sqrt{n}}{\sqrt{2}}< \frac{X_1+\cdots +X_n - 7n}{\sqrt{2n}} < \frac{3}{\sqrt{2}} \right) \\ &= \mathbb{P}\left(-\infty< Z<\frac{3}{\sqrt{2}}\right) \approx 0.983\end{align*}

by the CLT.
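For reference, the two numerical values above come from the standard normal CDF (a quick check in Python):

import numpy as np
from scipy.stats import norm

print(norm.cdf(3/np.sqrt(2)) - 0.5)  # P(0 < Z < 3/sqrt(2)), about 0.483
print(norm.cdf(3/np.sqrt(2)))        # P(Z < 3/sqrt(2)), about 0.983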
