Chapter 3: Central Limit Theorem (Part I)
Unlike the classical central limit theorem (which requires independent and identically distributed, i.i.d., variables), this chapter explores more general central limit theorems, specifically the case where random variables are independent but not identically distributed (i.n.i.d.).
1. Double Arrays & A Lemma on Complex Limits
When dealing with sums of random variables from different distributions, we often represent them in the form of a "double array" (or triangular array).
Definition 3.1: Double Array of Independent Random Vectors
For each \(n \ge 1\), let \(\{X_{n1}, X_{n2}, \dots, X_{nk_n}\}\) be a set of random vectors defined on a probability space \((\Omega_n, \mathcal{F}_n, P_n)\), such that for a given \(n\), \(X_{n1}, \dots, X_{nk_n}\) are mutually independent, and \(k_n \to \infty\) as \(n \to \infty\). Then \(\{X_{nj} : 1 \le j \le k_n\}_{n \ge 1}\) is called a double array of independent random vectors.
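For orientation, the classical i.i.d. setting fits this framework: given a single i.i.d. sequence \(X_1, X_2, \dots\) with mean \(\mu\) and variance \(\sigma^2\), take each row to be its first \(n\) terms,
\[
X_{nj} = X_j \ (1 \le j \le k_n = n), \qquad \alpha_n = n\mu, \qquad \sigma_n^2 = n\sigma^2,
\]
so that \((S_n - \alpha_n)/\sigma_n = (X_1 + \cdots + X_n - n\mu)/(\sigma\sqrt{n})\) is exactly the classical standardized sum.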
Common notations in this chapter:
- Expectation: \(\alpha_{nj} = E(X_{nj})\), row total expectation \(\alpha_n = \sum_{j=1}^{k_n} E(X_{nj}) = \sum_{j=1}^{k_n} \alpha_{nj}\)
- Partial sum: \(S_n = \sum_{j=1}^{k_n} X_{nj}\)
- Variance: \(\sigma_{nj}^2 = Var(X_{nj})\), row total variance \(\sigma_n^2 = \sum_{j=1}^{k_n} \sigma_{nj}^2\)
To handle products of characteristic functions, we need to introduce a lemma on the limit of products of complex sequences.
Lemma 3.2: Limit of Products of Complex Sequences
Let \(\{\theta_{nj} : 1 \le j \le k_n\}_{n \ge 1}\) be a double array of complex numbers satisfying, as \(n \to \infty\):
- (i) Uniform convergence to 0: \(\max_{1 \le j \le k_n} |\theta_{nj}| \to 0\);
- (ii) Uniformly bounded absolute sums: \(\sum_{j=1}^{k_n} |\theta_{nj}| \le M < \infty\), where \(M\) does not depend on \(n\);
- (iii) Convergence of the sums: \(\sum_{j=1}^{k_n} \theta_{nj} \to \theta\), where \(\theta\) is a finite complex number.
Then their product converges to the exponential function:
\[
\prod_{j=1}^{k_n} (1 + \theta_{nj}) \longrightarrow e^{\theta} \quad (n \to \infty).
\]
(Note: This generalizes the classical calculus result \(\lim_{n \to \infty} (1 + \theta/n)^n = e^\theta\), which corresponds to substituting \(\theta_{nj} \equiv \theta/n\) into this lemma.)
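As a quick numerical illustration, here is a minimal sketch (the row \(\theta_{nj} = \theta/n\) and the particular \(\theta\) below are illustrative choices, not from the text):

```python
import numpy as np

# Sanity check of Lemma 3.2 with theta_nj = theta / n (so k_n = n):
# conditions (i)-(iii) hold trivially, and the row products should approach e^theta.
theta = 0.3 + 1.2j  # an arbitrary illustrative complex number

for n in [10, 100, 1000, 10000]:
    row = np.full(n, theta / n)          # the n-th row of the double array
    print(n, np.prod(1 + row), np.exp(theta))
```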
Detailed Proof of Lemma 3.2 (Click to expand)
For a non-zero complex number \(z\), the principal value of the complex logarithm is defined as \(\operatorname{Log} z = \log|z| + i \operatorname{Arg} z\), where \(\operatorname{Arg} z \in (-\pi, \pi]\). When \(|z| < 1\), the principal logarithm has the Taylor series expansion:
\[
\operatorname{Log}(1+z) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} z^k = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots
\]
By condition (i), there exists \(n_0\) such that for all \(n > n_0\), \(\max_{1 \le j \le k_n} |\theta_{nj}| \le 1/2\). Then \(|\theta_{nj}| < 1\) and \(1+\theta_{nj} \neq 0\). Consider the truncation error between the logarithmic expansion and its linear term:
\[
\operatorname{Log}(1+\theta_{nj}) - \theta_{nj} = -\frac{\theta_{nj}^2}{2} + \frac{\theta_{nj}^3}{3} - \frac{\theta_{nj}^4}{4} + \cdots
\]
Extracting the quadratic factor and bounding the remainder by a geometric series (using \(|\theta_{nj}| \le 1/2\)):
\[
\left|\operatorname{Log}(1+\theta_{nj}) - \theta_{nj}\right| \le \frac{|\theta_{nj}|^2}{2}\left(1 + |\theta_{nj}| + |\theta_{nj}|^2 + \cdots\right) = \frac{|\theta_{nj}|^2}{2} \cdot \frac{1}{1 - |\theta_{nj}|} \le |\theta_{nj}|^2.
\]
Since the absolute error is bounded by \(|\theta_{nj}|^2\), we can write:
\[
\operatorname{Log}(1+\theta_{nj}) = \theta_{nj} + r_{nj}, \qquad |r_{nj}| \le |\theta_{nj}|^2.
\]
Summing over a row:
\[
\sum_{j=1}^{k_n} \operatorname{Log}(1+\theta_{nj}) = \sum_{j=1}^{k_n} \theta_{nj} + \sum_{j=1}^{k_n} r_{nj}.
\]
Using conditions (i) and (ii), estimate the total error term:
\[
\left|\sum_{j=1}^{k_n} r_{nj}\right| \le \sum_{j=1}^{k_n} |\theta_{nj}|^2 \le \left(\max_{1 \le j \le k_n} |\theta_{nj}|\right) \sum_{j=1}^{k_n} |\theta_{nj}| \le M \max_{1 \le j \le k_n} |\theta_{nj}| \to 0.
\]
Combined with condition (iii), \(\sum_{j=1}^{k_n} \theta_{nj} \to \theta\), we obtain:
\[
\sum_{j=1}^{k_n} \operatorname{Log}(1+\theta_{nj}) \to \theta.
\]
Since \(\prod_{j=1}^{k_n}(1+\theta_{nj}) = \exp\big(\sum_{j=1}^{k_n} \operatorname{Log}(1+\theta_{nj})\big)\), taking the complex exponential on both sides and using its continuity completes the proof. \(\square\)
2. Liapounov's Central Limit Theorem (Liapounov's CLT)
If the random variables possess moments of order higher than two, we can give an easily verifiable sufficient condition.
Theorem 3.3: Liapounov's CLT
For the double array \(\{X_{nj} : 1 \le j \le k_n\}_{n \ge 1}\), define the sum of third-order central absolute moments \(\Gamma_n = \sum_{j=1}^{k_n} E|X_{nj} - \alpha_{nj}|^3\), assumed finite for each \(n\). If Liapounov's condition is satisfied:
\[
\frac{\Gamma_n}{\sigma_n^3} \to 0 \quad (n \to \infty),
\]
then the standardized sum converges in distribution to the standard normal:
\[
\frac{S_n - \alpha_n}{\sigma_n} \xrightarrow{d} N(0, 1).
\]
(Note: The third-order moment can be relaxed to a moment of order \(2+\delta\) for any \(\delta > 0\); see Proposition 3.12.)
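To see the theorem in action, here is a small simulation sketch (the heterogeneous uniform variables are an illustrative assumption; for them the Liapounov ratio \(\Gamma_n/\sigma_n^3\) can be computed in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent but not identically distributed: X_j ~ Uniform(-sqrt(j), sqrt(j)).
# For Uniform(-c, c): mean 0, variance c^2/3, E|X|^3 = c^3/4.
n, reps = 2000, 2000
c = np.sqrt(np.arange(1, n + 1))
sigma_n = np.sqrt(np.sum(c**2 / 3))            # row standard deviation
Gamma_n = np.sum(c**3 / 4)                     # sum of third absolute moments

print("Liapounov ratio:", Gamma_n / sigma_n**3)   # should be small

X = rng.uniform(-c, c, size=(reps, n))         # each column has its own range
S = X.sum(axis=1) / sigma_n                    # standardized row sums
print("sample mean/var:", S.mean(), S.var())   # ~ 0 and ~ 1
print("P(S <= 1.96):", (S <= 1.96).mean())     # ~ 0.975 if approximately normal
```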
Rigorous Proof of Theorem 3.3 (click to expand)
Let \(\gamma_{nj} = E|X_{nj} - \alpha_{nj}|^3\). By Liapounov's moment inequality:
\[
\sigma_{nj}^2 = E|X_{nj} - \alpha_{nj}|^2 \le \left(E|X_{nj} - \alpha_{nj}|^3\right)^{2/3} = \gamma_{nj}^{2/3}.
\]
So \(\sigma_{nj}^3 \le \gamma_{nj}\). Consequently:
\[
\frac{\max_{1 \le j \le k_n} \sigma_{nj}^3}{\sigma_n^3} \le \frac{\max_{1 \le j \le k_n} \gamma_{nj}}{\sigma_n^3} \le \frac{\Gamma_n}{\sigma_n^3} \to 0.
\]
Let \(\phi_{nj}(t)\) be the characteristic function of the standardized variable \((X_{nj} - \alpha_{nj})/\sigma_n\). Since \(\gamma_{nj}\) is finite, the characteristic function admits a third-order Taylor expansion:
\[
\phi_{nj}(t) = 1 - \frac{\sigma_{nj}^2 t^2}{2\sigma_n^2} + R_{nj}(t), \qquad |R_{nj}(t)| \le \frac{|t|^3 \gamma_{nj}}{6\sigma_n^3}.
\]
To apply Lemma 3.2, we set \(\theta_{nj} = \phi_{nj}(t) - 1\) and verify its three conditions:
Verification of (i):
Since \(\sigma_{nj}^2 = (\sigma_{nj}^3)^{2/3} \le (\max_j \sigma_{nj}^3)^{2/3}\), we have:
\[
\max_{1 \le j \le k_n} |\theta_{nj}| \le \frac{t^2}{2} \cdot \frac{\max_j \sigma_{nj}^2}{\sigma_n^2} + \frac{|t|^3}{6} \cdot \frac{\max_j \gamma_{nj}}{\sigma_n^3} \le \frac{t^2}{2}\left(\frac{\Gamma_n}{\sigma_n^3}\right)^{2/3} + \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\sigma_n^3}.
\]
Also \(\max_j \gamma_{nj} / \sigma_n^3 \le \Gamma_n / \sigma_n^3 \rightarrow 0\). Therefore \(\max_j |\theta_{nj}| \to 0\), condition (i) holds.
Verification of (ii):
\[
\sum_{j=1}^{k_n} |\theta_{nj}| \le \frac{t^2}{2} \sum_{j=1}^{k_n} \frac{\sigma_{nj}^2}{\sigma_n^2} + \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\sigma_n^3} = \frac{t^2}{2} + \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\sigma_n^3}.
\]
For each fixed \(t\) this is bounded in \(n\), so condition (ii) holds.
Verification of (iii):
The sum of the error terms satisfies:
\[
\left|\sum_{j=1}^{k_n} R_{nj}(t)\right| \le \frac{|t|^3}{6} \cdot \frac{\Gamma_n}{\sigma_n^3} \to 0.
\]
Therefore the sum of the characteristic-function offsets converges:
\[
\sum_{j=1}^{k_n} \theta_{nj} = -\frac{t^2}{2} \sum_{j=1}^{k_n} \frac{\sigma_{nj}^2}{\sigma_n^2} + \sum_{j=1}^{k_n} R_{nj}(t) \to -\frac{t^2}{2}.
\]
Condition (iii) holds.
In summary, by Lemma 3.2, the characteristic function of the standardized sum satisfies, for every \(t\):
\[
E\, e^{it(S_n - \alpha_n)/\sigma_n} = \prod_{j=1}^{k_n} \phi_{nj}(t) = \prod_{j=1}^{k_n} (1 + \theta_{nj}) \to e^{-t^2/2}.
\]
By the Lévy-Cramér continuity theorem, the result is proven. \(\square\)
For an ordinary sequence indexed by a single subscript, the theorem specializes as follows:
Corollary 3.4 (Single Sequence)
Let \(\{X_n\}_{n \ge 1}\) be a sequence of independent random variables with \(\alpha_j = E(X_j)\), \(\sigma_j^2 = Var(X_j)\), and \(\gamma_j = E|X_j - \alpha_j|^3 < \infty\). Let \(P_n = \sum_{j=1}^n \gamma_j\) and \(\sigma_n^2 = \sum_{j=1}^n \sigma_j^2\). If \(P_n / \sigma_n^3 \rightarrow 0\), then:
\[
\frac{\sum_{j=1}^n (X_j - \alpha_j)}{\sigma_n} \xrightarrow{d} N(0, 1).
\]
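A standard worked instance (independent Bernoulli trials with varying success probabilities; the computation below is a routine check, not taken from the text): for \(X_j \sim \mathrm{Bernoulli}(p_j)\) with \(q_j = 1 - p_j\),
\[
\gamma_j = E|X_j - p_j|^3 = p_j q_j^3 + q_j p_j^3 = p_j q_j (p_j^2 + q_j^2) \le p_j q_j = \sigma_j^2,
\]
so \(P_n \le \sigma_n^2\) and \(P_n/\sigma_n^3 \le 1/\sigma_n \to 0\) whenever \(\sigma_n^2 = \sum_{j=1}^n p_j q_j \to \infty\); the standardized number of successes is then asymptotically normal.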
3. Lindeberg's Telescoping Method
The CLT can also be proved directly by analytic techniques, without characteristic functions; the argument below is a precursor of Stein's method.
Assume \(\alpha_j = 0\). Introduce auxiliary Gaussian random variables \(Y_1, \dots, Y_n\), independent of each other and of the \(X_j\), with \(Y_j \sim N(0, \sigma_j^2)\), so that \(Y_j\) matches the first two moments of \(X_j\). Let \(Y_0 = \sum_{i=1}^n Y_i / \sigma_n \sim N(0,1)\).
Our goal is to prove that for every test function \(f \in C_B^\infty\) (bounded, with bounded derivatives of all orders):
\[
E f\!\left(\frac{S_n}{\sigma_n}\right) - E f(Y_0) \longrightarrow 0.
\]
We rely on the following theorem:
Theorem 3.5 (Chung 6.1.6)
Let \(\{\mu_n\}\) be a sequence of probability measures. If for every \(f \in C_B^\infty\) (infinitely differentiable, bounded, with bounded derivatives of all orders) we have:
\[
\int f \, d\mu_n \longrightarrow \int f \, d\mu,
\]
where \(\mu\) is a probability measure, then \(\mu_n\) converges weakly to \(\mu\).
By Theorem 3.5, it therefore suffices to establish the convergence of the expectations above for every such test function.
Core of the Proof Based on Telescoping Expansion
Construct the mixed partial sum sequence \(Z_j\): \(Z_j = Y_1 + \dots + Y_{j-1} + X_{j+1} + \dots + X_n\) (for \(2 \le j \le n-1\)). The boundaries are \(Z_1 = X_2 + \dots + X_n\) and \(Z_n = Y_1 + \dots + Y_{n-1}\).
We write the total difference as a telescoping sum of term-by-term replacements:
\[
E f\!\left(\frac{S_n}{\sigma_n}\right) - E f(Y_0) = \sum_{j=1}^{n} \left[ E f\!\left(\frac{Z_j + X_j}{\sigma_n}\right) - E f\!\left(\frac{Z_j + Y_j}{\sigma_n}\right) \right],
\]
which telescopes because \(Z_j + Y_j = Z_{j+1} + X_{j+1}\), \(Z_1 + X_1 = S_n\), and \(Z_n + Y_n = \sigma_n Y_0\).
Perform a third-order Taylor expansion of \(f\) around \(Z_j/\sigma_n\):
\[
f\!\left(\frac{Z_j + X_j}{\sigma_n}\right) = f\!\left(\frac{Z_j}{\sigma_n}\right) + f'\!\left(\frac{Z_j}{\sigma_n}\right)\frac{X_j}{\sigma_n} + \frac{1}{2} f''\!\left(\frac{Z_j}{\sigma_n}\right)\frac{X_j^2}{\sigma_n^2} + \frac{1}{6} f'''(\xi_j)\frac{X_j^3}{\sigma_n^3},
\]
and analogously with \(X_j\) replaced by \(Y_j\).
Since \(X_j\) and \(Y_j\) have identical first two moments and are independent of \(Z_j\), the first- and second-order terms cancel exactly upon taking expectations. Only the third-order remainders survive, bounded via \(M = \sup_x |f'''(x)|\):
\[
\left| E f\!\left(\frac{S_n}{\sigma_n}\right) - E f(Y_0) \right| \le \frac{M}{6\sigma_n^3} \sum_{j=1}^{n} \left( E|X_j|^3 + E|Y_j|^3 \right),
\]
which tends to 0 under a Liapounov-type third-moment condition.
This method provides the foundation for the Stein method and Gaussian approximation for high-dimensional random vectors.
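The replacement argument can also be watched numerically. Below is a minimal Monte Carlo sketch (the test function \(f = \cos\) and the centered-exponential \(X_j\) are illustrative assumptions): replacing the summands one block at a time moves \(E f(S_n/\sigma_n)\) toward \(E f(Y_0)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Replace X_j by a Gaussian Y_j with the same variance, one block at a time,
# and watch E f(S_n / sigma_n) drift toward E f(Y_0) with Y_0 ~ N(0,1).
n, reps = 200, 10000
sigma_j = rng.uniform(0.5, 1.5, size=n)        # per-coordinate std devs
sigma_n = np.sqrt(np.sum(sigma_j**2))

f = np.cos                                     # a smooth bounded test function

# X_j: centered exponentials scaled to variance sigma_j^2 (mean 0, var sigma_j^2)
X = (rng.exponential(1.0, size=(reps, n)) - 1.0) * sigma_j
Y = rng.normal(0.0, sigma_j, size=(reps, n))   # matching Gaussians

for k in [0, n // 4, n // 2, n]:               # replace the first k coordinates
    Z = np.concatenate([Y[:, :k], X[:, k:]], axis=1)
    print(k, f(Z.sum(axis=1) / sigma_n).mean())
# the k = n value estimates E f(Y_0) = exp(-1/2) for f = cos
```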
Corollary 3.6: Preliminary Theorem for the Truncation Method
If there exist constants \(M_{nj}\) such that \(|X_{nj}|/\sigma_n \le M_{nj}\) a.s. and \(\lim_{n \to \infty} \max_j M_{nj} = 0\), then the standardized sum \((S_n - ES_n)/\sigma_n\) converges in distribution to \(N(0,1)\).
4. Null Arrays
To explore the sufficient and necessary conditions for the CLT to hold, we need to exclude pathological cases where a single variable dominates the overall variance.
Definition 3.7: Null Array
A double array is called a null array if for every \(\epsilon > 0\):
\[
\max_{1 \le j \le k_n} P\!\left(\frac{|X_{nj} - \alpha_{nj}|}{\sigma_n} > \epsilon\right) \to 0 \quad (n \to \infty).
\]
That is, the components \((X_{nj} - \alpha_{nj})/\sigma_n\) converge in probability to 0, uniformly in \(j\), as \(n \to \infty\).
Using characteristic functions, we can give a very convenient equivalent characterization:
Proposition 3.8: Equivalent Form via Characteristic Functions
A double array \(\{X_{nj}\}\) is a null array if and only if for all \(t \in \mathbb{R}\):
\[
\max_{1 \le j \le k_n} \left| \phi_{nj}(t) - 1 \right| \to 0 \quad (n \to \infty),
\]
where \(\phi_{nj}\) is the characteristic function of \((X_{nj} - \alpha_{nj})/\sigma_n\),
and this convergence is uniform on any finite interval \([-K, K]\).
Proof of Equivalence (click to expand)
(\(\Rightarrow\) direction): Without loss of generality, assume \(\alpha_{nj} = 0\). Decompose the expectation into the parts inside and outside the threshold \(\epsilon\sigma_n\):
\[
|\phi_{nj}(t) - 1| \le E\left| e^{itX_{nj}/\sigma_n} - 1 \right| = \int_{|x| \le \epsilon\sigma_n} \left| e^{itx/\sigma_n} - 1 \right| dF_{nj}(x) + \int_{|x| > \epsilon\sigma_n} \left| e^{itx/\sigma_n} - 1 \right| dF_{nj}(x),
\]
where \(F_{nj}\) is the distribution function of \(X_{nj}\).
Using the inequality \(|e^{iu} - 1| \le |u|\) on the first part and the bound \(|e^{iu} - 1| \le 2\) on the second:
\[
|\phi_{nj}(t) - 1| \le |t|\epsilon + 2 P(|X_{nj}| > \epsilon\sigma_n).
\]
Since we work on the bounded closed interval \([-K, K]\):
\[
\max_{1 \le j \le k_n} \sup_{|t| \le K} |\phi_{nj}(t) - 1| \le 2 \max_{1 \le j \le k_n} P(|X_{nj}| > \epsilon\sigma_n) + K\epsilon.
\]
As \(n \to \infty\), the first term tends to 0 by the definition of a null array, and the second term can be made arbitrarily small because \(\epsilon\) is arbitrary, thus proving uniform convergence.
(\(\Leftarrow\) direction): Using the classical truncation inequality for characteristic functions (valid for every \(u > 0\)):
\[
P\!\left(\frac{|X_{nj}|}{\sigma_n} \ge \frac{2}{u}\right) \le \frac{1}{u} \int_{-u}^{u} \big(1 - \operatorname{Re}\,\phi_{nj}(t)\big)\, dt.
\]
Taking the maximum over \(j\) on both sides:
\[
\max_{1 \le j \le k_n} P\!\left(\frac{|X_{nj}|}{\sigma_n} \ge \frac{2}{u}\right) \le \frac{1}{u} \int_{-u}^{u} \max_{1 \le j \le k_n} \left|\phi_{nj}(t) - 1\right| dt.
\]
By the Bounded Convergence Theorem (the integrand is bounded by 2 and tends to 0 uniformly on \([-u, u]\) by hypothesis), the right-hand side tends to 0; since \(u > 0\) is arbitrary, the array is null. The proposition is proved. \(\square\)
5. Lindeberg-Feller Central Limit Theorem (Lindeberg-Feller CLT)
When we only know that the second moments exist (the third moments may not), Liapounov's condition is unavailable. In this case the Lindeberg Condition (LC) becomes the most precise sufficient condition for the CLT for independent variables.
Definition 3.9: Lindeberg Condition (LC)
For a double array \(\{X_{nj}\}\), if for every \(\epsilon > 0\) the truncated variance ratio satisfies:
\[
\frac{1}{\sigma_n^2} \sum_{j=1}^{k_n} E\left[ (X_{nj} - \alpha_{nj})^2 \, \mathbb{I}\big( |X_{nj} - \alpha_{nj}| > \epsilon\sigma_n \big) \right] \to 0 \quad (n \to \infty),
\]
then the array is said to satisfy the Lindeberg condition.
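For intuition, here is a minimal Monte Carlo sketch of the LC ratio (`lindeberg_ratio` is a hypothetical helper; the two rows below are illustrative: an i.i.d. Gaussian row where the ratio is negligible, and a row whose first variable carries a fixed fraction of the total variance, so the ratio stays bounded away from 0):

```python
import numpy as np

def lindeberg_ratio(samples, eps):
    """Monte Carlo estimate of the Lindeberg ratio for one row of an array.

    samples: array of shape (reps, k_n); column j holds draws of X_nj
             (assumed already centered, i.e. E X_nj = 0).
    """
    var_j = samples.var(axis=0)
    sigma_n = np.sqrt(var_j.sum())
    tail = samples**2 * (np.abs(samples) > eps * sigma_n)
    return tail.mean(axis=0).sum() / sigma_n**2

rng = np.random.default_rng(2)
reps, n = 50000, 50
# Well-behaved row: i.i.d. standard normals -> ratio is tiny.
print(lindeberg_ratio(rng.normal(size=(reps, n)), eps=0.5))
# Degenerate row: one variable carries about half the total variance -> ratio stays large.
bad = rng.normal(size=(reps, n))
bad[:, 0] *= np.sqrt(n)
print(lindeberg_ratio(bad, eps=0.5))
```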
The Lindeberg condition, combined with the following lemma, leads to the famous theorem of the asymptotic distribution theory for independent variables:
Lemma 3.10 (Diagonal Construction Method)
Let \(u(m, n)\) be a function of positive integers \(m\) and \(n\) such that for each fixed \(m\):
\[
\lim_{n \to \infty} u(m, n) = 0.
\]
Then there exists a monotonically increasing sequence \(m_n \to \infty\) such that:
\[
\lim_{n \to \infty} u(m_n, n) = 0.
\]
Proof of Lemma 3.10 (click to expand)
Since for each fixed \(m\), \(\lim_{n \to \infty} u(m, n) = 0\), by the definition of the limit there exists an index \(n_m\) (chosen so that \(n_m > n_{m-1}\)) such that for all \(n \ge n_m\):
\[
|u(m, n)| \le \frac{1}{m}.
\]
In this way we obtain a strictly increasing sequence \(\{n_m\}_{m \ge 1}\) tending to infinity.
For any \(n\) satisfying \(n_m \le n < n_{m+1}\), we set \(m_n \equiv m\).
Thus, when \(n_m \le n < n_{m+1}\) (so in particular \(n \ge n_m\)), by construction we have:
\[
|u(m_n, n)| = |u(m, n)| \le \frac{1}{m} = \frac{1}{m_n}.
\]
Since \(n_m \to \infty\) as \(m \to \infty\), the index \(m_n\) increases monotonically to infinity with \(n\). By the squeeze theorem applied to the inequality above:
\[
\lim_{n \to \infty} u(m_n, n) = 0.
\]
Proof complete. \(\square\)
Theorem 3.11: Lindeberg-Feller CLT
Assume \(Var(X_{nj}) = \sigma_{nj}^2 < \infty\), \(S_n = \sum_{j=1}^{k_n} X_{nj}\). Then the following two sets of propositions are equivalent:
- (i) \(\frac{S_n - E S_n}{\sigma_n} \xrightarrow{d} N(0,1)\), and (ii) the double array is a null array;
- \(\Longleftrightarrow\) the double array satisfies the Lindeberg Condition (LC).
Core Proof Derivation of the Theorem (click to expand)
(1) Sufficiency Proof: LC \(\Rightarrow\) CLT and Null Array
Assume without loss of generality that \(E(X_{nj}) = 0\) and \(\sigma_n^2 = 1\). For a truncation point \(\eta \in (0, 1)\), define the truncated variables \(X_{nj}' = X_{nj}\, \mathbb{I}(|X_{nj}| < \eta)\) and \(S_n' = \sum_{j=1}^{k_n} X_{nj}'\).
Compute the expectation and variance after truncation. Since \(E X_{nj} = 0\):
\[
|E X_{nj}'| = \left| E\big[X_{nj} \mathbb{I}(|X_{nj}| \ge \eta)\big] \right| \le \frac{1}{\eta} E\big[X_{nj}^2 \mathbb{I}(|X_{nj}| \ge \eta)\big], \qquad \sigma_n'^2 = \sum_{j=1}^{k_n} Var(X_{nj}').
\]
Summing over \(j\), the LC condition forces the total expectation \(\sum_j E X_{nj}'\) to 0, while the truncated variance satisfies \(\sigma_n'^2 \to 1 = \sigma_n^2\). By Lemma 3.10 (the diagonal construction), we may select a monotonically increasing sequence \(m_n \to \infty\) and set \(\eta_n = m_n^{-1} \to 0\). Using \(\eta_n\) as the truncation threshold ensures \(|X_{nj}'| \le \eta_n =: M_{nj}\). Since \(\max_j M_{nj} = \eta_n \to 0\), Corollary 3.6 (the CLT for bounded variables) gives \((S_n' - ES_n')/\sigma_n' \xrightarrow{d} N(0,1)\); as \(ES_n' \to 0\) and \(\sigma_n' \to 1\), also \(S_n' \xrightarrow{d} N(0,1)\).
Finally, compare the untruncated and truncated sums:
\[
P(S_n \neq S_n') \le \sum_{j=1}^{k_n} P(|X_{nj}| \ge \eta_n) \le \frac{1}{\eta_n^2} \sum_{j=1}^{k_n} E\big[X_{nj}^2 \mathbb{I}(|X_{nj}| \ge \eta_n)\big] \to 0,
\]
where the convergence again uses the diagonal choice of \(\eta_n\) from Lemma 3.10.
Since \(P(S_n \neq S_n') \to 0\), the difference \(S_n - S_n'\) is \(o_p(1)\), and Slutsky's theorem yields \(S_n \xrightarrow{d} N(0,1)\). The null-array property also follows from LC: for every \(\epsilon > 0\), \(\max_j P(|X_{nj}| > \epsilon) \le \epsilon^{-2} \sum_j E[X_{nj}^2 \mathbb{I}(|X_{nj}| > \epsilon)] \to 0\).
(2) Necessity Proof: CLT + Null Array \(\Rightarrow\) LC
Again assume \(E(X_{nj}) = 0\) and \(\sigma_n^2 = 1\). From \(S_n \xrightarrow{d} N(0,1)\), the logarithm of the product of characteristic functions satisfies:
\[
\sum_{j=1}^{k_n} \log \phi_{nj}(t) \to -\frac{t^2}{2}.
\]
The null array guarantees \(\max_j |\phi_{nj}(t) - 1| \to 0\). Using the equivalence between \(\log \phi_{nj}(t)\) and \(\phi_{nj}(t) - 1\) (as in Lemma 3.2):
\[
\sum_{j=1}^{k_n} \big(\phi_{nj}(t) - 1\big) \to -\frac{t^2}{2}.
\]
Extracting the real part:
\[
\sum_{j=1}^{k_n} E\big[1 - \cos(t X_{nj})\big] \to \frac{t^2}{2}.
\]
Split each expectation into the parts \(|X_{nj}| \le \eta\) and \(|X_{nj}| > \eta\), using \(0 \le 1 - \cos tx \le (tx)^2/2\) on the first part and the bound 2 on the second:
\[
\sum_{j=1}^{k_n} E\big[1 - \cos(t X_{nj})\big] \le \frac{t^2}{2} \sum_{j=1}^{k_n} E\big[X_{nj}^2 \mathbb{I}(|X_{nj}| \le \eta)\big] + 2 \sum_{j=1}^{k_n} P(|X_{nj}| > \eta).
\]
By Chebyshev's inequality, \(\sum_j P(|X_{nj}| > \eta) \le 1/\eta^2\), and \(\sum_j E[X_{nj}^2 \mathbb{I}(|X_{nj}| \le \eta)] = 1 - \sum_j E[X_{nj}^2 \mathbb{I}(|X_{nj}| > \eta)]\). Combining with the limit above, for every \(\epsilon > 0\) and all large \(n\):
\[
\frac{t^2}{2} \sum_{j=1}^{k_n} E\big[X_{nj}^2 \mathbb{I}(|X_{nj}| > \eta)\big] \le \frac{2}{\eta^2} + \epsilon.
\]
Dividing by \(t^2/2\) and letting \(t \to \infty\) forces \(\sum_{j=1}^{k_n} E\big[X_{nj}^2 \mathbb{I}(|X_{nj}| > \eta)\big] \to 0\), which is precisely the Lindeberg condition LC. \(\square\)
6. Applications & Further Conditions
Application Example: Ordinary Least Squares (OLS) Regression
Consider the classical linear regression model \(y_j = x_j \beta + \epsilon_j\), where the errors \(\epsilon_j\) are i.i.d. with mean 0 and variance \(\sigma_\epsilon^2\). The design satisfies \(\max_{1 \le j \le n} \frac{|x_j|}{a_n} \to 0\), where \(a_n^2 = \sum_{j=1}^n x_j^2\). The OLS estimator is \(\hat{\beta}_{LS} = \sum_j x_j y_j / a_n^2\).
Construct the standardized double array \(X_{nj} = \frac{x_j \epsilon_j}{a_n}\) (note \(a_n = \sqrt{\sum_j x_j^2}\)), so that \(\sigma_n^2 = Var\big(\sum_j X_{nj}\big) = \sigma_\epsilon^2\). We bound the truncated integral in the Lindeberg condition: letting \(m_n = \max_j |x_j/a_n|\),
\[
\frac{1}{\sigma_\epsilon^2} \sum_{j=1}^{n} \frac{x_j^2}{a_n^2} E\Big[\epsilon_j^2 \, \mathbb{I}\Big(\frac{|x_j|}{a_n} |\epsilon_j| > \epsilon \sigma_\epsilon\Big)\Big] \le \frac{1}{\sigma_\epsilon^2} E\Big[\epsilon_1^2 \, \mathbb{I}\Big(|\epsilon_1| > \frac{\epsilon \sigma_\epsilon}{m_n}\Big)\Big],
\]
since the \(\epsilon_j\) are identically distributed and \(\sum_j x_j^2 / a_n^2 = 1\).
Since \(\epsilon_1\) has a finite second moment, this expectation tends to 0 as \(m_n \to 0\) by dominated convergence. By Theorem 3.11, we immediately obtain:
\[
\frac{a_n (\hat{\beta}_{LS} - \beta)}{\sigma_\epsilon} = \frac{1}{\sigma_\epsilon a_n} \sum_{j=1}^{n} x_j \epsilon_j \xrightarrow{d} N(0, 1).
\]
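A simulation sketch of this application (the logarithmic design and the centered-exponential errors are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# y_j = x_j * beta + eps_j with i.i.d. centered errors (here: shifted exponentials).
n, reps, beta, sigma_eps = 1000, 2000, 2.0, 1.0
x = np.log(np.arange(1, n + 1) + 1.0)          # slowly growing design
a_n = np.sqrt(np.sum(x**2))
print("max |x_j| / a_n:", np.abs(x).max() / a_n)   # small, as required

eps = rng.exponential(sigma_eps, size=(reps, n)) - sigma_eps  # mean 0, var sigma_eps^2
y = x * beta + eps
beta_hat = (y @ x) / a_n**2                    # OLS estimator, one per replication

T = a_n * (beta_hat - beta) / sigma_eps        # standardized estimator
print("mean/var:", T.mean(), T.var())          # ~ 0 and ~ 1
print("P(T <= 1.645):", (T <= 1.645).mean())   # ~ 0.95 under the normal approximation
```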
When verifying the LC condition, besides bounded truncation we can also exploit the existence of higher-order moments; this extends the Liapounov-type condition.
Proposition 3.12: Sufficient Criterion for the Lindeberg Condition
For a double array \(\{X_{nj}\}\), if there exists some real number \(\nu > 2\) such that:
\[
\frac{1}{\sigma_n^\nu} \sum_{j=1}^{k_n} E|X_{nj} - \alpha_{nj}|^\nu \to 0 \quad (n \to \infty),
\]
then the array necessarily satisfies the Lindeberg condition.
Proof Derivation (click to expand)
On the truncation region \(|x - \alpha_{nj}| > \epsilon\sigma_n\), we bound the second-moment integral:
\[
E\big[(X_{nj} - \alpha_{nj})^2 \, \mathbb{I}(|X_{nj} - \alpha_{nj}| > \epsilon\sigma_n)\big] = \int_{|x - \alpha_{nj}| > \epsilon\sigma_n} (x - \alpha_{nj})^2 \, dF_{nj}(x).
\]
By forcibly introducing the \(\nu\)-th power and extracting a constant factor:
\[
\int_{|x - \alpha_{nj}| > \epsilon\sigma_n} (x - \alpha_{nj})^2 \, dF_{nj}(x) \le \int_{|x - \alpha_{nj}| > \epsilon\sigma_n} \frac{|x - \alpha_{nj}|^\nu}{(\epsilon\sigma_n)^{\nu - 2}} \, dF_{nj}(x) \le \frac{E|X_{nj} - \alpha_{nj}|^\nu}{(\epsilon\sigma_n)^{\nu - 2}}.
\]
Summing over \(j\) and dividing by \(\sigma_n^2\):
\[
\frac{1}{\sigma_n^2} \sum_{j=1}^{k_n} E\big[(X_{nj} - \alpha_{nj})^2 \, \mathbb{I}(|X_{nj} - \alpha_{nj}| > \epsilon\sigma_n)\big] \le \frac{1}{\epsilon^{\nu - 2}} \cdot \frac{1}{\sigma_n^\nu} \sum_{j=1}^{k_n} E|X_{nj} - \alpha_{nj}|^\nu \to 0.
\]
The Lindeberg condition is thus proved. \(\square\)
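In particular, taking \(\nu = 3\) recovers Liapounov's condition from Theorem 3.3,
\[
\frac{1}{\sigma_n^3} \sum_{j=1}^{k_n} E|X_{nj} - \alpha_{nj}|^3 = \frac{\Gamma_n}{\sigma_n^3} \to 0,
\]
so Liapounov's CLT is a special case of the Lindeberg-Feller theorem.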