
Central Limit Theorem (Part II): \(m\)-Dependent Sequences and Stable Distributions

This section will break through the limitation of "independent and identically distributed (i.i.d.)" in the traditional Central Limit Theorem. First, we explore the Central Limit Theorem for \(m\)-dependent sequences with local dependence; then, we extend the one-dimensional results to multiple dimensions via the Cramér-Wold theorem; finally, we delve into the generalized form of limit distributions—Stable Distributions—and their Domain of Attraction (DA).


1. Central Limit Theorem for \(m\)-Dependent Random Variable Sequences

In practical applications (such as time series analysis), data often exhibit serial correlation. We first consider the simplest form of dependence structure: local dependence.

Definition 3.13: \(m\)-dependent sequence

A sequence of random variables \(\{X_n\}_{n \ge 1}\) is called \(m\)-dependent if there exists a positive integer \(m\) such that for any \(n \ge 1\) and \(j \ge m\), the random variable \(X_{n+j}\) is independent of the \(\sigma\)-algebra \(\mathcal{F}_n = \sigma\{X_i, 1 \le i \le n\}\) generated by the first \(n\) variables.

Example: The moving average model of order \(q\), MA(\(q\)), is a \((q+1)\)-dependent sequence under the indexing convention of Definition 3.13.
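A quick numerical check of this example (a sketch with illustrative MA(2) coefficients, not taken from the text): the sample autocorrelation of an MA(\(q\)) process is nonzero up to lag \(q\) and vanishes beyond it, which is exactly the finite dependence range.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 2
theta = np.array([1.0, 0.6, 0.3])     # illustrative MA(2) coefficients
N = 200_000
eps = rng.standard_normal(N + q)
# X_t = theta_0 eps_t + theta_1 eps_{t-1} + theta_2 eps_{t-2}
X = sum(theta[i] * eps[q - i : q - i + N] for i in range(q + 1))

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# nonzero up to lag q = 2, (numerically) zero afterwards
print([round(acf(X, h), 2) for h in (1, 2, 3, 4)])
```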

Theorem 3.14 (CLT for \(m\)-dependent sequences)

Let \(\{X_n\}_{n \ge 1}\) be an \(m\)-dependent sequence. Assume the random variables are uniformly bounded (i.e., there exists a constant \(M\) such that \(\sup_n |X_n| \le M\)).
Denote \(S_n = \sum_{i=1}^n X_i\), \(\sigma_n^2 = \operatorname{Var}(S_n)\). If the variance growth condition

\[ \frac{\sigma_n}{m n^{1/3}} \rightarrow \infty \quad \text{as } n \rightarrow \infty \]

holds, and \(m = o(n^{1/3})\), then

\[ \frac{S_n - E(S_n)}{\sigma_n} \xrightarrow{d} N(0,1) \]

(Note: By introducing the Lindeberg condition, the assumption of "uniform boundedness" can be removed. For details, see S. Janson (2021)).
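The theorem can be illustrated by simulation. The sketch below (all choices illustrative) uses the bounded, locally dependent sequence \(X_t = U_t + U_{t+1}\) with \(U_t\) i.i.d. uniform on \([-1,1]\); the standardized sums should match standard normal quantiles.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 10_000
U = rng.uniform(-1.0, 1.0, size=(reps, n + 1))
X = U[:, :-1] + U[:, 1:]          # locally dependent, |X_t| <= 2, E X_t = 0
S = X.sum(axis=1)
Z = S / S.std()                   # standardize (E S_n = 0 by symmetry)
# empirical quantiles vs. N(0,1) quantiles
for p, zp in [(0.025, -1.96), (0.5, 0.0), (0.975, 1.96)]:
    print(p, round(float(np.quantile(Z, p)), 2), zp)
```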

Proof of Theorem 3.14: Blocking Technique

Core Idea: Partition the entire sequence into alternating "large blocks" and "small blocks". Due to \(m\)-dependence, as long as the gap between large blocks (the length of the small blocks) is at least \(m\), the large blocks are mutually independent.

Without loss of generality, assume \(E(X_j) = 0\). Since the sequence is uniformly bounded, there exists \(M\) such that \(\sup_n |X_n| \le M\).

Step 1: Construct Large and Small Blocks

Let the large block length be \(k = [n^{1/3}]\) and the small block length be \(m\). Then the number of (large block, small block) pairs is \(p_n = \left[\frac{n}{k+m}\right] = O(n^{2/3})\). Denote \(B_j = j(k+m)\).

We construct:

  • Large blocks: \(Y_j = X_{B_{j-1}+1} + \cdots + X_{B_{j-1}+k}\) (total \(p_n\) blocks)

  • Small blocks: \(Z_j = X_{B_{j-1}+k+1} + \cdots + X_{B_j}\) (total \(p_n\) blocks)

  • Residual block: \(R_p = X_{B_{p_n}+1} + \cdots + X_n\)

For sufficiently large \(n\) we have \(k \gg m\). Consecutive large blocks \(Y_j\) are separated by a small block of length \(m\), so by \(m\)-dependence the sequence \(\{Y_j\}_{j=1}^{p_n}\) is mutually independent; similarly, consecutive small blocks are separated by a large block of length \(k \ge m\), so \(\{Z_j\}_{j=1}^{p_n}\) is also mutually independent.

We decompose the total sum into three parts:

\[ S_n = \sum_{j=1}^{p_n} Y_j + \sum_{j=1}^{p_n} Z_j + R_p := S_n' + S_n'' + S_n''' \]
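The construction of Step 1 translates directly into code (a sketch with an arbitrary bounded sequence; indices are 0-based):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10_000, 3
X = rng.uniform(-1.0, 1.0, n)       # any bounded sequence will do here

k = int(n ** (1 / 3))               # large-block length k = [n^{1/3}]
p = n // (k + m)                    # number of (large, small) block pairs
Y = [X[j*(k+m) : j*(k+m) + k].sum() for j in range(p)]       # large blocks
Z = [X[j*(k+m) + k : (j+1)*(k+m)].sum() for j in range(p)]   # small blocks
R = X[p*(k+m):].sum()               # residual block, fewer than k+m terms
# S_n = S_n' + S_n'' + S_n'''
assert np.isclose(sum(Y) + sum(Z) + R, X.sum())
print(k, p, n - p * (k + m))
```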

Step 2: Control the Variance of Small and Residual Blocks

Since \(\sup_j |X_j| \le M\), every covariance satisfies \(|E(X_j X_l)| \le M^2\). For the residual block \(S_n'''\), which contains \(n - B_{p_n} \le k+m\) terms:

\[ \operatorname{Var}(S_n''') = E[(S_n''')^2] = \sum_{j,l=B_{p_n}+1}^{n} E(X_j X_l) \le (n - B_{p_n})^2 M^2 \le (k+m)^2 M^2 \]

Therefore, it is bounded in probability:

\[ S_n''' = O_p(\sqrt{Var(S_n''')}) = O_p(k+m) = O_p(n^{1/3}) \]

Similarly, for the sum of small blocks \(S_n''\):

\[ E[Z_j^2] = E\left[ \left( \sum_{i=B_{j-1}+k+1}^{B_j} X_i \right)^2 \right] \le m^2 M^2 \]

Since the \(Z_j\) are independent, \(Var(S_n'') = \sum Var(Z_j) \le p_n m^2 M^2\). Thus:

\[ S_n'' = O_p(p_n^{1/2} m) = O_p(n^{1/3} m) \]

Step 3: Show Small and Residual Blocks are Negligible

Using the given condition \(\sigma_n / (m n^{1/3}) \rightarrow \infty\), we have:

\[ \frac{S_n''}{\sigma_n} = \frac{S_n''}{m n^{1/3}} \times \frac{m n^{1/3}}{\sigma_n} = O_p(1) \cdot o(1) = o_p(1) \]

Similarly, since \(k = O(n^{1/3})\), we have \(S_n''' / \sigma_n = o_p(1)\).

Therefore, the standardized total sum can be written as:

\[ \frac{S_n}{\sigma_n} = \frac{S_n'}{\sigma_n} + o_p(1) = \frac{\sigma_n'}{\sigma_n} \frac{S_n'}{\sigma_n'} + o_p(1) \]

where \(\sigma_n'^2 = Var(S_n')\). It now suffices to prove \(\sigma_n'^2 / \sigma_n^2 \rightarrow 1\) and \(S_n' / \sigma_n' \xrightarrow{d} N(0,1)\).

Step 4: Asymptotic Equivalence of Variances

Expanding the variance of \(S_n\):

\[ E(S_n^2) = E(S_n'^2) + E(S_n''^2) + E(S_n'''^2) + 2E(S_n' S_n'') + 2E(S_n' S_n''') + 2E(S_n'' S_n''') \]

Due to \(m\)-dependence, most cross-covariances are 0:

\[ |E(S_n' S_n'')| = \left| \sum_{j,l=1}^{p_n} \operatorname{Cov}(Y_j, Z_l) \right| = \left| \sum_{j=1}^{p_n} [\operatorname{Cov}(Y_j, Z_j) + \operatorname{Cov}(Y_j, Z_{j-1})] \right| \le 2 p_n (mM)^2 \]

Combining the orders of all error terms, we obtain:

\[ \left| 1 - \frac{\sigma_n'^2}{\sigma_n^2} \right| = O\left( \frac{m^2 n^{2/3}}{\sigma_n^2} \right) \rightarrow 0 \]

Hence, \(\sigma_n'^2 / \sigma_n^2 \rightarrow 1\).

Step 5: Central Limit Theorem for Large Blocks

Since the large blocks \(Y_j\) are mutually independent, we can verify the Lindeberg condition for \(\{Y_j\}\). Because \(|Y_j| \le kM = O(n^{1/3}) = o(\sigma_n')\), for any \(\eta > 0\), when \(n\) is sufficiently large, the indicator function \(\mathbb{I}(|Y_j| \ge \eta \sigma_n')\) is identically 0:

\[ \frac{1}{\sigma_n'^2} \sum_{j=1}^{p_n} E\left[ Y_j^2 \mathbb{I}(|Y_j| \ge \eta \sigma_n') \right] \rightarrow 0 \]

Thus, the Lindeberg condition holds. By the Lindeberg-Feller CLT, we have \(S_n' / \sigma_n' \xrightarrow{d} N(0,1)\). Combining with Slutsky's theorem, the final conclusion is proven. \(\square\)


2. Multidimensional Central Limit Theorem and Cramér-Wold Theorem

To extend the one-dimensional central limit theorem to multidimensional random vectors, we rely on the Cramér-Wold theorem. Its core idea is: The weak convergence of a multidimensional random vector is equivalent to the weak convergence of its projection onto any one-dimensional direction.

Theorem 3.15: Cramér-Wold Theorem

Let \(X_n\) be a sequence of random vectors in \(\mathbb{R}^d\), and let \(X\) be a random vector in \(\mathbb{R}^d\). Then:

\[ X_n \xrightarrow{d} X \iff a^T X_n \xrightarrow{d} a^T X, \quad \forall a \in \mathbb{R}^d \]
Proof of the Cramér-Wold Theorem

"\(\implies\)": This follows directly from the Continuous Mapping Theorem, since the inner product function \(g(x) = a^T x\) is continuous.

"\(\impliedby\)": Use the characteristic function. Let \(X_n = (X_{n1}, \dots, X_{nd})^T\). Take any \(c = (c_1, \dots, c_d)^T \in \mathbb{R}^d\). The given condition implies:

\[ c^T X_n = c_1 X_{n1} + \dots + c_d X_{nd} \xrightarrow{d} c_1 X_1 + \dots + c_d X_d = c^T X \]

Convergence in distribution of a one-dimensional random variable implies pointwise convergence of its characteristic function, since \(x \mapsto e^{itx}\) is bounded and continuous. For \(c^T X_n\), its characteristic function at parameter \(t\) is:

\[ \phi_{c^T X_n}(t) = E\left[ e^{it(c_1 X_{n1} + \dots + c_d X_{nd})} \right] \]

In particular, setting \(t=1\), we have:

\[ \lim_{n \rightarrow \infty} E\left[ e^{i(c_1 X_{n1} + \dots + c_d X_{nd})} \right] = E\left[ e^{i(c_1 X_1 + \dots + c_d X_d)} \right] \]

This is precisely the joint characteristic function \(\phi_n(c_1, \dots, c_d)\) of the multidimensional random vector \(X_n\) at \(c\). That is:

\[ \lim_{n \rightarrow \infty} \phi_n(c) = \phi_X(c), \quad \forall c \in \mathbb{R}^d \]

Since the joint characteristic function converges everywhere, by the multidimensional Lévy continuity theorem, we conclude \(X_n \xrightarrow{d} X\). \(\square\)

Using the Cramér-Wold theorem, the CLT for a multidimensional i.i.d. sequence becomes very straightforward:

Theorem 3.16 (Multivariate Central Limit Theorem)

Let \(X_1, X_2, \dots\) be i.i.d. \(d\)-dimensional random vectors with mean vector \(\mu\) and finite covariance matrix \(\Sigma\). Denote \(\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\). Then:

\[ \sqrt{n}(\overline{X}_n - \mu) \xrightarrow{d} N_d(0, \Sigma) \]
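A simulation sketch of Theorem 3.16 with deliberately non-Gaussian (uniform) increments; the choices of \(\mu\), \(\Sigma\), and sample sizes are illustrative. The empirical covariance of \(\sqrt{n}(\overline{X}_n - \mu)\) should be close to \(\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, reps = 2, 1_000, 2_000
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)
# non-Gaussian increments: uniform with unit variance, colored by L
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n, d))
X = mu + U @ L.T                    # mean mu, covariance Sigma
T = np.sqrt(n) * (X.mean(axis=1) - mu)
print(np.round(np.cov(T.T), 2))     # should be close to Sigma
```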

3. Stable Distributions

For independent and identically distributed sequences with finite variance, their standardized sums converge to the normal distribution. A natural question is: If we relax the condition of finite variance, to which non-degenerate distributions can the standardized sums still converge? (Think of the example of the Cauchy distribution.)

This leads to the concept of stable distributions.

Definition 3.17: Stable Distribution

A distribution \(F\) is called stable if for independent random variables \(X_1, X_2\) following \(F\) and any non-negative constants \(c_1, c_2\), there exist constants \(a(c_1, c_2)\) and \(b(c_1, c_2) > 0\) such that:

\[ c_1 X_1 + c_2 X_2 \stackrel{d}{=} b(c_1, c_2) X + a(c_1, c_2) \]

where \(X \sim F\) and is independent of \(X_1, X_2\).
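The definition can be verified numerically for the standard Cauchy distribution, for which \(b(c_1,c_2) = c_1 + c_2\) and \(a(c_1,c_2) = 0\): rescaling the weighted sum by \(c_1 + c_2\) returns a standard Cauchy variable, checked here against the Cauchy quantile function \(\tan(\pi(p - \tfrac{1}{2}))\).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400_000
X1 = rng.standard_cauchy(N)
X2 = rng.standard_cauchy(N)
c1, c2 = 2.0, 3.0
# stability: c1 X1 + c2 X2 has the law of (c1 + c2) X for Cauchy
W = (c1 * X1 + c2 * X2) / (c1 + c2)
for p in (0.25, 0.5, 0.75):
    print(p, round(float(np.quantile(W, p)), 3),
          round(np.tan(np.pi * (p - 0.5)), 3))   # Cauchy quantile
```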

Theorem 3.18 (Limit Property of Stable Distributions): The family of non-degenerate stable distributions is exactly equivalent to the family of all possible non-degenerate limit distributions of sums of i.i.d. random variables after appropriate centering and scaling.

3.1 Spectral Representation of Stable Distributions

Since, except for a few special cases (normal distribution, Cauchy distribution, Lévy distribution), stable distributions do not have closed-form analytic probability density functions, we typically characterize them through their characteristic functions.

Theorem 3.19 (Characteristic Function of a Stable Distribution)

The characteristic function of a stable distribution has the following form:

\[ \phi_X(t) = E(e^{itX}) = \exp\left\{ i\gamma t - c|t|^\alpha (1 - i\beta \text{sgn}(t) z(t, \alpha)) \right\} \]

where the parameter ranges are: location parameter \(\gamma \in \mathbb{R}\), scale parameter \(c > 0\), characteristic exponent \(\alpha \in (0, 2]\), and skewness parameter \(\beta \in [-1, 1]\). Additionally:

\[ z(t, \alpha) = \begin{cases} \tan\left(\frac{\pi \alpha}{2}\right), & \text{if } \alpha \ne 1 \\ -\frac{2}{\pi} \ln|t|, & \text{if } \alpha = 1 \end{cases} \]
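The formula can be transcribed numerically (a sketch; the \(t = 0\) singularity of the \(\alpha = 1\) branch is left unhandled). Two sanity checks follow: at \(\alpha = 2\) the skewness term drops out and the result matches the characteristic function of \(N(\gamma, 2c)\), and at \(\beta = 0\) the characteristic function is real.

```python
import numpy as np

def stable_cf(t, alpha, beta, c, gamma):
    """Characteristic function from Theorem 3.19 (alpha = 1 needs t != 0)."""
    t = np.asarray(t, dtype=float)
    if alpha != 1:
        z = np.tan(np.pi * alpha / 2)
    else:
        z = -(2 / np.pi) * np.log(np.abs(t))
    return np.exp(1j * gamma * t
                  - c * np.abs(t) ** alpha * (1 - 1j * beta * np.sign(t) * z))

t = np.linspace(-3.0, 3.0, 101)
gam, c = 0.5, 1.2
# alpha = 2: reduces to the CF of N(gam, 2c), whatever beta is
diff = np.max(np.abs(stable_cf(t, 2.0, 0.7, c, gam)
                     - np.exp(1j * gam * t - c * t ** 2)))
# beta = 0: symmetric law, real-valued CF
imag = np.max(np.abs(stable_cf(t, 1.5, 0.0, 1.0, 0.0).imag))
print(diff, imag)
```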

Definition 3.20 (\(\alpha\)-Stable Distribution)

The parameter \(\alpha\) in the formula is called the characteristic exponent. The corresponding distribution is referred to as an \(\alpha\)-stable distribution.

Remarks on the Parameters:

  • \(\alpha = 2\) corresponds to the normal distribution \(N(\gamma, 2c)\).

  • \(\alpha = 1, \beta = 0\) corresponds to the symmetric Cauchy distribution.

  • \(\beta\) describes the skewness of the distribution. When \(\beta = 0\), the characteristic function is real-valued, indicating the distribution is symmetric.

  • Stable distributions present difficulties in statistical inference: outside the special cases above the density has no closed form, so the maximum likelihood function cannot be written down directly. Parameter estimation therefore typically relies on Empirical Characteristic Functions.
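As a minimal sketch of the empirical-characteristic-function idea (the grid of \(t\) values and the Cauchy test case are arbitrary choices): for a symmetric stable law \(|\phi(t)| = e^{-c|t|^\alpha}\), so regressing \(\log(-\log|\hat{\phi}(t)|)\) on \(\log|t|\) recovers \(\alpha\) as the slope and \(\log c\) as the intercept.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_cauchy(200_000)       # alpha = 1, beta = 0, c = 1
t = np.array([0.1, 0.2, 0.5, 1.0, 2.0])
# empirical characteristic function |phi_hat(t)|
ecf = np.abs(np.exp(1j * np.outer(t, x)).mean(axis=1))
# |phi(t)| = exp(-c |t|^alpha)  =>  log(-log|phi|) = log c + alpha log|t|
slope, intercept = np.polyfit(np.log(t), np.log(-np.log(ecf)), 1)
print(round(float(slope), 2))          # estimate of alpha
```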


4. Domain of Attraction (DA)

Given an \(\alpha\)-stable distribution \(G_\alpha\), we ask: for which distributions \(F\) do there exist scaling constants \(b_n > 0\) and centering constants \(a_n\) such that the normalized partial sums converge in distribution to \(G_\alpha\)?

Definition 3.21: Domain of Attraction

Let \(X_1, X_2, \dots\) be i.i.d. random variables with distribution \(F\), and write \(S_n = \sum_{i=1}^n X_i\). If there exist constants \(a_n \in \mathbb{R}\) and \(b_n > 0\) such that:

\[ b_n^{-1} (S_n - a_n) \xrightarrow{d} G_\alpha \]

then the distribution \(F\) is said to belong to the Domain of Attraction of \(G_\alpha\), denoted \(F \in DA(G_\alpha)\) or \(F \in DA(\alpha)\).

4.1 Characterization of Domains of Attraction

To characterize the domain of attraction, the concept of a slowly varying function is needed: a function \(L\) is called slowly varying if for all \(t > 0\), \(\lim_{x \to \infty} L(tx)/L(x) = 1\).

Theorem 3.22 and Corollary 3.23 (Characterization of Domains of Attraction)

  1. Domain of attraction of the normal distribution \(DA(2)\): \(F \in DA(2)\) if and only if \(L(x) = \int_{|y|<x} y^2 dF(y)\) is a slowly varying function. This is equivalent to the tail probability satisfying:
\[ P(|X| > x) = o\left( x^{-2} \int_{|y|<x} y^2 dF(y) \right) \quad \text{as } x \rightarrow \infty \]

In particular, all distributions with finite second moment (\(E(X^2) < \infty\)) belong to the domain of attraction of the normal distribution.

  2. Domain of attraction of the \(\alpha < 2\) stable distribution \(DA(\alpha)\): \(F \in DA(\alpha)\) if and only if its left and right tails exhibit Pareto-type decay:
\[ F(-x) = \frac{c_1 + o(1)}{x^\alpha} L(x), \quad 1 - F(x) = \frac{c_2 + o(1)}{x^\alpha} L(x) \quad \text{as } x \rightarrow \infty \]

where \(c_1, c_2 \ge 0\) and \(c_1 + c_2 > 0\).

Corollary 3.24 (Properties of Moments): If \(X \in DA(\alpha)\), then:

  • For \(\delta < \alpha\), \(E(|X|^\delta) < \infty\).

  • For \(\delta > \alpha\) (and \(\alpha < 2\)), \(E(|X|^\delta) = \infty\). (In particular, if \(\alpha < 2\) the variance is necessarily infinite, and if \(\alpha < 1\) the mean is infinite as well; for \(\alpha = 1\) the mean may fail to exist, as for the Cauchy distribution.)


4.2 Selection of Normalization and Centering Constants

Proposition 3.25 (Normalization constant \(b_n\)): For \(F \in DA(\alpha)\), the normalization constant \(b_n\) can be chosen as the unique solution to the following equation:

\[ G(b_n) + K(b_n) = n^{-1}, \quad n \ge 1 \]

where \(G(x) = P(|X| > x)\), and \(K(x) = x^{-2} \int_{|y| \le x} y^2 dF(y)\). If \(E(X^2) < \infty\), then \(b_n \sim \sigma \sqrt{n}\). If \(\alpha < 2\), \(b_n\) typically takes the form \(n^{1/\alpha} L(n)\).
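For a concrete \(F\) the defining equation can be solved numerically (a sketch; the Pareto example is an illustrative choice). For the Pareto(\(\alpha\)) distribution on \([1, \infty)\) with \(\alpha = 3/2\), one has \(G(x) = x^{-\alpha}\) and, by direct integration, \(K(x) = \frac{\alpha}{2-\alpha}(x^{-\alpha} - x^{-2})\); the resulting \(b_n\) indeed grows like \(n^{1/\alpha}\):

```python
import numpy as np
from scipy.optimize import brentq

alpha = 1.5
G = lambda x: x ** -alpha                                  # P(|X| > x)
K = lambda x: (alpha / (2 - alpha)) * (x ** -alpha - x ** -2.0)

for n in (10 ** 3, 10 ** 5, 10 ** 7):
    b = brentq(lambda x: G(x) + K(x) - 1.0 / n, 1.0, 1e12)
    # the ratio b_n / n^{1/alpha} stabilizes (here near 4^{2/3})
    print(n, round(b / n ** (1 / alpha), 3))
```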

Proposition 3.26 (Centering constant \(a_n\)): The centering constant can be chosen as:

\[ a_n = n \int_{|y| \le b_n} y dF(y) \]

5. Generalized Central Limit Theorem and Domain of Normal Attraction (DNA)

Integrating the above properties, we obtain the most general form of the central limit theorem:

Theorem 3.27 (General Central Limit Theorem, General CLT)

Assume \(F \in DA(\alpha)\), where \(\alpha \in (0, 2]\).

  1. If \(E(X^2) < \infty\), then:
\[ \frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1) \]
  2. If \(E(X^2) = \infty\) and \(\alpha = 2\), or if \(\alpha < 2\), then:
\[ \frac{S_n - a_n}{n^{1/\alpha} L_1(n)} \xrightarrow{d} G_\alpha \]

where \(G_\alpha\) is some \(\alpha\)-stable distribution, and \(L_1\) is an appropriate slowly varying function.
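A simulation sketch of case 2 (the specific distribution is an illustrative choice): take symmetric random variables with exact Pareto tails \(P(|X| > x) = 1/x\) for \(x \ge 1\), so \(\alpha = 1\), no centering is needed by symmetry, and \(b_n = n\). A short characteristic-function computation gives the limit as a Cauchy law with scale \(\pi/2\), so the simulated 0.75-quantile of \(S_n/n\) should approach \((\pi/2)\tan(\pi/4) = \pi/2\).

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 1_000, 10_000
U = rng.uniform(size=(reps, n))
sgn = rng.choice([-1.0, 1.0], size=(reps, n))
X = sgn / U                      # symmetric, P(|X| > x) = 1/x for x >= 1
W = X.sum(axis=1) / n            # b_n = n^{1/alpha} = n; a_n = 0 by symmetry
q75 = float(np.quantile(W, 0.75))
print(round(q75, 2), round(np.pi / 2, 2))   # limit: Cauchy with scale pi/2
```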

Definition 3.28 and Corollary 3.29: Domain of Normal Attraction (DNA)

Note that the denominator in the general CLT contains the slowly varying function \(L_1(n)\). If, in the limit, the normalization constant can be taken directly as a pure power form \(b_n = c n^{1/\alpha}\) (i.e., \(L_1(n)\) degenerates to a constant), we say the distribution belongs to the Domain of Normal Attraction (DNA), denoted as \(F \in DNA(G_\alpha)\).

  • \(F \in DNA(2)\) if and only if \(E(X^2) < \infty\).
  • When \(\alpha < 2\), \(F \in DNA(\alpha)\) if and only if the tail strictly follows a power-law decay (i.e., without additional slowly varying function interference):
\[ F(-x) \sim c_1 x^{-\alpha}, \quad 1 - F(x) \sim c_2 x^{-\alpha} \]

In particular, every \(\alpha\)-stable distribution belongs to its own domain of normal attraction.
