Chapter 6: Weakly Dependent Data II
4. Weakly Dependent Stationary Processes and the Central Limit Theorem
Through the mixing coefficients and covariance inequalities introduced previously, we can now formally define the phenomena of "weak dependence" and "long memory" in time series, and establish the Central Limit Theorem (CLT) for mixing processes.
4.1 Definition of Weak Dependence and Long Memory
Suppose \(\{X_i\}\) is a weakly stationary process with finite second moments. Let its autocovariance function be denoted as \(\gamma(j) = Cov(X_i, X_{i+j})\).
Definition 4.8: Weakly Dependent & Long Memory
If the autocovariance is absolutely summable,
\[ \sum_{j=-\infty}^{\infty} |\gamma(j)| < \infty, \]
then the process is said to be weakly dependent (or a short memory process).
If the absolute sum of the autocovariances diverges,
\[ \sum_{j=-\infty}^{\infty} |\gamma(j)| = \infty, \]
then the process is said to be a long memory process.
Relationship between Mixing Coefficients and Weak Dependence:
Let \(\alpha(k)\) be the strong mixing (\(\alpha\)-mixing) coefficient of the \(\sigma\)-algebras generated by \(\{X_i\}_{i \in \mathbb{Z}}\). By Davydov's Inequality (Lemma 4.7, taking \(r = q\)), if there exists \(q > 2\) such that \(E|X_i|^q < \infty\), and for \(p = \frac{q}{q-2}\) the series convergence condition \(\sum_{k=0}^{\infty} \alpha^{1/p}(k) < \infty\) is satisfied, then:
\[ \sum_{k \in \mathbb{Z}} |\gamma(k)| < \infty. \]
This indicates that a process whose mixing coefficients decay at this rate is necessarily weakly dependent (short memory). In particular, if the process is Geometric Strong Mixing (GSM), i.e., \(\alpha(k) \le C\rho^k\) (where \(\rho \in (0,1)\)), then:
\[ \sum_{k=0}^{\infty} \alpha^{1/p}(k) \le C^{1/p} \sum_{k=0}^{\infty} \rho^{k/p} = \frac{C^{1/p}}{1 - \rho^{1/p}} < \infty. \]
This automatically guarantees weak dependence. In general, to ensure convergence of the series it suffices that the mixing coefficients decay polynomially, \(\alpha(k) = O\big(k^{-p(1+\eta)}\big)\) for some \(\eta > 0\), since then \(\alpha^{1/p}(k) = O\big(k^{-(1+\eta)}\big)\) is summable.
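As a quick numerical sanity check (the constants \(C, \rho, p, \eta\) below are illustrative assumptions, not from the text), one can verify that both decay regimes yield a summable series \(\sum_k \alpha^{1/p}(k)\):

```python
import numpy as np

# Illustrative constants (assumptions for the example):
C, rho, p, eta = 2.0, 0.9, 3.0, 0.5

k = np.arange(1, 200_001)

# Geometric strong mixing: alpha(k) = C * rho**k
geom = (C * rho ** k) ** (1 / p)            # alpha^{1/p}(k)
# Polynomial decay: alpha(k) = k**(-p*(1+eta)), so alpha^{1/p}(k) = k**(-(1+eta))
poly = k ** (-(1 + eta))

# The geometric partial sum matches its analytic limit C^{1/p} rho^{1/p}/(1-rho^{1/p})
geom_limit = C ** (1 / p) * rho ** (1 / p) / (1 - rho ** (1 / p))
print(geom.sum(), geom_limit)
print(poly.sum())   # finite (zeta-like sum) since 1 + eta > 1
```

The partial sums stabilize, confirming summability in both regimes.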
4.2 Asymptotic Variance of Strongly Mixing Processes and the CLT
When applying the Central Limit Theorem, the limit of the variance of the sample mean is a core quantity.
Lemma 4.9 (Asymptotic Variance of Strongly Mixing Processes)
Let \(\{X_t\}_{t \in \mathbb{Z}}\) be a zero-mean, real-valued weakly stationary process. Assume there exists \(r > 2\) such that:
\[ E|X_t|^r < \infty \quad \text{and} \quad \sum_{k \ge 1} \alpha(k)^{1 - 2/r} < \infty. \]
Then, the series \(\sum_{k \in \mathbb{Z}} \gamma(k)\) converges absolutely to a non-negative constant \(\sigma^2\). Moreover, the variance of the partial sum \(S_n = \sum_{t=1}^n X_t\) satisfies:
\[ \lim_{n \to \infty} \frac{1}{n} Var(S_n) = \sum_{k \in \mathbb{Z}} \gamma(k) = \sigma^2. \]
Proof of Lemma 4.9 (Click to expand)
First, we use Davydov's Inequality (Lemma 4.7) to study the absolute convergence of the series \(\sum_{k \in \mathbb{Z}} \gamma(k)\). Taking \(q = r\) in Lemma 4.7, according to the parameter relationship \(\frac{1}{q} + \frac{1}{r} = 1 - \frac{1}{p}\), we solve for \(p\):
\[ \frac{1}{p} = 1 - \frac{2}{r}, \quad \text{i.e.} \quad p = \frac{r}{r - 2}. \]
Substituting this into Davydov's Inequality, we obtain the bound for the covariance \(\gamma(k) = Cov(X_0, X_k)\):
\[ |\gamma(k)| \le C\, \alpha(k)^{1 - 2/r}\, \|X_0\|_r \|X_k\|_r = C\, \alpha(k)^{1 - 2/r} \left( E|X_0|^r \right)^{2/r}, \]
where \(C\) is the constant from Lemma 4.7.
Since the given condition states that \(\sum_{k \ge 1} \alpha(k)^{1 - 2/r} < +\infty\), by the comparison test, the series \(\sum_{k \in \mathbb{Z}} |\gamma(k)|\) converges absolutely.
Next, we examine the variance of the normalized partial sum. Since \(\{X_t\}\) is weakly stationary, we can expand:
\[ \frac{1}{n} Var(S_n) = \frac{1}{n} \sum_{s=1}^{n} \sum_{t=1}^{n} \gamma(t - s) = \sum_{|k| < n} \left( 1 - \frac{|k|}{n} \right) \gamma(k). \]
Because \(\gamma(k)\) is absolutely summable and each weight \(\left( 1 - \frac{|k|}{n} \right) \rightarrow 1\) as \(n \rightarrow \infty\), the Dominated Convergence Theorem (or Kronecker's Lemma) gives:
\[ \lim_{n \to \infty} \frac{1}{n} Var(S_n) = \sum_{k \in \mathbb{Z}} \gamma(k) = \sigma^2. \]
The non-negativity of the variance guarantees that the limit \(\sigma^2 \ge 0\). The proof is complete. \(\square\)
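Lemma 4.9 can be checked in closed form for an AR(1) process (the parameters below are assumed for illustration), where \(\gamma(k) = \sigma_\epsilon^2 \varphi^{|k|}/(1 - \varphi^2)\) and the long-run variance is \(\sigma^2 = \sigma_\epsilon^2/(1 - \varphi)^2\):

```python
import numpy as np

phi, sig2_eps = 0.5, 1.0                 # AR(1): X_t = phi*X_{t-1} + eps_t (assumed example)
gamma0 = sig2_eps / (1 - phi ** 2)       # gamma(0)

# sigma^2 = sum_k gamma(k): truncated sum vs the closed form sig2_eps/(1-phi)^2
k = np.arange(-500, 501)
sigma2_sum = np.sum(gamma0 * phi ** np.abs(k))
sigma2_closed = sig2_eps / (1 - phi) ** 2

# Var(S_n)/n = sum_{|k|<n} (1 - |k|/n) gamma(k): the triangular weights tend to 1
n = 10_000
ks = np.arange(-(n - 1), n)
var_Sn_over_n = np.sum((1 - np.abs(ks) / n) * gamma0 * phi ** np.abs(ks))
print(sigma2_sum, sigma2_closed, var_Sn_over_n)
```

All three quantities agree (here \(\sigma^2 = 4\)), matching the limit statement of the lemma.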
With the guarantee of the asymptotic variance, we can directly state the Central Limit Theorem for \(\alpha\)-mixing processes:
Theorem 4.13 (CLT for \(\alpha\)-mixing processes)
Let \(\{X_t\}_{t \in \mathbb{Z}}\) be a zero-mean, real-valued strictly stationary process. Assume there exist \(r > 2\) and \(\beta > 0\) such that:
\[ E|X_t|^r < \infty \quad \text{and} \quad \alpha(k) \le a k^{-\beta}, \quad k \ge 1, \]
where the constant \(a > 0\) and the decay order \(\beta > r / (r - 2)\). If the long-run variance \(\sigma^2 = \sum_{k=-\infty}^{\infty} \gamma(k) > 0\), then we have:
\[ \frac{S_n}{\sqrt{n}} \xrightarrow{d} N(0, \sigma^2), \quad \text{where } S_n = \sum_{t=1}^n X_t. \]
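A quick simulation sketch illustrates the standardization (the AR(1) coefficient, sample sizes, and seed below are assumptions for the example): \(S_n / \sqrt{n\sigma^2}\) should look approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n, reps = 0.5, 1000, 500
sigma2 = 1.0 / (1 - phi) ** 2        # long-run variance of this AR(1)

stats = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0]
    for t in range(1, n):            # AR(1) recursion X_t = phi*X_{t-1} + eps_t
        x[t] = phi * x[t - 1] + eps[t]
    stats[r] = x.sum() / np.sqrt(n * sigma2)

print(stats.mean(), stats.std())     # roughly 0 and 1
```

Note that normalizing by the marginal variance \(\gamma(0)\) instead of the long-run variance \(\sigma^2\) would give the wrong scale: dependence inflates the variance of the sum.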
4.3 Coupling Method & Exponential Inequalities
To prove limit theorems for mixing processes, probabilists developed the Coupling Method. Its core idea is to constructively replace a dependent stationary sequence with an independent sequence having the same marginal distributions, so that established results for independent sequences can be applied.
Lemma 4.10 (Bradley's Lemma / Coupling Lemma)
Let \((X, Y)\) be a random vector taking values in \(\mathbb{R}^d \times \mathbb{R}\), and for some \(p \in [1, \infty)\) let \(Y \in L^p(P)\). Let \(c\) be a real number such that \(\|Y+c\|_p > 0\), and \(\xi \in (0, \|Y+c\|_p]\). Then, there exists an auxiliary random variable \(Y^*\) satisfying:
- \(P_{Y^*} = P_Y\) (i.e., \(Y^*\) and \(Y\) have the same distribution), and \(Y^*\) is independent of \(X\).
- The distance between them is controlled by the mixing coefficient:
\[ P\left( |Y^* - Y| \ge \xi \right) \le 11 \left( \frac{\|Y + c\|_p}{\xi} \right)^{p/(2p+1)} \left[ \alpha\big(\sigma(X), \sigma(Y)\big) \right]^{2p/(2p+1)}. \]
Utilizing Bradley's Lemma, we can extend classical exponential inequalities for independent sequences (such as Hoeffding's and Bernstein's inequalities) to mixing sequences.
Review: Classical Inequalities for Independent Sequences (Theorem 4.11) Let \(X_1, \dots, X_n\) be independent zero-mean random variables, and \(S_n = \sum X_i\).
- Hoeffding's Inequality: If \(a_i \le X_i \le b_i\), then \(P(|S_n| \ge t) \le 2 \exp\left\{ -\frac{2t^2}{\sum (b_i - a_i)^2} \right\}\).
- Bernstein's Inequality: If Cramér's condition \(E|X_i|^p \le c^{p-2} p! EX_i^2 < \infty\) is satisfied, then \(P(|S_n| \ge t) \le 2 \exp\left\{ -\frac{2t^2}{4\sum EX_i^2 + 2ct} \right\}\).
Extension: Exponential Inequalities for Mixing Sequences (Theorem 4.12, Bosq 1998) Let \((X_t)\) be a zero-mean, real-valued process that is uniformly bounded: \(\sup_t \|X_t\|_\infty \le b\). Then for every integer \(q \in [1, n/2]\) and every \(\epsilon > 0\):
\[ P(|S_n| \ge n\epsilon) \le 4 \exp\left\{ -\frac{\epsilon^2 q}{8 b^2} \right\} + 22 \left( 1 + \frac{4b}{\epsilon} \right)^{1/2} q\, \alpha\!\left( \left\lfloor \frac{n}{2q} \right\rfloor \right). \]
(Note: The first term of this inequality is similar to the exponential decay of independent sequences, while the second term is a penalty introduced by the dependence, controlled by the \(\alpha\)-mixing coefficient.)
5. Spectral Method for Estimating Long-Run Covariance \(\sigma^2\)
Previously, we defined the long-run covariance in the CLT as \(\sigma^2 = \sum_{k=-\infty}^{\infty} \gamma(k)\). In actual data, we need a consistent estimator for it. The frequency domain method (spectral analysis) provides an extremely elegant perspective for this.
Definition & Theorem 4.14: Spectral Density Function
Define the spectral density function \(f(\lambda)\) of a time series \(\{X_t\}\) as the Fourier transform of the autocovariance function \(\gamma(k)\): for frequency \(\lambda \in (-\pi, \pi)\),
\[ f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-ik\lambda}. \]
If \(\sum |\gamma(k)| < \infty\), the process has a (continuous) spectral density \(f(\lambda)\). In particular, at frequency \(\lambda = 0\):
\[ 2\pi f(0) = \sum_{k=-\infty}^{\infty} \gamma(k) = \sigma^2. \]
Core Idea: The problem of estimating the long-run variance \(\sigma^2\) is equivalent to estimating the value of the spectral density at zero frequency, \(2\pi f(0)\).
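For a concrete check (an AR(1) model with assumed parameters, not from the text), the long-run variance read off the spectral density at zero matches the sum of autocovariances:

```python
import numpy as np

phi, sig2_eps = 0.5, 1.0     # example AR(1): X_t = phi*X_{t-1} + eps_t (assumed)

def spec_ar1(lam):
    # AR(1) spectral density: f(lambda) = sigma_eps^2 / (2*pi*|1 - phi e^{-i lambda}|^2)
    return sig2_eps / (2 * np.pi) / np.abs(1 - phi * np.exp(-1j * lam)) ** 2

lrv_spectral = 2 * np.pi * spec_ar1(0.0)        # 2*pi*f(0)

gamma0 = sig2_eps / (1 - phi ** 2)
k = np.arange(-500, 501)
lrv_sum = np.sum(gamma0 * phi ** np.abs(k))     # sum_k gamma(k), truncated
print(lrv_spectral, lrv_sum)                    # both equal sig2_eps/(1-phi)^2 = 4
```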
5.1 Periodograms
To estimate the spectral density, we introduce the periodogram. Given a sample \(\{X_1, \dots, X_n\}\), at the Fourier frequencies \(\omega_j = 2\pi j / n \in [-\pi, \pi]\), the periodogram is defined as:
\[ l_n(\omega_j) = \frac{1}{n} \left| \sum_{t=1}^{n} X_t e^{-it\omega_j} \right|^2. \]
For \(\omega_j \ne 0\), it can be equivalently expanded as the Fourier transform of the sample autocovariance function \(\hat{\gamma}(k)\):
\[ l_n(\omega_j) = \sum_{|k| < n} \hat{\gamma}(k) e^{-ik\omega_j}. \]
Where \(\hat{\gamma}(k) = n^{-1} \sum_{t=1}^{n-|k|} (X_t - \overline{X})(X_{t+|k|} - \overline{X})\).
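The equivalence of the two representations is a finite algebraic identity and can be checked directly; the sketch below (arbitrary simulated data and seed, assumed for the example) compares both formulas at a non-zero Fourier frequency:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 128
x = rng.standard_normal(n)
xc = x - x.mean()                      # center by the sample mean

j = 5                                  # a non-zero Fourier frequency index
omega = 2 * np.pi * j / n

# Definition: l_n(omega_j) = (1/n) |sum_t X_t e^{-i t omega_j}|^2
t = np.arange(1, n + 1)
l_def = np.abs(np.sum(xc * np.exp(-1j * t * omega))) ** 2 / n

# Equivalent form: sum_{|k|<n} gamma_hat(k) e^{-i k omega_j}
def gamma_hat(k):
    k = abs(k)
    return np.sum(xc[: n - k] * xc[k:]) / n

l_cov = np.real(sum(gamma_hat(k) * np.exp(-1j * k * omega)
                    for k in range(-(n - 1), n)))
print(l_def, l_cov)                    # the two expressions agree
```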
To analyze it on a continuous frequency domain, we define the Extended Periodogram \(I_n(\omega)\): for any \(\omega \in [-\pi, \pi]\), \(I_n(\omega)\) is defined as the value of \(l_n(\omega_j)\) at the Fourier frequency \(\omega_j\) closest to \(\omega\) (i.e., a step function).
Proposition 4.16 (Expected Properties of Periodograms)
If \(\{X_t\}\) is a stationary sequence with mean \(\mu\) and absolutely summable autocovariances, then for non-zero frequencies \(\omega \ne 0\):
\[ E(I_n(\omega)) \rightarrow 2\pi f(\omega), \quad n \rightarrow \infty. \]
Particularly, if the true mean \(\mu = 0\), then \(E(I_n(\omega))\) uniformly converges to \(2\pi f(\omega)\) on \([-\pi, \pi]\).
Proof of Proposition 4.16 (Click to expand)
First, for the zero frequency \(\omega = 0\), take \(\mu = 0\); then:
\[ E(l_n(0)) = \frac{1}{n} E\left( \sum_{t=1}^{n} X_t \right)^2 = \frac{1}{n} Var(S_n) = \sum_{|k| < n} \left( 1 - \frac{|k|}{n} \right) \gamma(k). \]
From the conclusion of Lemma 4.9, as \(n \rightarrow \infty\), the above expression converges to \(\sum_{k=-\infty}^{\infty} \gamma(k) = 2\pi f(0)\).
Now consider \(\omega \in (0, \pi]\). Using the equivalent representation of \(l_n(\omega)\), we expand the expectation:
\[ E(I_n(\omega)) = \sum_{|k| < n} \left( 1 - \frac{|k|}{n} \right) \gamma(k)\, e^{-ik\, g(n,\omega)}, \]
where \(g(n,\omega)\) is the Fourier frequency closest to \(\omega\). Since the autocovariance \(\gamma(\cdot)\) is absolutely summable, \(\sum_{|k|<n} \left( 1 - \frac{|k|}{n} \right) \gamma(k) e^{-ik\lambda}\) converges uniformly in \(\lambda\) to the Fourier series \(2\pi f(\lambda)\).
Moreover, since \(g(n,\omega) \rightarrow \omega\) as \(n \to \infty\), we have:
\[ E(I_n(\omega)) \rightarrow 2\pi f(\omega). \]
If \(\mu=0\), utilizing the uniform continuity of \(f\) on the closed interval \([-\pi, \pi]\), we can obtain the uniform convergence conclusion. The proof is complete. \(\square\)
5.2 Asymptotic Distribution of the Periodogram for Linear Processes
Theorem 4.17
Let \(\{X_t\}\) be a linear process \(X_t = \sum \psi_j \epsilon_{t-j}\) with \(\sum |\psi_j| < \infty\) and \(\epsilon_t \sim i.i.d.\ F(0, \sigma^2)\). If the spectral density \(f(\lambda) > 0\), then for \(m\) distinct frequencies \(0 < \lambda_1 < \dots < \lambda_m < \pi\), the random vector
\[ \big( I_n(\lambda_1), \dots, I_n(\lambda_m) \big) \]
converges in distribution to a vector composed of independent Exponential Distribution random variables, where the \(i\)-th component has a mean of \(2\pi f(\lambda_i)\).
(Note: This theorem not only provides the limit distribution but also indicates that the periodograms at different frequencies are asymptotically uncorrelated, which lays the foundation for subsequent frequency-domain regression.)
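A small simulation illustrates Theorem 4.17 in its simplest case. The sketch below (white noise, with sample size, frequency index, and seed assumed for the example) checks that the periodogram at a fixed non-zero Fourier frequency behaves like an exponential variable with mean \(2\pi f = \sigma^2 = 1\), so its coefficient of variation is near 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 256, 4000
j = 32                                   # Fourier frequency omega_j = 2*pi*j/n
omega = 2 * np.pi * j / n
t = np.arange(n)
phase = np.exp(-1j * t * omega)

# White noise with unit variance: f(lambda) = 1/(2*pi), so 2*pi*f = 1
X = rng.standard_normal((reps, n))
vals = np.abs(X @ phase) ** 2 / n        # periodogram replicates at omega_j

# For an exponential limit, mean ~ 1 and std/mean (coefficient of variation) ~ 1
print(vals.mean(), vals.std() / vals.mean())
```

The non-degenerate spread (CV near 1) is precisely why a single periodogram ordinate cannot be a consistent estimator, which motivates the smoothing in the next section.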
6. Nonparametric Kernel Smoothing Estimation of Spectral Density
Theorem 4.17 reveals a serious issue: A single periodogram \(I_n(\omega)\) is not a consistent estimator of the spectral density \(2\pi f(\omega)\). (Because its limit is an exponential distribution with non-zero variance, rather than degenerating into a constant). Moreover, when \(\mu \ne 0\), \(I_n(0)\) is even biased.
Solution: Since the periodograms at different frequencies \(\omega_j\) are asymptotically independent and their expectations fluctuate around the true spectral density \(f(\omega)\), we can perform a locally weighted averaging of the periodograms at adjacent frequencies. This is the core idea of Nonparametric Kernel Regression.
6.1 Constructing the Log-Periodogram Regression Model
We can write the relationship between the periodogram and the spectral density as a multiplicative model:
\[ \frac{l_n(\omega_j)}{2\pi} = f(\omega_j)\, e_j\, (1 + R_j), \]
Where \(\{e_j\}\) are independent \(Exp(1)\) random variables, and \(\{R_j\}\) are higher-order negligible terms. Taking the natural logarithm of both sides transforms it into a standard additive nonparametric regression model:
\[ \log \frac{l_n(\omega_j)}{2\pi} = \log f(\omega_j) + \log(e_j) + \log(1 + R_j). \]
The logarithm of an \(Exp(1)\) variable has known moments: \(E(\log(e_j)) = -0.57721\ldots\) (the negative of the Euler–Mascheroni constant) and \(Var(\log(e_j)) = \pi^2/6\). By centering, let:
- \(\eta_j = \log(e_j) + 0.57721\) (zero-mean, i.i.d. error terms with variance \(\pi^2/6\))
- \(W_j = \log\left(\frac{l_n(\omega_j)}{2\pi}\right) + 0.57721\) (the new response variable)
- \(m(\omega) = \log(f(\omega))\) (the unknown target smoothing function)
Thus, we obtain a fixed-design nonparametric regression model:
\[ W_j = m(\omega_j) + \eta_j + r_j, \]
where \(r_j = \log(1 + R_j)\) is asymptotically negligible.
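The centering constants can be checked numerically; the sketch below (Monte Carlo with an assumed seed and sample size) verifies that \(\log(e_j)\) for \(e_j \sim Exp(1)\) has mean close to \(-0.57721\) and variance close to \(\pi^2/6\), so the centered errors \(\eta_j\) are indeed mean zero:

```python
import numpy as np

rng = np.random.default_rng(4)
e = rng.exponential(1.0, size=1_000_000)   # Exp(1) draws
log_e = np.log(e)

euler_gamma = 0.57721566
print(log_e.mean())            # close to -0.5772 (minus the Euler-Mascheroni constant)
print(log_e.var())             # close to pi^2/6 = 1.6449

eta = log_e + euler_gamma      # the centered regression error eta_j
print(eta.mean())              # close to 0
```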
6.2 Nadaraya-Watson (NW) Kernel Estimator
Now our goal is to estimate \(m(0) = \log(f(0))\) by kernel smoothing. Given a kernel function \(K(\cdot)\) and a smoothing bandwidth \(b\), the NW estimator of \(m(\omega)\) is defined as:
\[ \hat{m}(\omega) = \frac{\sum_j K\left( \frac{\omega_j - \omega}{b} \right) W_j}{\sum_j K\left( \frac{\omega_j - \omega}{b} \right)}, \]
Where the bandwidth must satisfy the asymptotic conditions: as \(n \rightarrow \infty\), \(b \rightarrow 0\) and \(nb \rightarrow \infty\).
Finally, by exponentiating back, we obtain a consistent estimator of the spectral density at zero frequency:
\[ \hat{f}(0) = \exp\{ \hat{m}(0) \}. \]
With \(\hat{f}(0)\) in hand, we obtain the long-run variance estimator \(\hat{\sigma}^2 = 2\pi \hat{f}(0)\) required for the CLT standardization, closing the loop on the entire asymptotic inference framework for weakly dependent data.
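The whole pipeline of Sections 5 and 6 can be sketched end to end. Everything below (the AR(1) model, Gaussian kernel, bandwidth \(b = 0.1\), sample size, and seed) is an illustrative assumption rather than a prescription; the true long-run variance of this AR(1) is \(\sigma^2 = \sigma_\epsilon^2/(1-\varphi)^2 = 4\).

```python
import numpy as np

rng = np.random.default_rng(5)
phi, n = 0.5, 8192
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):                        # simulate the AR(1)
    x[t] = phi * x[t - 1] + eps[t]

# Periodogram at positive Fourier frequencies via the FFT
xc = x - x.mean()
per = np.abs(np.fft.fft(xc)) ** 2 / n        # l_n(omega_j), j = 0..n-1
j = np.arange(1, n // 2)                     # skip j = 0 (biased when mu != 0)
omega = 2 * np.pi * j / n
l = per[j]

# Log-periodogram response, centered by the Euler-Mascheroni constant
euler_gamma = 0.57721566
W = np.log(l / (2 * np.pi)) + euler_gamma

# Nadaraya-Watson estimate of m(0) = log f(0): Gaussian kernel, bandwidth b
b = 0.1
K = np.exp(-0.5 * (omega / b) ** 2)
m0_hat = np.sum(K * W) / np.sum(K)

f0_hat = np.exp(m0_hat)                      # inverse transformation
sigma2_hat = 2 * np.pi * f0_hat              # long-run variance estimate
print(sigma2_hat, 1.0 / (1 - phi) ** 2)      # estimate vs true value 4
```

With this bandwidth the estimate lands in a broad neighborhood of the truth; in practice \(b\) would be chosen by a data-driven rule balancing the bias and variance of \(\hat{m}(0)\).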