Distribution Textbook (Work in Progress)

by John Della Rosa

Markov Chain

Introduction to Markov Chains

Recommended Prerequesites

  1. Probability
  2. Probability II
  3. Sampling
  4. Introduction to Multivariate Distributions

Basic Concepts

Stochastic Process

A stochastic process is a collection of random variables \(\{X_t\}_{t\in T}\).

Markov Property

A stochastic process \(\{X_t\}_{t\geq 0}\) satisfies the Markov property if, for all \(t\geq 0\) and for all states \(x_0, x_1, \dots, x_{t}, x_{t+1}\), $$P(X_{t+1}=x_{t+1}|X_{t}=x_{t},X_{t-1}=x_{t-1},\dots,X_0=x_0)=P(X_{t+1}=x_{t+1}|X_t=x_t)$$ In other words, only the present state affects the probability of the next transition.

Markov Chain

A Markov Chain is a discrete-time stochastic process with the Markov property and a discrete state space \(S\). This state space can be:

Transition Probability

The dynamics of a Markov Chain are characterized by transition probabilities: $$P_{ij}=P(X_{t+1}=j|X_{t}=i),\quad\forall i,j\in S$$ These transition probabilities can be arranged into a transition matrix where: The second and third characteristics imply that \(P_{ij}\leq 1\).

Absorbing State

An absorbing state \(i\) satisfies \(P_{ii}=1\). Once entered, the chain remains in this state indefinitely.

Intermediate Concepts

Chapman-Kolmogorov Equations

The \(n\)-step transition probability is the probability of transitioning from state \(i\) to state \(j\) in \(n\) steps: $$P_{ij}^{(n)}=P(X_{t+n}=j|X_t=i)$$ This leads to the Chapman-Kolmogorov Equations: $$P_{ij}^{(n+m)}=\sum_{k\in S}P_{ik}^{(n)}P_{kj}^{(m)}$$

Stationary Distribution

A stationary distribution \(\pi=\pi_{i\in S}\) satisfies: $$\pi_j=\sum_{i\in S}\pi_i P_{ij},\quad \forall j\in S,$$ or alternatively written: $$\vec{\pi}=\vec{\pi}\mathbf{P}$$ where \(pi\) is a vector of the probability distribution over states and \(P\) is the transition matrix However, this need not be something that is initially true. We care more about convergence.

Convergence

It is said that the distribution of \(X_t\) converges to the stationary distribution \(\pi\) as \(t\to\infty\) if: $$\lim_{n\to\infty}P(X_t=j|X_0=i)=\pi_j,\quad \forall i,j\in S$$

Total Variation Distance

The total variation distance between two probability distributions \(\mu\) and \(\nu\) on \(S\) is: $$||\mu-\nu||_{TV}=\frac{1}{2}\sum_{i\in S}|\mu_i-\mu_i|$$ The mixing time is the time it takes for the Markov Chain to get close to its stationary distribution in total variation distance.

Ergodicity

For a Markov Chain to be ergodic, it must satisfy the following 3 properties:
  1. Irreducible
  2. Aperiodic
  3. Positive Recurrent

Irreducibility

A Markov chain state \(j\) is accessible from state \(i\) (denoted \(i\rightarrow j\)) if there exists \(n\geq 0\) such that \(P_{ij}^{(n)}\gt 0\) where \(P_{ij}^{(n)}\) is the \(n\)-step transition probability. It is said that states \(i\) and \(j\) communicate (denoted \(i\leftrightarrow j\)) if \(i\rightarrow j\) and \(j\rightarrow i\). A Markov Chain is irreducible if every pair of states communicates, meaning that it's possible to reach any state from any other state.

Periodicity

The period of a state \(i\) is: $$d(i)=\text{gcd}\{n\geq 1: P_{ii}^{(n)}\gt 0\}$$ An aperiodic chain is one where all states are aperiodic.

Recurrence and Transience

There are two categories of recurrent states:

Reversible Markov Chains

Detailed Balance Condition

A Markov Chain satisfies the detailed balance condition with respect to a distribution \(\pi\) if: $$\pi_{i}P_{ij}=\pi_{j}P_{ji},\quad\forall i,j\in S$$ This implies that the flow of probability from $i$ to $j$ is exactly balanced by the flow from $j$ to $i$. A chain is reversible if it satisfies thee detailed balance condition.

Limit Theorems for Markov Chains

Law of Large Numbers

For an ergodic Markov Chain, the time average of a function converges to its expectation under the stationary distribution: $$\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}f(X_t)=\mathbb{E}_{\pi}[f(X)],\quad\text{almost surely}$$

Central Limit Theorem

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}f(X_t)-\mathbb{E}_{\pi}[f(X)]\right)\overset{d}{\rightarrow}\mathcal{N}(0,\sigma^2)$$

Markov Chain Practice Problems