Distribution Textbook (Work in Progress)

by John Della Rosa

Metropolis Algorithm

Recommended Prerequesites

  1. Probability
  2. Probability II
  3. Sampling
  4. Introduction to Markov Chains
  5. Markov Chain Monte Carlo

Metropolis Algorithm

The Metropolis Algorithm is one of the most common MCMC methods. Its primary purpose is to generate a sequence of samples from a target distribution \(\pi(x)\), especially when \(\pi(x)\) is known only up to a normalizing constant or is too complex for direct sampling.

Conceptual Framework

Markov Chain Construction

The algorithm constructs a Markov Chain whose stationary distribution is the desired target distribution \(\pi(x)\).

Proposal Mechanism

At each iteration, a new candidate state \(x^*\) is proposed based on the current state \(x_t\), using a proposal distribution \(q(x^*|x_t)\).

Acceptance Criterion

The proposed state \(x^*\) is accepted or rejected based on an acceptance probability \(\alpha\), which ensures that the chain has the correct stationary distribution.

Detailed Balance Condition

The Metropolis algorithm satisfies the detailed balance condition, which is a sufficient condition for the chain to have \(\pi(x)\) as its stationary distribution $$\pi(x_{t})q(x^*|x_t)\alpha(x_t,x^*)=\pi(x^*)q(x_t|x^*)\alpha(x^*,x_t)$$

Symmetric Proposal Distribution

In the original Metropolis algorithm, the proposal distribution \(q(x^*|x_t)\) is symmetric: $$q(x^*|x_t)=q(x_t|x^*)$$ This symmetry simplifies the acceptance probability to: $$\alpha=\min\left(1,\frac{\pi(x^*)}{\pi(x_t)}\right)$$

Random Walk Behavior

The algorithm often employs a random walk proposal, where the next state is proposed by adding a random perturbation to the current state.

Example

Suppose we want to sample from a univariate target distribution \(\pi(x)\) defined on \(\mathbb{R}\). The target distribution is proportional to: $$\pi(x)\propto e^{-(x^4-16x^2+5x)}$$ This distribution is complex and does not have a standard form, making direct sampling challenging.

Define Target Distribution

We define the unnormalized target density function: $$\pi(x)\propto e^{-(x^4-16x^2+5x)}$$ The normalizing constant isn't needed for the Metropolis algorithm.

Choose the Proposal Distribution

Select a proposal distribution \(q(x'|x)\), which is the distribution of proposing a new state \(x'\) given the current state \(x\). We'll use a symmetric normal proposal distribution centered at the current state: $$q(x'|x)=\mathcal{N}(x';x,\sigma^2)$$ where \(\sigma^2\) is the variance of the proposal distribution. Example: Set \(\sigma=2\).

Initialize the Chain

Choose an initial value \(x_0\) for the Markov Chain. $$x_0=0$$

Metropolis Algorithm Iteration

For \(t=0\) to \(T-1\), perform the following steps to generate \(x_{t+1}\):
Generate a Proposal
Sample a candidate value \(x^*\) from the proposal distribution: $$x^*\sim q(x^*|x_t)=\mathcal{N}(x^*;x_t,\sigma^2)$$ Example Calculation:
Compute the Acceptance Probability
Since the proposal distribution is symmetric, \(q(x^*|x_t)=q(x_t|x^*)\), so the acceptance probability simplifies to: $$\alpha=\min\left(1,\frac{\pi(x^*)}{\pi(x_t)}\right)$$ Compute the ratio of target densities:
  1. Compute \(\pi(x^*)\): $$\pi(x^*)=e^{-(x^{*4}-16x^{*2}+5x^*)}$$
  2. Compute \(\pi(x_t)\) similarly
Example: $$\pi(1.5)=e^{-(1.5^4-16\times 1.5^2+5\times 1.5)}=e^{28.3125}$$ $$\pi(0)=e^{-(0^4-16\times0^2+5\times0)}=e^{0}=1$$ Since this ratio is greater than 1, the acceptance probability \(\alpha=1\).
Accept or Reject the Proposal
Example:
Record the State
Continue the process for a large number of iterations to obtain a sequence of samples \(\{x_t\}_{t=1}^T\).
Post-Processing

Metropolis-Hastings

Introduction

While the original Metropolis algorithm uses a symmetric proposal distribution (the probability of proposing a move from state \(x\) to \(y\) is the same as from \(y\) to \(x\)), The Metropolis-Hastings algorithm relaxes this requirement. It introduces an acceptance probability that corrects for the asymmetry, ensuring that the Markov chain still converges to the target distribution.

Algorithm

Acceptance Probability

The acceptance probability (\alpha(x,x')\) is defined as: $$\alpha(x,x')=\min\left(1,\frac{\pi(x')q(x|x')}{\pi(x)q(x'|x)}\right)$$

Initialization

Start with an initial state \(x_0\).

iteration

For \(t=0\) to \(N-1\):
  1. Proposal: Generate a proposed state \(x'\) from the proposal distribution (q(x'|x_t)\).
  2. Acceptance Probability: Compute \(\alpha(x_t,x')\)
  3. Acceptance/Rejection:
    1. Generate a uniform random number \(u\sim\mathcal{U}(0,1)\)
    2. If \(u\leq\alpha(x_t,x')\), accept the proposal and set \(x_{t+1}=x'\)
    3. Else, reject the proposal and set \(x_{t+1}=x_t\)

Repeat

Repeat: Continue the process to generate a sequence of \(\{x_t\}\)

Metropolis-Hastings Detailed Balance

$$\pi(x)P(x\rightarrow x')=\pi(x')P(x'\rightarrow x)$$

Example

Suppose we want to sample from the target distribution: $$\pi(x)=\frac{1}{Z}e^{-\frac{x^2}{2}},\quad x\in\mathbb{R}$$ Since this is a standard normal distribution, we know the value of \(Z\), but normalization constants aren't important for this algorithm.

Proposal Distribution

We will choose a normal distribution centered at the current state \(x_t\): $$q(x'|x_t)=\mathcal{N}(x';x_t,\sigma^2)$$ where \(\sigma^2\) is the variance of the proposal distribution.

Algorithm Steps

Initialization
Let \(x_0=0\).
Iteration
For \(t=0\) to \(N-1\):
Proposal
Sample: \(x'\sim\mathcal{N}(x_t,\sigma^2)\)
Acceptance Probability
Since \(q(x'|x_t)=q(x_t|x')\) (symmetric proposal), the acceptance probability simplifies to: $$\alpha(x_t,x')=\min\left(1,\frac{\pi(x')}{\pi(x_t)}\right)$$ Since \(pi(x)=e^{-\frac{x^2}{2}}\) (ignoring the constant \(Z\)), we have: $$\alpha(x_t,x')=\min\left(e^{-\frac{x'^2-x_{t}^2}{2}}\right)$$
Acceptance/Rejection
  1. Generate \(u\sim\mathcal{U}(0,1)\)
  2. If \(u\leq\alpha(x_t,x')\), accept \(x'\) and set \(x_{t+1}=x'\)
  3. Else, set \(x_{t+1}=x_t\)
Repeat
Continue for \(N\) iterations.

Considerations

Choice of Proposal Distribution

A small variance leads to high acceptance rates but slow exploration. A large variance leads to low acceptance rates but can explore the space more quickly.

Metropolis Algorithm Practice Problems