Metropolis Algorithm
Recommended Prerequesites
- Probability
- Probability II
- Sampling
- Introduction to Markov Chains
- Markov Chain Monte Carlo
The Metropolis Algorithm is one of the most common MCMC methods. Its primary purpose is to generate a sequence of samples from a target distribution \(\pi(x)\),
especially when \(\pi(x)\) is known only up to a normalizing constant or is too complex for direct sampling.
Conceptual Framework
Markov Chain Construction
The algorithm constructs a Markov Chain whose stationary distribution is the desired target distribution \(\pi(x)\).
Proposal Mechanism
At each iteration, a new candidate state \(x^*\) is proposed based on the current state \(x_t\), using a proposal distribution \(q(x^*|x_t)\).
Acceptance Criterion
The proposed state \(x^*\) is accepted or rejected based on an acceptance probability \(\alpha\), which ensures that the chain has the correct stationary distribution.
Detailed Balance Condition
The Metropolis algorithm satisfies the detailed balance condition, which is a sufficient condition for the chain to have \(\pi(x)\) as its stationary distribution
$$\pi(x_{t})q(x^*|x_t)\alpha(x_t,x^*)=\pi(x^*)q(x_t|x^*)\alpha(x^*,x_t)$$
Symmetric Proposal Distribution
In the original Metropolis algorithm, the proposal distribution \(q(x^*|x_t)\) is symmetric:
$$q(x^*|x_t)=q(x_t|x^*)$$
This symmetry simplifies the acceptance probability to:
$$\alpha=\min\left(1,\frac{\pi(x^*)}{\pi(x_t)}\right)$$
Random Walk Behavior
The algorithm often employs a random walk proposal, where the next state is proposed by adding a random perturbation to the current state.
Example
Suppose we want to sample from a univariate target distribution \(\pi(x)\) defined on \(\mathbb{R}\). The target distribution is proportional to:
$$\pi(x)\propto e^{-(x^4-16x^2+5x)}$$
This distribution is complex and does not have a standard form, making direct sampling challenging.
Define Target Distribution
We define the unnormalized target density function:
$$\pi(x)\propto e^{-(x^4-16x^2+5x)}$$
The normalizing constant isn't needed for the Metropolis algorithm.
Choose the Proposal Distribution
Select a proposal distribution \(q(x'|x)\), which is the distribution of proposing a new state \(x'\) given the current state \(x\).
We'll use a symmetric normal proposal distribution centered at the current state:
$$q(x'|x)=\mathcal{N}(x';x,\sigma^2)$$
where \(\sigma^2\) is the variance of the proposal distribution.
Example: Set \(\sigma=2\).
Initialize the Chain
Choose an initial value \(x_0\) for the Markov Chain.
$$x_0=0$$
Metropolis Algorithm Iteration
For \(t=0\) to \(T-1\), perform the following steps to generate \(x_{t+1}\):
Generate a Proposal
Sample a candidate value \(x^*\) from the proposal distribution:
$$x^*\sim q(x^*|x_t)=\mathcal{N}(x^*;x_t,\sigma^2)$$
Example Calculation:
- Suppose at iteration \(t\), the current state is \(x_t=0\)
- Sample \(x^*\) from \(\mathcal{N}(0,4)\) and obtain \(x^*=1.5\)
Compute the Acceptance Probability
Since the proposal distribution is symmetric, \(q(x^*|x_t)=q(x_t|x^*)\), so the acceptance probability simplifies to:
$$\alpha=\min\left(1,\frac{\pi(x^*)}{\pi(x_t)}\right)$$
Compute the ratio of target densities:
- Compute \(\pi(x^*)\):
$$\pi(x^*)=e^{-(x^{*4}-16x^{*2}+5x^*)}$$
- Compute \(\pi(x_t)\) similarly
Example:
$$\pi(1.5)=e^{-(1.5^4-16\times 1.5^2+5\times 1.5)}=e^{28.3125}$$
$$\pi(0)=e^{-(0^4-16\times0^2+5\times0)}=e^{0}=1$$
Since this ratio is greater than 1, the acceptance probability \(\alpha=1\).
Accept or Reject the Proposal
- Generate a uniform random number \(u\sim \mathcal{U}(0,1)\)
- If \(u \leq \alpha\), accept the proposal: \(x_{t+1}=x^*\)
- Else, reject the proposal: \(x_{t+1}=x_t\)
Example:
- Generate \(u=0.7\)
- Since \(u=0.7\leq\alpha=1\), accept the proposal.
- Set \(x_{t+1}=1.5\)
Record the State
Continue the process for a large number of iterations to obtain a sequence of samples \(\{x_t\}_{t=1}^T\).
Post-Processing
- Burn-in Period: Discard the initial \(B\) samples (e.g., first 1,000 iterations) to reduce the influence of the starting value.
- Thinning (optional): To reduce autocorrelation, keep every \(k\)-th sample
Metropolis-Hastings
Introduction
While the original Metropolis algorithm uses a symmetric proposal distribution (the probability of proposing a move from state \(x\) to \(y\) is the same as from \(y\) to \(x\)),
The Metropolis-Hastings algorithm relaxes this requirement. It introduces an acceptance probability that corrects for the asymmetry, ensuring that the Markov chain still converges to the target distribution.
Algorithm
Acceptance Probability
The acceptance probability (\alpha(x,x')\) is defined as:
$$\alpha(x,x')=\min\left(1,\frac{\pi(x')q(x|x')}{\pi(x)q(x'|x)}\right)$$
Initialization
Start with an initial state \(x_0\).
iteration
For \(t=0\) to \(N-1\):
- Proposal: Generate a proposed state \(x'\) from the proposal distribution (q(x'|x_t)\).
- Acceptance Probability: Compute \(\alpha(x_t,x')\)
- Acceptance/Rejection:
- Generate a uniform random number \(u\sim\mathcal{U}(0,1)\)
- If \(u\leq\alpha(x_t,x')\), accept the proposal and set \(x_{t+1}=x'\)
- Else, reject the proposal and set \(x_{t+1}=x_t\)
Repeat
Repeat: Continue the process to generate a sequence of \(\{x_t\}\)
Metropolis-Hastings Detailed Balance
$$\pi(x)P(x\rightarrow x')=\pi(x')P(x'\rightarrow x)$$
Example
Suppose we want to sample from the target distribution:
$$\pi(x)=\frac{1}{Z}e^{-\frac{x^2}{2}},\quad x\in\mathbb{R}$$
Since this is a standard normal distribution, we know the value of \(Z\), but normalization constants aren't important for this algorithm.
Proposal Distribution
We will choose a normal distribution centered at the current state \(x_t\):
$$q(x'|x_t)=\mathcal{N}(x';x_t,\sigma^2)$$
where \(\sigma^2\) is the variance of the proposal distribution.
Algorithm Steps
Initialization
Let \(x_0=0\).
Iteration
For \(t=0\) to \(N-1\):
Proposal
Sample: \(x'\sim\mathcal{N}(x_t,\sigma^2)\)
Acceptance Probability
Since \(q(x'|x_t)=q(x_t|x')\) (symmetric proposal), the acceptance probability simplifies to:
$$\alpha(x_t,x')=\min\left(1,\frac{\pi(x')}{\pi(x_t)}\right)$$
Since \(pi(x)=e^{-\frac{x^2}{2}}\) (ignoring the constant \(Z\)), we have:
$$\alpha(x_t,x')=\min\left(e^{-\frac{x'^2-x_{t}^2}{2}}\right)$$
Acceptance/Rejection
- Generate \(u\sim\mathcal{U}(0,1)\)
- If \(u\leq\alpha(x_t,x')\), accept \(x'\) and set \(x_{t+1}=x'\)
- Else, set \(x_{t+1}=x_t\)
Repeat
Continue for \(N\) iterations.
Considerations
Choice of Proposal Distribution
A small variance leads to high acceptance rates but slow exploration. A large variance leads to low acceptance rates but can explore the space more quickly.