Distribution Textbook (Work in Progress)

by John Della Rosa

Confidence Intervals for Parameter Estimates

Recommended Prerequisites

  1. Probability
  2. Statistics
  3. MLE
  4. Method of Moments

Introduction

In statistical inference, a single point estimate, such as the maximum likelihood estimate (MLE), provides a useful summary of a parameter's likely value. However, it does not provide information about the uncertainty associated with the estimate. Confidence intervals (CIs) offer a solution to this by giving a range of values that are likely to contain the true parameter with a specified level of confidence.

General Definition

A confidence interval for parameter \(\theta\) is an interval \([L(X),U(X)]\) that is constructed from the data X, such that: $$P(L(X)\leq \theta \leq U(X))=1-\alpha$$ where \(1-\alpha\) is the confidence level, commonly 95% (\(\alpha=0.05\)). This means that if we repeated the estimation process many times, approximately 95% of the intervals constructed from different samples would contain the true parameter value.
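
To make this repeated-sampling interpretation concrete, here is a minimal simulation sketch (not from the text), assuming a normal model with known variance so that the interval is exact:

```python
# A minimal sketch: simulate repeated sampling and check that roughly 95%
# of the constructed intervals cover the true parameter. The normal model
# with known sigma is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n, n_reps = 5.0, 2.0, 50, 10_000
z = 1.96  # z_{alpha/2} for alpha = 0.05

covered = 0
for _ in range(n_reps):
    x = rng.normal(true_mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)
    L, U = x.mean() - half_width, x.mean() + half_width
    covered += (L <= true_mu <= U)

print(f"Empirical coverage: {covered / n_reps:.3f}")  # close to 0.95
```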

Wald Confidence Interval

The Wald Confidence Interval is one of the most widely used methods for constructing confidence intervals for parameter estimates. It is based on the assumption that, for large sample sizes, the sampling distribution of the estimator is approximately normal. This makes the Wald interval particularly useful in the context of maximum likelihood estimation and the method of moments, where such normality often holds due to the Central Limit Theorem.

Suppose we have an estimator \(\hat{\theta}\) for a parameter \(\theta\), and we are interested in constructing a confidence interval for \(\theta\) based on the distribution of \(\hat{\theta}\).
For large samples, under regularity conditions, the estimator \(\hat{\theta}\) is approximately normally distributed: $$\hat{\theta}\sim N\left(\theta,\frac{\sigma^2}{n}\right)$$ where \(\hat{\theta}\) is the point estimate of \(\theta\) from the data, \(\sigma^2/n\) is the variance of the estimator, and \(n\) is the sample size, so the standard error is \(SE(\hat{\theta})=\sigma/\sqrt{n}\). Since the estimator is (assumed to be) normally distributed, confidence intervals can be constructed from standard normal quantiles (z-scores).
By the CLT: $$P(-z_{\alpha/2}\leq \frac{\hat{\theta}-\theta}{SE(\hat{\theta})}\leq z_{\alpha/2})=1-\alpha$$

Thus, the \((1-\alpha)\times 100\%\) Wald confidence interval is given by: $$\hat{\theta}\pm z_{\alpha/2}\cdot SE(\hat{\theta})$$
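
As an illustration, here is a minimal sketch of a Wald interval; the exponential model is an assumption made for the example, using the MLE \(\hat{\lambda}=1/\bar{x}\) and the Fisher-information-based standard error \(SE(\hat{\lambda})=\hat{\lambda}/\sqrt{n}\):

```python
# A minimal sketch: Wald interval for the rate of an exponential distribution.
# The MLE is lambda_hat = 1/mean(x), and the Fisher information
# I(lambda) = n/lambda^2 gives the plug-in SE(lambda_hat) = lambda_hat/sqrt(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_lam, n = 0.5, 200
x = rng.exponential(scale=1 / true_lam, size=n)

lam_hat = 1 / x.mean()                # MLE of the rate
se = lam_hat / np.sqrt(n)             # plug-in standard error
z = stats.norm.ppf(0.975)             # z_{alpha/2} for a 95% interval
print(f"Wald 95% CI: [{lam_hat - z * se:.3f}, {lam_hat + z * se:.3f}]")
```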

Assumptions

Large Sample Size

The Wald interval relies on the Central Limit Theorem for the approximate normality of the estimator, which generally holds only when the sample is large.

Estimator is Unbiased or Asymptotically Unbiased

Finite and Non-Zero Variance

Likelihood Ratio Confidence Interval

Introduction to Likelihood Ratio Confidence Interval

Recall the log-likelihood function $$\ell(\theta;X)=\log L(\theta;X)$$ For a parameter \(\theta\), the likelihood ratio test compares the likelihood of the parameter estimate \(\hat{\theta}\) with the likelihood of some other value \(\theta\). The likelihood ratio statistic is defined as: $$\lambda(\theta)=\frac{L(\hat{\theta};X)}{L(\theta;X)}$$ Taking twice the logarithm of this ratio gives the log-likelihood ratio statistic: $$\Lambda(\theta)=2(\ell(\hat{\theta};X)-\ell(\theta;X))$$ The factor of 2 may seem arbitrary, but it arises from a second-order Taylor expansion of \(\ell(\theta)\) about \(\hat{\theta}\), where the first-order term vanishes because the score is zero at the MLE: $$\ell(\theta)\approx\ell(\hat{\theta})-\frac{1}{2}I(\hat{\theta})(\theta-\hat{\theta})^2$$ Doubling the difference gives \(\Lambda(\theta)\approx I(\hat{\theta})(\theta-\hat{\theta})^2\), the square of an asymptotically standard normal quantity, so by Wilks' theorem $$\Lambda(\theta)\sim\chi_1^2$$ asymptotically. The likelihood ratio confidence interval for a given confidence level \(1-\alpha\) is the set of \(\theta\) whose log-likelihood ratio statistic is within the critical value of the chi-square distribution with 1 degree of freedom at confidence level \(1-\alpha\): $$\left\{\theta:2(\ell(\hat{\theta})-\ell(\theta))\leq \chi_{1-\alpha,1}^2\right\}$$
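
Since the interval is defined implicitly, it is usually found numerically. A minimal sketch, again assuming exponential data as an illustrative choice, locates the two points where the log-likelihood drops \(\chi_{1-\alpha,1}^2/2\) below its maximum:

```python
# A minimal sketch: likelihood ratio interval for an exponential rate, found
# by root-finding where the log-likelihood falls chi2_{0.95,1}/2 below its max.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=100)  # true rate = 0.5
n, s = len(x), x.sum()

loglik = lambda lam: n * np.log(lam) - lam * s   # exponential log-likelihood
lam_hat = n / s                                  # MLE maximizes loglik
cutoff = stats.chi2.ppf(0.95, df=1) / 2          # allowed drop from the max

g = lambda lam: loglik(lam_hat) - loglik(lam) - cutoff  # zero at each endpoint
lo = optimize.brentq(g, 1e-9, lam_hat)
hi = optimize.brentq(g, lam_hat, 10 * lam_hat)
print(f"LR 95% CI: [{lo:.3f}, {hi:.3f}]")  # typically asymmetric about lam_hat
```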

Use of Likelihood Ratio Confidence Intervals

Advantages

There is no normality assumption on the estimator, and the intervals can be asymmetric about the estimate.

Disadvantages

The computation of the interval is more difficult than that of the Wald interval. It also requires some assumptions about the likelihood function, such as differentiability.

Score Confidence Intervals

Recall that the score function, \(U(\theta)\), is defined as the following: $$U(\theta)=\frac{\partial \ell(\theta)}{\partial \theta}$$ Thus, at the maximum likelihood estimate, \(\theta=\hat{\theta}\), the score function equals 0: $$U(\hat{\theta})=0$$ Consequently, confidence intervals can also be derived from the score function. The score confidence interval is based on the distribution of the score function and uses the fact that, under regularity conditions, the score function behaves asymptotically like a normal random variable.

Estimation

The score confidence interval is derived by inverting the score test for hypothesis testing. This involves testing whether a specific parameter value \(\theta_0\) is plausible given the data, using the score function. The hypothesis test is: $$H_0:\theta=\theta_0$$ The score test statistic is: $$S(\theta_0)=\frac{U(\theta_0)}{\sqrt{I(\theta_0)}}$$ where \(I(\theta_0)\) is the Fisher information. Under \(H_0\), the score statistic is approximately standard normal: $$S(\theta_0)\sim N(0,1)$$ Since it is normally distributed, we can establish confidence intervals easily: $$P\left(-z_{\alpha/2}\leq \frac{U(\theta_0)}{\sqrt{I(\theta_0)}}\leq z_{\alpha/2}\right)=1-\alpha$$ The endpoints of the interval are therefore the values of \(\theta_0\) that satisfy: $$U(\theta_0)=\pm z_{\alpha/2}\cdot\sqrt{I(\theta_0)}$$ Often this must be solved numerically rather than analytically. One notable distribution for which it can be done analytically is the binomial distribution with a known \(n\).
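
For the binomial case, solving for \(\theta_0\) reduces to a quadratic equation and yields the closed-form Wilson score interval. A minimal sketch, assuming \(k\) successes in \(n\) trials:

```python
# A minimal sketch: score (Wilson) interval for a binomial proportion.
# Solving U(p0) = ±z*sqrt(I(p0)) for p0 is a quadratic in p0 with a closed
# form, which is evaluated directly here.
import numpy as np

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Score confidence interval for p given k successes in n trials."""
    p_hat = k / n
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    )
    return center - half, center + half

print(wilson_interval(k=8, n=40))  # asymmetric about p_hat = 0.2
```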

Properties

The confidence interval can be asymmetric about the estimated value.

Credible Interval

Credible Intervals are the Bayesian equivalent of confidence intervals. We will discuss this more in the Bayesian inference section, but here is a brief primer:

A credible interval is an interval \([a,b]\) such that the posterior probability that the parameter \(\theta\) lies within this interval is equal to a specified value \(1-\alpha\): $$P(a\leq \theta \leq b|\text{data})=1-\alpha$$ where the posterior is obtained from Bayes' theorem: $$P(\theta|X)=\frac{P(X|\theta)P(\theta)}{P(X)}$$ The equal-tailed credible interval has \(a,b\) such that $$P(\theta \lt a|X)=P(\theta\gt b|X)=\frac{\alpha}{2}$$ An alternative is the highest posterior density (HPD) interval, which does not require equal mass on each side of the interval. It is defined as the interval that includes all values of \(\theta\) for which the posterior density is higher than at any point outside the interval, while containing mass \(1-\alpha\).
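
As a brief illustration, here is a minimal sketch of an equal-tailed credible interval; the binomial data and the Beta(1, 1) prior are assumptions made for the example, chosen so the posterior has a closed form:

```python
# A minimal sketch: equal-tailed 95% credible interval for a binomial
# proportion with a Beta(1, 1) prior, giving a Beta(1 + k, 1 + n - k)
# posterior by conjugacy. The prior choice is an assumption.
from scipy import stats

k, n, alpha = 8, 40, 0.05
posterior = stats.beta(1 + k, 1 + n - k)            # conjugate update
lo, hi = posterior.ppf([alpha / 2, 1 - alpha / 2])  # posterior quantiles
print(f"95% equal-tailed credible interval: [{lo:.3f}, {hi:.3f}]")
```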

Confidence Interval Practice Problems

  1. Derive the score confidence interval for \(p\) for a binomial distribution with an unknown parameter \(p\) and a known, generic parameter \(n\).