Distribution Textbook (Work in Progress)

by John Della Rosa

Estimators

Introduction to Estimators

Recommended Prerequisites

  1. Probability
  2. Bounds and Limits

Introduction

An estimator is a function or rule that takes a sample of data and produces an estimate of some population parameter.

Let \(\theta\) be a parameter of interest (e.g., the population mean, variance, or proportion), and let \(X_1,X_2,\dots,X_n\) be a random sample from a population. An estimator \(\hat{\theta}\) is a function of the sample: $$\hat{\theta}=\hat{\theta}(X_1,X_2,\dots,X_n)$$ An estimate is the specific value obtained by applying the estimator to an observed data set.

Example

The sample mean, \(\bar{X}\), is an estimator for the population mean \(\mu\): $$\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$$
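As a quick illustration, here is a minimal sketch of the distinction between the estimator (a rule) and the estimate (its value on a particular sample). It assumes NumPy is available; the normal population, seed, and sample size are illustrative choices, not from the text.

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=5.0, scale=2.0, size=100)  # simulated sample; true mean is 5

    def sample_mean(x):
        # the estimator: a rule mapping a sample to a number
        return np.sum(x) / len(x)

    estimate = sample_mean(sample)  # the estimate: the rule applied to this particular sample
    print(estimate)                 # close to, but generally not equal to, the true mean 5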

Bias

The bias of an estimator measures how far the expected value of the estimator is from the true parameter value. $$\text{Bias}(\hat{\theta})=\mathbb{E}[\hat{\theta}]-\theta$$
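For instance, the sample mean is an unbiased estimator of the population mean, since by linearity of expectation $$\mathbb{E}[\bar{X}]=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]=\frac{1}{n}\cdot n\mu=\mu,\qquad\text{so}\qquad\text{Bias}(\bar{X})=\mathbb{E}[\bar{X}]-\mu=0$$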

Example

The sample variance \(S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\) is an unbiased estimator of the population variance \(\sigma^2\). However, the estimator \(\hat{\sigma}^2\) that divides by \(n\) instead: $$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$$ has expected value \(\mathbb{E}[\hat{\sigma}^2]=\frac{n-1}{n}\sigma^2\) and hence bias \(-\frac{\sigma^2}{n}\), making it a biased estimator of the population variance.
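A small simulation makes the bias visible by averaging each estimator over many samples. This is a sketch only; it assumes NumPy, and the normal population with \(\sigma^2=4\), the sample size, and the repetition count are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    true_var = 4.0   # population variance of the simulated normal data
    n = 10           # small sample size so the bias is noticeable
    reps = 100_000

    samples = rng.normal(loc=0.0, scale=true_var**0.5, size=(reps, n))
    biased = samples.var(axis=1, ddof=0)    # divides by n
    unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1

    print(biased.mean())    # approximately (n-1)/n * sigma^2 = 3.6
    print(unbiased.mean())  # approximately sigma^2 = 4.0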

Asymptotic Unbiasedness

An estimator is asymptotically unbiased if $$\lim_{n\to\infty}\text{Bias}(\hat{\theta}_n)=0$$ For example, the biased sample variance estimator above has bias \(-\sigma^2/n\), which vanishes as \(n\to\infty\), so it is asymptotically unbiased.

Variance of an Estimator

The variance of an estimator measures the expected squared deviation of the estimator from its expected value. The variance of \(\hat{\theta}\) is given by: $$\text{Var}(\hat{\theta})=\mathbb{E}[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^2]$$
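For example, for an i.i.d. sample with population variance \(\sigma^2\), the sample mean has variance $$\text{Var}(\bar{X})=\frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i)=\frac{\sigma^2}{n},$$ which shrinks as the sample size grows.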

Bias-Variance Trade-off

The mean squared error of an estimator can be broken down into the two previously mentioned quantities: $$\text{MSE}(\hat{\theta})=\mathbb{E}[(\hat{\theta}-\theta)^2]=\text{Var}(\hat{\theta})+[\text{Bias}(\hat{\theta})]^2$$
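This decomposition follows by adding and subtracting \(\mathbb{E}[\hat{\theta}]\) inside the square; the cross term vanishes because \(\mathbb{E}[\hat{\theta}-\mathbb{E}[\hat{\theta}]]=0\): $$\mathbb{E}[(\hat{\theta}-\theta)^2]=\mathbb{E}\left[\left((\hat{\theta}-\mathbb{E}[\hat{\theta}])+(\mathbb{E}[\hat{\theta}]-\theta)\right)^2\right]=\text{Var}(\hat{\theta})+[\text{Bias}(\hat{\theta})]^2$$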

Consistency

Formally stated, an estimator \(\hat{\theta}_n\) is consistent if, for every \(\varepsilon>0\), the probability that the estimator deviates from the true parameter by at least \(\varepsilon\) goes to 0 as the sample size goes to infinity: $$\lim_{n\rightarrow\infty}P(|\hat{\theta}_n-\theta|\geq\varepsilon)=0$$ That is, the probability that \(\hat{\theta}_n\) deviates from \(\theta\) by more than \(\varepsilon\) approaches zero as \(n\) increases. The previously mentioned biased sample variance estimator is consistent.
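The following sketch estimates \(P(|\hat{\sigma}^2_n-\sigma^2|\geq\varepsilon)\) by Monte Carlo for increasing \(n\) and shows it shrinking toward zero. It assumes NumPy; the normal population with \(\sigma^2=4\), the choice \(\varepsilon=0.5\), and the sample sizes are illustrative, not from the text.

    import numpy as np

    rng = np.random.default_rng(2)
    sigma2, eps, reps = 4.0, 0.5, 2_000

    for n in (10, 100, 1_000, 5_000):
        samples = rng.normal(scale=sigma2**0.5, size=(reps, n))
        est = samples.var(axis=1, ddof=0)            # biased sample variance estimator
        prob = np.mean(np.abs(est - sigma2) >= eps)  # Monte Carlo estimate of the deviation probability
        print(n, prob)                               # decreases toward 0 as n grows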

Weak Consistency

Convergence in probability to \(\theta\)

Strong Consistency

Almost sure convergence to \(\theta\).

Sufficient Statistics

An estimator is sufficient if it captures all the information in the data relevant to estimating the parameter. Formally, a statistic \(T(X_1,X_2,\dots,X_n)\) is sufficient for \(\theta\) if the conditional distribution of the sample given \(T\) does not depend on \(\theta\).
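For example, for an i.i.d. Bernoulli(\(p\)) sample, the number of successes \(T=\sum_{i=1}^{n}X_i\) is sufficient for \(p\): the joint probability is $$P(X_1=x_1,\dots,X_n=x_n)=p^{t}(1-p)^{n-t},\qquad t=\sum_{i=1}^{n}x_i,$$ so given \(T=t\), the conditional distribution of the sample is uniform over the \(\binom{n}{t}\) possible arrangements and does not involve \(p\).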

Estimation Methods

Method of Moments

The Method of Moments (MoM) is an intuitive and relatively simple technique for parameter estimation. It relies on matching the theoretical moments (e.g., mean, variance) of a distribution with the corresponding sample moments.
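As a worked example (using the rate parametrization of the exponential distribution), matching the first theoretical moment to the first sample moment gives the MoM estimator: $$\mathbb{E}[X]=\frac{1}{\lambda}\ \overset{\text{set}}{=}\ \bar{X}\quad\Longrightarrow\quad\hat{\lambda}_{\text{MoM}}=\frac{1}{\bar{X}}$$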

Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) finds the parameter values that maximize the likelihood function, which represents the probability (or probability density) of observing the given sample as a function of the parameters.
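A minimal sketch of the idea, assuming NumPy and simulated exponential data with a true rate of \(\lambda=2\) (an illustrative choice): it maximizes the log-likelihood over a grid of candidate rates and compares the result to the closed-form MLE \(\hat{\lambda}=1/\bar{X}\), which for the exponential coincides with the method-of-moments estimator above.

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.exponential(scale=1 / 2.0, size=500)  # exponential sample with true rate lambda = 2

    def log_likelihood(lam, x):
        # sum of log f(x_i; lambda) with f(x; lambda) = lambda * exp(-lambda * x)
        return len(x) * np.log(lam) - lam * np.sum(x)

    grid = np.linspace(0.01, 10.0, 10_000)
    ll = log_likelihood(grid, data)     # log-likelihood evaluated at each candidate rate
    mle_grid = grid[np.argmax(ll)]

    print(mle_grid)         # numerical maximizer, approximately 2
    print(1 / data.mean())  # closed-form MLE: 1 / sample mean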

Cramer-Rao Lower Bound

The Cramer-Rao Lower Bound (CRLB) provides a lower bound on the variance of unbiased estimators: $$\text{Var}(\hat{\theta}_n)\geq\frac{1}{n\mathcal{I}(\theta)}$$ where \(\mathcal{I}(\theta)\) is the Fisher Information: $$\mathcal{I}(\theta)=\mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right]$$ An estimator that achieves the CRLB is said to be efficient.
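For example, for \(X\sim N(\mu,\sigma^2)\) with \(\sigma^2\) known, the Fisher information for \(\mu\) is \(1/\sigma^2\), so the CRLB is \(\sigma^2/n\); the sample mean attains this bound and is therefore efficient: $$\frac{\partial}{\partial\mu}\log f(X;\mu)=\frac{X-\mu}{\sigma^2},\qquad\mathcal{I}(\mu)=\mathbb{E}\left[\frac{(X-\mu)^2}{\sigma^4}\right]=\frac{1}{\sigma^2},\qquad\text{Var}(\bar{X})=\frac{\sigma^2}{n}=\frac{1}{n\mathcal{I}(\mu)}$$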

Estimators Practice Problems