Estimators
Introduction to Estimators
Recommended Prerequisites
- Probability
- Bounds and Limits
Introduction
An estimator is a function or rule that takes a sample of data and produces an estimate of some population parameter.
Let \(\theta\) be a parameter of interest (e.g., the population mean, variance, or proportion), and let
\(X_1,X_2,\dots,X_n\) be a random sample from a population. An estimator \(\hat{\theta}\) is a function of the sample:
$$\hat{\theta}=\hat{\theta}(X_1,X_2,\dots,X_n)$$
The estimate \(\hat{\theta}\) is the specific value obtained by applying the estimator to a given data set.
Example
The sample mean, \(\bar{X}\), is an estimator for the population mean \(\mu\):
$$\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$$
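As a minimal sketch (assuming NumPy and illustrative values for the population), applying the estimator \(\bar{X}\) to an observed sample produces a concrete estimate:
```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a random sample X_1, ..., X_n from a population with mean mu = 5
sample = rng.normal(loc=5.0, scale=2.0, size=100)

# The estimator is a function of the sample; applying it gives an estimate
x_bar = sample.mean()
print(x_bar)  # a single estimate of the population mean, close to 5
```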
Bias
The bias of an estimator measures how far the expected value of the estimator is from the true parameter value.
$$\text{Bias}(\hat{\theta})=\mathbb{E}[\hat{\theta}]-\theta$$
- If \(\mathbb{E}[\hat{\theta}]=\theta\), the estimator is said to be unbiased.
- If \(\mathbb{E}[\hat{\theta}]\neq\theta\), the estimator is biased.
Example
The sample variance \(S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\) is an unbiased estimator for the population variance \(\sigma^2\).
However, the biased sample variance estimator \(\hat{\sigma}^2\):
$$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$$
has expected value \(\mathbb{E}[\hat{\sigma}^2]=\frac{n-1}{n}\sigma^2\) and hence bias \(-\frac{\sigma^2}{n}\), making it a biased estimator of the population variance.
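A quick Monte Carlo sketch (illustrative sample size, variance, and seed; assuming NumPy) approximates the expected value of each estimator by averaging over many repeated samples:
```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

biased, unbiased = [], []
for _ in range(reps):
    x = rng.normal(0.0, np.sqrt(sigma2), size=n)
    biased.append(x.var(ddof=0))    # divides by n     -> biased estimator
    unbiased.append(x.var(ddof=1))  # divides by n - 1 -> unbiased estimator

print(np.mean(biased))    # approx (n-1)/n * sigma^2 = 3.6
print(np.mean(unbiased))  # approx sigma^2 = 4.0
```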
Asymptotic Unbiasedness
An estimator is asymptotically unbiased if
$$\lim_{n\to\infty}\text{Bias}(\hat{\theta}_n)=0$$
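For example, the biased sample variance above is asymptotically unbiased, since
$$\text{Bias}(\hat{\sigma}^2_n)=\frac{n-1}{n}\sigma^2-\sigma^2=-\frac{\sigma^2}{n}\longrightarrow 0\quad\text{as }n\to\infty$$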
Variance of an Estimator
The variance of an estimator measures the expected squared deviation of the estimator from its own expected value. The variance of \(\hat{\theta}\) is given by:
$$\text{Var}(\hat{\theta})=\mathbb{E}[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^2]$$
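For instance, for an i.i.d. sample with population variance \(\sigma^2\), the variance of the sample mean is
$$\text{Var}(\bar{X})=\text{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)=\frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i)=\frac{\sigma^2}{n}$$
which shrinks as the sample size grows.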
Bias-Variance Trade-off
The mean squared error (MSE) of an estimator decomposes into the two quantities defined above:
$$\text{MSE}(\hat{\theta})=\mathbb{E}[(\hat{\theta}-\theta)^2]=\text{Var}(\hat{\theta})+[\text{Bias}(\hat{\theta})]^2$$
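A Monte Carlo sketch (illustrative settings, assuming NumPy) checks the decomposition numerically; for normal data it also shows that the biased variance estimator can have a smaller MSE than the unbiased one despite its bias:
```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est_biased = samples.var(axis=1, ddof=0)    # divides by n
est_unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1

for name, est in [("biased", est_biased), ("unbiased", est_unbiased)]:
    bias = est.mean() - sigma2
    var = est.var()
    mse = np.mean((est - sigma2) ** 2)
    # MSE should equal Var + Bias^2 up to Monte Carlo error
    print(name, mse, var + bias ** 2)
```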
Consistency
Formally, an estimator \(\hat{\theta}_n\) is consistent if, for every \(\varepsilon>0\), the probability that the estimator deviates from the true parameter by at least \(\varepsilon\) goes to 0 as the sample size goes to infinity:
$$\lim_{n\rightarrow\infty}P(|\hat{\theta}_n-\theta|\geq\varepsilon)=0$$
This means that the probability that \(\hat{\theta}_n\) deviates from \(\theta\) by at least \(\varepsilon\) approaches zero as \(n\) increases.
The previously mentioned biased sample variance estimator is consistent.
Weak Consistency
Convergence in probability to \(\theta\).
Strong Consistency
Almost sure convergence to \(\theta\).
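A simulation sketch (illustrative \(\varepsilon\), sample sizes, and seed; assuming NumPy) shows weak consistency of the sample mean: the empirical probability of a deviation of at least \(\varepsilon\) shrinks toward zero as \(n\) grows.
```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps, reps = 5.0, 0.2, 10_000

for n in [10, 100, 1_000]:
    x_bar = rng.normal(mu, 2.0, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(x_bar - mu) >= eps)
    print(n, prob)  # estimated P(|X_bar_n - mu| >= eps), decreasing toward 0
```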
Sufficient Statistics
A statistic is sufficient if it captures all the information in the sample that is relevant to estimating the parameter. Formally, a statistic
\(T(X_1,X_2,\dots,X_n)\) is sufficient for \(\theta\) if the conditional distribution of the sample given \(T\) does not depend on \(\theta\).
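For example, for an i.i.d. Bernoulli(\(p\)) sample, the joint probability mass function depends on the data only through the total number of successes:
$$P(X_1=x_1,\dots,X_n=x_n)=\prod_{i=1}^{n}p^{x_i}(1-p)^{1-x_i}=p^{t}(1-p)^{n-t},\qquad t=\sum_{i=1}^{n}x_i$$
so \(T=\sum_{i=1}^{n}X_i\) is sufficient for \(p\).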
Estimation Methods
Method of Moments
The Method of Moments (MoM) is an intuitive and relatively simple technique for parameter estimation. It relies on matching the theoretical moments (e.g., mean, variance) of a distribution with the corresponding sample moments.
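As a sketch (illustrative Gamma data, assuming NumPy), matching the first two moments of a Gamma(\(k,\theta\)) distribution, \(\mathbb{E}[X]=k\theta\) and \(\text{Var}(X)=k\theta^2\), to the corresponding sample moments yields MoM estimates for \(k\) and \(\theta\):
```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=3.0, scale=2.0, size=5_000)  # true k = 3, theta = 2

m1 = x.mean()  # first sample moment
v = x.var()    # second central sample moment

# Solve E[X] = k * theta and Var(X) = k * theta^2 for theta and k
theta_hat = v / m1
k_hat = m1 / theta_hat

print(k_hat, theta_hat)  # should be close to 3 and 2
```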
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) chooses the parameter values that maximize the likelihood function, which represents the probability (or probability density) of observing the given sample as a function of the parameters.
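Below is a sketch (illustrative exponential data, assuming NumPy and SciPy) that maximizes the log-likelihood numerically; for the exponential distribution the MLE also has the closed form \(\hat{\lambda}=1/\bar{X}\), which the numerical optimum should match:
```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.exponential(scale=1 / 2.5, size=2_000)  # true rate lambda = 2.5

def neg_log_likelihood(lam):
    # log-likelihood: sum over i of [log(lambda) - lambda * x_i]
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print(res.x, 1 / x.mean())  # numerical MLE vs closed-form MLE
```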
Cramér-Rao Lower Bound
The Cramér-Rao Lower Bound (CRLB) provides a lower bound on the variance of unbiased estimators:
$$\text{Var}(\hat{\theta}_n)\geq\frac{1}{n\mathcal{I}(\theta)}$$
where \(\mathcal{I}(\theta)\) is the Fisher Information:
$$\mathcal{I}(\theta)=\mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right]$$
An estimator that achieves the CRLB is said to be efficient.
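For example, for \(X\sim\mathcal{N}(\mu,\sigma^2)\) with \(\sigma^2\) known,
$$\frac{\partial}{\partial\mu}\log f(X;\mu)=\frac{X-\mu}{\sigma^2},\qquad\mathcal{I}(\mu)=\mathbb{E}\left[\left(\frac{X-\mu}{\sigma^2}\right)^2\right]=\frac{1}{\sigma^2}$$
so the CRLB is \(\sigma^2/n\). Since \(\text{Var}(\bar{X})=\sigma^2/n\), the sample mean attains the bound and is an efficient estimator of \(\mu\).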