Goodness of Fit
Introduction
In prior chapters, we have looked at goodness of fit mostly in the context of using it to fit parameters, e.g., likelihood and MLE. This chapter focuses on goodness of fit for its own sake rather than as a means to an end, and it is written from an applied, modeling perspective rather than a theoretical one.
Recommended Prerequisites
- Probability
- Probability II
- Empirical Distributions
Empirical Process
In the Empirical Distribution chapter, we covered some goodness-of-fit metrics that come from the empirical process, specifically Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling. These are used to assess how closely a sample matches a hypothesized distribution. For more information, see that chapter.
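As a brief illustration, here is a minimal sketch of computing the Kolmogorov-Smirnov statistic for a sample against a fitted normal distribution, assuming NumPy and SciPy are available. Note that the standard p-value is only exact when the parameters are not estimated from the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=500)  # simulated data

# Fit a normal distribution, then compare the sample to the fitted CDF.
mu, sigma = sample.mean(), sample.std(ddof=1)
ks_stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
print(f"KS statistic: {ks_stat:.4f}, p-value: {p_value:.4f}")
```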
Scoring Rules
Description
Scoring rules are used to evaluate how well probabilistic forecasts capture the uncertainty of the predictions. A proper scoring rule is one whose expected value is optimized (minimized or maximized, depending on the sign convention) when the forecasted probability distribution matches the true distribution. Scoring rules penalize poor forecasts, rewarding forecasts that are both sharp (concentrated) and well-calibrated (match the observed outcomes).
Examples
Discrete Log Score
The logarithmic score, or log score, is one of the most commonly used proper scoring rules. It evaluates how likely the forecasted distribution considers the actual observed outcome.
For a probability distribution P and observation x, the log score is:
$$\text{Log Score}=\log P(x)$$
The higher the discrete log score, the better the forecast. The log score heavily penalizes forecasts that assign very low probabilities to the observed outcome.
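A minimal sketch of the discrete log score, using hypothetical category probabilities:

```python
import numpy as np

# Hypothetical forecast over three categories and the observed outcome.
forecast = {"rain": 0.2, "cloudy": 0.5, "sunny": 0.3}
observed = "cloudy"

# Log score: log of the probability assigned to what actually happened.
log_score = np.log(forecast[observed])
print(f"Log score: {log_score:.4f}")  # closer to 0 (from below) is better
```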
Continuous Log Score
The continuous version is very similar, with the distinction that it carries a negative sign; thus, we want to minimize it.
$$L(D,y)=-\log(f_D(y))$$
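A minimal sketch of the continuous log score, assuming a normal forecast density with hypothetical parameters:

```python
from scipy import stats

# Hypothetical forecast distribution D ~ Normal(mu, sigma) and observation y.
mu, sigma = 10.0, 2.0
y = 13.5

# L(D, y) = -log f_D(y); smaller values indicate a better forecast.
score = -stats.norm.logpdf(y, loc=mu, scale=sigma)
print(f"Continuous log score: {score:.4f}")
```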
Brier Score
The Brier score can be used to assess the accuracy of probabilistic forecasts in binary classification settings. For a probabilistic forecast \(p_i\) for event \(i\) and the corresponding outcome \(o_i\), which is 1 if the event occurred and 0 otherwise, the Brier score is given by:
$$\text{Brier Score}=\frac{1}{n}\sum_{i=1}^{n}(p_i-o_i)^2$$
This is similar to MSE for continuous response variables.
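A minimal sketch of the Brier score with hypothetical forecast probabilities and binary outcomes:

```python
import numpy as np

# Hypothetical forecast probabilities and 0/1 outcomes.
p = np.array([0.9, 0.2, 0.7, 0.4])
o = np.array([1, 0, 1, 1])

brier = np.mean((p - o) ** 2)
print(f"Brier score: {brier:.4f}")  # 0 corresponds to a perfect forecast
```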
Continuous Ranked Probability Score
The Continuous Ranked Probability Score (CRPS) generalizes the Brier score to continuous outcomes. It measures the distance between the forecasted cumulative distribution function (CDF) \(F\) and the observed value \(x\). The CRPS is defined as:
$$CRPS(F,x)=\int_{-\infty}^{\infty}(F(y)-\mathbb{I}_{y\geq x})^2dy$$
where \(\mathbb{I}\) is the indicator function. Like the Brier score, smaller values of CRPS indicate better forecasts. Since the integrand is discontinuous at \(y=x\) due to the indicator, the integral can be split into two pieces at that point when evaluating by hand; numerical evaluation handles the discontinuity without any special treatment.
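A minimal sketch of computing CRPS by numerical integration, assuming a normal forecast CDF with hypothetical parameters:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Hypothetical forecast CDF F ~ Normal(mu, sigma) and observed value x.
mu, sigma = 10.0, 2.0
x = 12.0

# Integrate (F(y) - 1{y >= x})^2 over a grid wide enough to cover the tails.
y = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
F = stats.norm.cdf(y, loc=mu, scale=sigma)
indicator = (y >= x).astype(float)
crps = trapezoid((F - indicator) ** 2, y)
print(f"CRPS: {crps:.4f}")
```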
Categories of Metrics
Calibration
Calibration refers to how well the predicted probabilities match the actual outcomes. This is similar to the notion of bias.
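One simple way to inspect calibration for binary forecasts is a reliability table: bin the predicted probabilities and compare the mean forecast in each bin with the observed event frequency. A hedged sketch with hypothetical data:

```python
import numpy as np

# Hypothetical forecast probabilities and 0/1 outcomes.
p = np.array([0.10, 0.15, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90])
o = np.array([0, 0, 0, 1, 1, 0, 1, 1])

bins = np.linspace(0, 1, 5)        # four equal-width probability bins
idx = np.digitize(p, bins[1:-1])   # bin index for each forecast
for b in range(len(bins) - 1):
    mask = idx == b
    if mask.any():
        print(f"bin {b}: mean forecast {p[mask].mean():.2f}, "
              f"observed frequency {o[mask].mean():.2f}")
```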
Sharpness
Sharpness measures how concentrated or "sharp" the forecasted distribution is. A sharp forecast is one that makes confident predictions, i.e., the distribution is concentrated around certain values. Sharpness does not assess the correctness of the predictions but focuses on the model’s confidence.
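Sharpness is often summarized by how wide the forecast's central prediction intervals are. A minimal sketch, assuming each forecast is a normal distribution with its own (hypothetical) mean and standard deviation:

```python
import numpy as np
from scipy import stats

# Hypothetical normal forecasts for three cases.
means = np.array([10.0, 12.0, 9.5])
sds = np.array([1.0, 2.5, 0.8])

# Average width of central 90% prediction intervals: smaller means sharper.
lower = stats.norm.ppf(0.05, loc=means, scale=sds)
upper = stats.norm.ppf(0.95, loc=means, scale=sds)
print(f"Average 90% interval width: {np.mean(upper - lower):.3f}")
```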