Distribution Textbook (Work in Progress)

by John Della Rosa

Extreme Value Theory

Introduction to Extreme Value Theory

Recommended Prerequisites

  1. Probability
  2. Probability II

What is Extreme Value Theory?

Extreme value theory (EVT) deals with understanding the behavior of tail events. One example is understanding the maximum from repeated sampling from a distribution.

Introduction to Order Statistics

Previously, we've been dealing with statistics such as the mean. Order statistics provide a way to analyze the behavior of the extremes (minimum and maximum) as well as the general properties of distributions and their sample representations.

Example

Let \(x_1, x_2,\dots, x_n\) be a sample of n i.i.d. random variables drawn from a distribution X. The order statistics of this sample are defined as the sorted values of the sample from smallest to largest: $$x_{(1)}\leq x_{(2)}\leq \dots \leq x_{(n)}$$ Here, \(x_{(k)}\) denotes the k-th smallest value in the sample.
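In code, the order statistics are just the sorted sample. A minimal sketch with a hypothetical sample:

```python
# Order statistics are the sorted values of the sample.
sample = [4.2, 1.5, 3.3, 2.8, 5.1]
order_stats = sorted(sample)  # x_(1) <= x_(2) <= ... <= x_(n)

print(order_stats[0])   # x_(1), the sample minimum: 1.5
print(order_stats[-1])  # x_(n), the sample maximum: 5.1
```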

Median

The median is another common descriptor for a sample or distribution. We can think of it in the context of order statistics given what we learned above.

Odd

If n is odd, the median is the \((\frac{n+1}{2})\)-th order statistic: $$\text{Median}=x_{\left(\frac{n+1}{2}\right)}$$

Even

If n is even, the median is defined as the arithmetic mean of the two middle-most order statistics, since there is not a single middle: $$\text{Median}=\frac{x_{(\frac{n}{2})}+x_{(\frac{n}{2}+1)}}{2}$$
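Both cases can be combined into one small function (a sketch; the function name is our own):

```python
def sample_median(xs):
    """Median via order statistics: the middle order statistic if n is odd,
    the mean of the two middle order statistics if n is even."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the ((n+1)/2)-th order statistic
    return (s[n // 2 - 1] + s[n // 2]) / 2  # mean of x_(n/2) and x_(n/2+1)

print(sample_median([3, 1, 2]))     # odd n: middle value, 2
print(sample_median([4, 1, 3, 2]))  # even n: (2+3)/2 = 2.5
```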

Distribution of Order Statistics

One nice thing about deriving the distribution of various order statistics is that they can be described well in plain English.

Distribution of the Maximum

Let us have a fixed number of draws in our sample, n, from a distribution \(F_X\). We will denote the k-th order statistic as \(X_{(k)}\), so the maximum is \(M_n = X_{(n)}\). The maximum is at most x exactly when every draw is at most x, so by independence, $$P(M_n\leq x)=P(X_1\leq x, X_2\leq x, \dots, X_n\leq x)=[F(x)]^n$$ But what is the shape of this distribution? The density can be obtained by differentiating \([F(x)]^n\) with the power and chain rules from calculus, but let's derive it intuitively.
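A quick empirical sanity check of \(P(M_n\leq x)=[F(x)]^n\), sketched for Uniform(0,1) draws where \(F(x)=x\) (the sample size, threshold, and trial count here are arbitrary choices):

```python
import random

# For Uniform(0,1), F(x) = x, so P(M_n <= x) should be x^n.
random.seed(0)
n, trials, x = 5, 200_000, 0.8
hits = sum(max(random.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials
print(abs(empirical - x ** n) < 0.01)  # empirical frequency is close to 0.8^5
```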

The probability that \(X_{(n)}=x\) is not as simple as the probability of a single draw equaling x, as when we use a PMF or PDF. Instead, we must also consider what the event implies for the other elements of the sample: some draw satisfies \(X_i=x\) for \(i\in\{1,2,\dots,n\}\), and for it to be the maximum, \(X_{j}\leq x\) for all \(j\neq i\).

Since any of the n draws could be the one equal to x, we pick up a factor of n.
The probability that another draw does not exceed x is \(P(X_j\leq x)=F(x)\). This must occur n-1 times, so we raise it to the n-1 power. Finally, the probability of a given draw equaling x is \(P(X_i=x)=f(x)\).
Thus, we finally get the PDF of \(X_{(n)}\) as: $$f_{X_{(n)}}(x)=n\left[F(x)\right]^{n-1}f(x)$$
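As a check on the formula, for Uniform(0,1) it reduces to \(f_{X_{(n)}}(x)=nx^{n-1}\), whose mean is \(n/(n+1)\). A simulation sketch (sample size and trial count are arbitrary):

```python
import random

# For Uniform(0,1), n*F(x)^(n-1)*f(x) = n*x^(n-1), with mean n/(n+1).
random.seed(1)
n, trials = 10, 100_000
maxima = [max(random.random() for _ in range(n)) for _ in range(trials)]
sim_mean = sum(maxima) / trials
print(abs(sim_mean - n / (n + 1)) < 0.005)  # mean of maxima is close to 10/11
```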
Interpretation
What does this imply? Well, if x corresponds to a low quantile, then F(x) gets raised to the n-1 power, which greatly shrinks the density there. Intuitively, as the sample size grows, the probability that the maximum of the sample is a small number becomes very small.

Distribution of the Minimum

By symmetry, the distribution of the minimum is composed similarly. The thing of note is that we are now concerned with probabilities that the others are greater than \(X_{(1)}\). Thus, we switch out F(x) for 1-F(x): $$f_{X_{(1)}}(x)=n\left[1-F(x)\right]^{n-1}f(x)$$
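A concrete consequence worth checking: for Exponential(rate), \(1-F(x)=e^{-\text{rate}\cdot x}\), so the minimum of n i.i.d. draws has survival function \(e^{-n\cdot\text{rate}\cdot x}\), i.e. it is Exponential(n·rate) with mean \(1/(n\cdot\text{rate})\). A simulation sketch:

```python
import random

# Minimum of n i.i.d. Exponential(rate) draws is Exponential(n*rate),
# so its mean should be 1/(n*rate).
random.seed(2)
n, rate, trials = 4, 1.0, 100_000
minima = [min(random.expovariate(rate) for _ in range(n)) for _ in range(trials)]
sim_mean = sum(minima) / trials
print(abs(sim_mean - 1 / (n * rate)) < 0.005)  # mean of minima close to 0.25
```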

Distribution of the Median

Through similar arguments and use of the multinomial/binomial formula, we can get the PDF of the median. For brevity, consider only the odd case and let n=2m+1. The PDF of \(X_{(m+1)}\) is given by: $$f_{X_{(m+1)}}(x)=\frac{(2m+1)!}{m!m!}\left[F(x)\right]^m\left[1-F(x)\right]^mf(x)$$ Does this result make sense? Well, we can group \([F(x)(1-F(x))]\). If we are around the median, \(F(x)\approx 0.5\), which yields \(F(x)(1-F(x))\approx0.25\), the maximum of the function \(f(y)=y(1-y)\) on [0,1]; away from the median, this product falls below 0.25, and at the extremes, either F(x) or 1-F(x) goes to 0. Consequently, \([F(x)(1-F(x))]^m\) for \(m\geq 1\) is maximized at the median and decays toward 0 at least as fast as you move away from it. Thus, the formula seems reasonable.
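For Uniform(0,1), this median PDF becomes \(\frac{(2m+1)!}{m!m!}x^m(1-x)^m\), a Beta(m+1, m+1) density with mean 1/2, which we can check by simulation (m and the trial count are arbitrary choices):

```python
import random

# Median of n = 2m+1 Uniform(0,1) draws follows Beta(m+1, m+1), mean 1/2.
random.seed(3)
m, trials = 3, 50_000
n = 2 * m + 1
medians = [sorted(random.random() for _ in range(n))[m] for _ in range(trials)]
sim_mean = sum(medians) / trials
print(abs(sim_mean - 0.5) < 0.005)  # mean of sample medians is close to 1/2
```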

Extreme Value Distributions

Generalized Extreme Value Distribution

The GEV distribution has 3 parameters: \(\mu, \beta, \xi\) and has a CDF given by: $$F(x;\mu, \beta, \xi)=\exp(-(1+\xi\frac{x-\mu}{\beta})^{-1/\xi});\quad1+\xi\frac{x-\mu}{\beta}>0$$ where \(\mu\) is the location parameter, \(\beta>0\) is the scale parameter, and \(\xi\) is the shape parameter. The GEV distribution family is broken down into 3 categories based on whether \(\xi\) is positive, negative, or zero.
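The CDF above is straightforward to implement; a minimal sketch (function name is our own), treating \(\xi=0\) as the Gumbel limit and handling points outside the support:

```python
import math

def gev_cdf(x, mu, beta, xi):
    """GEV CDF; xi == 0 uses the Gumbel limit exp(-e^{-(x-mu)/beta})."""
    z = (x - mu) / beta
    if xi == 0:
        return math.exp(-math.exp(-z))
    t = 1 + xi * z
    if t <= 0:
        # Outside the support: below it when xi > 0, above it when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))

print(round(gev_cdf(0, 0, 1, 0), 4))  # exp(-1), about 0.3679
```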

Three Categories

Gumbel Distribution (\(\xi = 0\))

The Gumbel distribution describes the extreme value distribution for distributions with exponential-type tails (e.g., Normal, Exponential). $$F(x)=\exp(-e^{-(x-\mu)/\beta}),\quad x\in\mathbb{R}$$ where \(\mu\) is the location parameter and \(\beta>0\) is the scale parameter.
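To illustrate the Gumbel case, maxima of n i.i.d. Exponential(1) draws shifted by \(\log n\) are approximately standard Gumbel, whose mean is the Euler-Mascheroni constant (about 0.5772). A simulation sketch with arbitrary n and trial count:

```python
import math
import random

# Max of n Exponential(1) draws, minus log(n), is approximately Gumbel(0, 1).
random.seed(4)
n, trials = 200, 10_000
shifted = [max(random.expovariate(1.0) for _ in range(n)) - math.log(n)
           for _ in range(trials)]
sim_mean = sum(shifted) / trials
print(abs(sim_mean - 0.5772) < 0.05)  # within a few hundredths of gamma
```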

Fréchet Distribution (\(\xi \gt 0\))

The Fréchet distribution models the maximum of heavy-tailed distributions. $$F(x)=\begin{cases} 0, & x \leq \mu \\ \exp\left(-(\frac{x-\mu}{\beta})^{-\alpha}\right), & x\gt\mu \end{cases}$$ where \(\alpha>0\) is the shape parameter, \(\mu\) is the location parameter, and \(\beta\gt 0\) is the scale parameter.

Weibull Distribution (\(\xi \lt 0\))

$$F(x)=\begin{cases} \exp\left(-(-\frac{x-\mu}{\beta})^{\alpha}\right), & x\lt\mu\\ 1, & x \geq \mu \end{cases}$$ where \(\alpha>0\) is the shape parameter, \(\mu\) is the location parameter, and \(\beta\gt 0\) is the scale parameter.

Extreme Value Theorem

This result is also known as the Fisher-Tippett-Gnedenko theorem, a name that distinguishes it from the extreme value theorem of introductory calculus. It describes the convergence of the distribution of normalized maxima to the GEV distribution.
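A standard formulation (paraphrased, not verbatim from this text): if there exist normalizing sequences \(a_n>0\) and \(b_n\) such that $$\lim_{n\to\infty}P\left(\frac{M_n-b_n}{a_n}\leq x\right)=G(x)$$ for some nondegenerate distribution function G, then G must belong to the GEV family, i.e. it is of Gumbel, Fréchet, or Weibull type.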

Extreme Value Theory Practice Problems

  1. Give the generic CDF for the distribution of the minimum of a sample of n draws from a distribution with density f(x) and CDF F(x).
  2. Similar to the formulas for the PDFs of the order statistics above, derive the distribution of the (n-1)-th order statistic for a generic distribution.