Extreme Value Theory
Introduction to Extreme Value Theory
Recommended Prerequisites
- Probability
- Probability II
What is Extreme Value Theory?
Extreme value theory (EVT) deals with the behavior of tail events. One example is understanding how the maximum behaves under repeated sampling from a distribution.
Introduction to Order Statistics
Previously, we've been dealing with statistics such as the mean. Order statistics provide a way to analyze the behavior of the extremes (minimum and maximum) as well as the general properties of distributions and their sample representations.
Example
Let \(x_1, x_2,\dots, x_n\) be a sample of \(n\) i.i.d. random variables drawn from a distribution \(X\).
The order statistics of this sample are defined as the sorted values of the sample from smallest to largest:
$$x_{(1)}\leq x_{(2)}\leq \dots \leq x_{(n)}$$
Here:
- \(x_{(1)}\) is the minimum of the sample
- \(x_{(n)}\) is the maximum of the sample
- \(x_{(k)}\) is the k-th order statistic; i.e., the \(k^{th}\) smallest value in the sample
Median
The median is another common descriptor for a sample or distribution. We can think of it in the context of order statistics given what we learned above.
Odd
If n is odd, the median is the \((\frac{n+1}{2})\)-th order statistic
Even
If n is even, the median is defined as the arithmetic mean of the two middle-most order statistics, since there is not a single middle:
$$\text{Median}=\frac{x_{(\frac{n}{2})}+x_{(\frac{n}{2}+1)}}{2}$$
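To make these definitions concrete, here is a minimal Python sketch (numpy only; the standard normal sample and the sample size are just illustrative assumptions) that computes the order statistics and applies the odd/even median rule above.
```python
import numpy as np

rng = np.random.default_rng(0)

# Draw an i.i.d. sample; the standard normal is an arbitrary illustrative choice.
n = 7
x = rng.standard_normal(n)

# The order statistics are simply the sorted sample values.
order_stats = np.sort(x)          # x_(1) <= x_(2) <= ... <= x_(n)

# Median via the order-statistic definition above.
if n % 2 == 1:                    # odd n: the ((n+1)/2)-th order statistic
    median = order_stats[(n + 1) // 2 - 1]
else:                             # even n: average of the two middle order statistics
    median = (order_stats[n // 2 - 1] + order_stats[n // 2]) / 2

print("order statistics:", order_stats)
print("median:", median, "  (numpy check:", np.median(x), ")")
```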
Distribution of Order Statistics
One nice thing about deriving the distributions of the various order statistics is that the arguments can be described well in plain English.
Distribution of the Maximum
Let us have a fixed number of draws in our sample, n, from a distribution \(F_X\).
We will denote the elements in our sample as \(X_{(k)}\) for the k-th order statistic.
Writing \(M_n=X_{(n)}=\max(X_1,\dots,X_n)\), the probability that all n samples do not exceed a given number x is simply
$$P(M_n\leq x)=P(X_1\leq x, X_2\leq x, \dots, X_n\leq x)=[F(x)]^n$$
But what is the shape of this distribution? The density can be obtained by differentiating \([F(x)]^n\) with the chain rule from calculus, but let's derive it intuitively.
The density of \(X_{(n)}\) at x is not simply the density of a single draw equalling x, as it would be when we use the PMF or PDF of one element. Instead, we must also consider what being the maximum implies for the other elements of the sample.
We must consider the probability of getting \(X_i=x\) for some \(i\in\{1,2,\dots,n\}\) and, for it to be the maximum, that \(X_{j}\leq x\ \forall j\neq i\).
Since there are n possible draws that could take the value x, a factor of n appears.
The probability that any other draw does not exceed x is \(P(X_j\leq x)=F(x)\), and this must hold for the remaining n-1 draws, so we raise it to the n-1 power.
Finally, the density of a given draw at x is \(f(x)\).
Thus, we finally get the PDF of \(X_{(n)}\) as:
$$f_{X_{(n)}}(x)=n\left[F(x)\right]^{n-1}f(x)$$
Interpretation
What does this imply? If x corresponds to a low quantile, then F(x) is small, and raising it to the n-1 power shrinks the density even further. Intuitively, the larger the sample, the less likely it is that the maximum of that sample is a small value.
Simulation
[Interactive simulation of the sample maximum, reporting its summary statistics: mean, median, variance, and standard deviation.]
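In place of the interactive demo, here is a minimal numpy/scipy sketch (the standard normal base distribution, the sample size n, and the number of trials are all illustrative assumptions) that simulates the sample maximum, prints the same summary statistics, and cross-checks the analytic density derived above.
```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

n = 30             # sample size per maximum
trials = 100_000   # number of simulated maxima

# Simulate the maximum of n i.i.d. standard normal draws, many times over.
maxima = rng.standard_normal((trials, n)).max(axis=1)

print("Summary statistics of the simulated maxima")
print("  mean:              ", maxima.mean())
print("  median:            ", np.median(maxima))
print("  variance:          ", maxima.var(ddof=1))
print("  standard deviation:", maxima.std(ddof=1))

# Cross-check against the analytic density f_{X_(n)}(x) = n F(x)^{n-1} f(x):
# its mean should match the simulated mean above.
grid = np.linspace(-1, 6, 5001)
pdf_max = n * norm.cdf(grid) ** (n - 1) * norm.pdf(grid)
analytic_mean = np.sum(grid * pdf_max) * (grid[1] - grid[0])
print("  analytic mean via n F^{n-1} f:", analytic_mean)
```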
Distribution of the Minimum
By symmetry, the distribution of the minimum is derived similarly. The difference is that we are now concerned with the probability that the other draws are greater than \(X_{(1)}\), so we swap F(x) for 1-F(x):
$$f_{X_{(1)}}(x)=n\left[1-F(x)\right]^{n-1}f(x)$$
Distribution of the Median
Through similar arguments and use of the multinomial/binomial coefficient, we can get the PDF of the median. For brevity, we treat only the odd case and let n=2m+1. The PDF of \(X_{(m+1)}\) is given by:
$$f_{X_{(m+1)}}(x)=\frac{(2m+1)!}{m!m!}\left[F(x)\right]^m\left[1-F(x)\right]^mf(x)$$
Does this result make sense? We can group the factor \(F(x)(1-F(x))\). Near the median, \(F(x)\approx 0.5\), which yields \(F(x)(1-F(x))\approx0.25\), the maximum of \(g(y)=y(1-y)\) on [0,1]; away from the median the product falls below 0.25, and at the extremes either F(x) or 1-F(x) goes to 0.
Consequently, \([F(x)(1-F(x))]^m\) is maximized near the median for \(m\geq 1\) and decays toward 0 at least as fast as you move away from it. Thus, the formula seems reasonable.
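As a quick sanity check (not part of the original derivation), the sketch below compares the analytic median density for the odd case against a Monte Carlo histogram, assuming a standard normal base distribution and m=3.
```python
import numpy as np
from math import factorial
from scipy.stats import norm

rng = np.random.default_rng(1)

m = 3
n = 2 * m + 1              # odd sample size, n = 7
trials = 200_000

# Monte Carlo: the sample median of n i.i.d. standard normal draws.
medians = np.median(rng.standard_normal((trials, n)), axis=1)

# Analytic density of X_(m+1): (2m+1)!/(m! m!) [F(x)(1-F(x))]^m f(x).
coef = factorial(2 * m + 1) / (factorial(m) ** 2)
x = np.linspace(-1.5, 1.5, 7)
pdf_median = coef * (norm.cdf(x) * (1 - norm.cdf(x))) ** m * norm.pdf(x)

# Compare the analytic density to a histogram estimate at the same points.
hist, edges = np.histogram(medians, bins=200, range=(-2, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
for xi, pi in zip(x, pdf_median):
    est = hist[np.argmin(np.abs(centers - xi))]
    print(f"x = {xi:+.2f}   analytic {pi:.4f}   simulated {est:.4f}")
```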
Extreme Value Distributions
Generalized Extreme Value Distribution
The GEV distribution has 3 parameters: \(\mu, \beta, \xi\) and has a CDF given by:
$$F(x;\mu, \beta, \xi)=\exp(-(1+\xi\frac{x-\mu}{\beta})^{-1/\xi});\quad1+\xi\frac{x-\mu}{\beta}>0$$
where \(\mu\) is the location parameter, \(\beta>0\) is the scale parameter, and \(\xi\) is the shape parameter.
The GEV distribution family is broken down into 3 categories based on whether \(\xi\) is positive, negative, or zero.
Three Categories
Gumbel Distribution (\(\xi = 0\))
The Gumbel distribution describes the limiting distribution of maxima for distributions with light, exponentially decaying tails (e.g., the Normal and Exponential distributions).
$$F(x)=\exp(-e^{-(x-\mu)/\beta}),\quad x\in\mathbb{R}$$
where \(\mu\) is the location parameter and \(\beta>0\) is the scale parameter.
Frechet Distribution (\(\xi \gt 0\))
The Frechet distribution models the maximum of heavy-tail distributions.
$$F(x)=\begin{cases}
0, & x \leq \mu \\
\exp\left(-(\frac{x-\mu}{\beta})^{-\alpha}\right), & x\gt\mu
\end{cases}$$
where \(\alpha>0\) is the shape parameter, \(\mu\) is the location parameter, and \(\beta\gt 0\) is the scale parameter.
Weibull Distribution (\(\xi \lt 0\))
The Weibull distribution is used when there is a finite upper bound.
$$F(x)=\begin{cases}
\exp\left(-(-\frac{x-\mu}{\beta})^{\alpha}\right), & x\lt\mu\\
1, & x \geq \mu
\end{cases}$$
where \(\alpha>0\) is the shape parameter, \(\mu\) is the location parameter, and \(\beta\gt 0\) is the scale parameter.
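The following sketch implements the GEV CDF directly from the formula above (the function name and the parameter choices are illustrative assumptions) and shows that small \(\xi\) recovers the Gumbel case while positive and negative \(\xi\) give the Frechet- and Weibull-type shapes.
```python
import numpy as np

def gev_cdf(x, mu=0.0, beta=1.0, xi=0.0):
    """GEV CDF F(x; mu, beta, xi); xi = 0 is the Gumbel case."""
    z = (x - mu) / beta
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    # Clamp handles points outside the support 1 + xi*z > 0:
    # below the lower endpoint (xi > 0) this gives 0, above the upper endpoint (xi < 0) it gives 1.
    t = np.maximum(1 + xi * z, 1e-300)
    return np.exp(-t ** (-1.0 / xi))

xs = np.array([-1.0, 0.0, 1.0, 3.0])
for xi in (0.0, 0.5, -0.5):
    print(f"xi = {xi:+.1f}:", np.round(gev_cdf(xs, mu=0.0, beta=1.0, xi=xi), 4))

# As xi -> 0 the general formula approaches the Gumbel case.
print("xi = 1e-6 vs Gumbel at x = 1:",
      gev_cdf(np.array([1.0]), xi=1e-6), gev_cdf(np.array([1.0]), xi=0.0))
```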
Extreme Value Theorem
Also known as the Fisher-Tippett-Gnedenko theorem, so as not to be confused with the extreme value theorem from introductory calculus.
The EVT describes the asymptotic distribution of the maximum (or minimum) of a large sample of i.i.d. random variables.
Let \(\left\{X_1,X_2,\dots,X_n\right\}\) be a sequence of iid random variables with CDF \(F_X(x)\). What is the behavior of the maximum of the sequence?
$$M_n=\max(X_1,X_2,\dots,X_n)$$
How does \(M_n\) behave as n grows large? Does there exist a limiting distribution for the appropriately normalized maximum \(M_n\)? That is, do there exist constants \(a_n\gt 0\) and \(b_n\in\mathbb{R}\) such that:
$$\lim_{n\rightarrow \infty}P\left(\frac{M_n-b_n}{a_n}\leq z\right)=G(z)$$
where \(G(z)\) is a non-degenerate distribution function?
Statement of the Theorem
The EVT states that, for a wide class of underlying distributions, the limiting distribution of the normalized maximum of a sequence of iid random variables converges to one of three types of GEV distributions:
- Gumbel for distributions with tails that decay faster than any power (e.g., exponential decay)
- Frechet for distributions with heavy tails
- Weibull for distributions with finite upper bounds
These three were described above, but now their significance is given context.
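As a hedged numerical illustration (the Exp(1) underlying distribution and the sample sizes are assumptions; \(a_n=1\), \(b_n=\log n\) is one classical normalization for the exponential), the sketch below shows the CDF of the normalized maximum approaching the Gumbel CDF as n grows.
```python
import numpy as np

rng = np.random.default_rng(7)

def gumbel_cdf(z):
    # Standard Gumbel CDF: exp(-e^{-z}).
    return np.exp(-np.exp(-z))

trials = 10_000
z_grid = np.array([-1.0, 0.0, 1.0, 2.0])

# For Exp(1), with a_n = 1 and b_n = log(n):
# P(M_n - log n <= z) = (1 - e^{-z}/n)^n -> exp(-e^{-z}) as n grows.
for n in (10, 100, 1000):
    maxima = rng.exponential(1.0, size=(trials, n)).max(axis=1)
    normalized = maxima - np.log(n)
    empirical = np.array([(normalized <= z).mean() for z in z_grid])
    print(f"n = {n:5d}  empirical:", np.round(empirical, 4),
          " Gumbel:", np.round(gumbel_cdf(z_grid), 4))
```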
Theory of Regular Variation
The theory of regular variation helps describe the asymptotic behavior of functions and distribution tails.
Definition
A function \(L: (0,\infty)\to(0,\infty)\) is called slowly varying at infinity if for all \(t\gt 0\):
$$\lim_{x\to\infty}\frac{L(tx)}{L(x)}=1$$
A measurable function \(f:(0,\infty)\to(0,\infty)\) is said to be regularly varying at infinity with index \(\alpha\in\mathbb{R}\) if it can be written in the form:
$$f(x)=x^{\alpha}L(x)$$
where \(L(x)\) is a slowly varying function at infinity. We can write \(f\in RV_{\alpha}\) for f regularly varying with index \(\alpha\). If \(\alpha=0\), then \(f\) is slowly varying.
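A quick numerical check of these definitions (the particular functions are illustrative assumptions): \(L(x)=\log x\) is slowly varying, so \(f(x)=x^{-2}\log x\) is regularly varying with index \(-2\), and the ratio \(f(tx)/f(x)\) approaches \(t^{-2}\).
```python
import numpy as np

def L(x):
    return np.log(x)            # a classic slowly varying function

def f(x):
    return x ** -2.0 * L(x)     # regularly varying with index alpha = -2

t = 3.0
for x in (1e2, 1e4, 1e6, 1e8):
    print(f"x = {x:.0e}   L(tx)/L(x) = {L(t*x)/L(x):.4f}   "
          f"f(tx)/f(x) = {f(t*x)/f(x):.6f}   t^alpha = {t**-2.0:.6f}")
```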
Domain of Attraction
A distribution function \(F\) is said to belong to the domain of attraction of an extreme value distribution if there exist sequences \(\{a_n\gt 0\}\) and \(\{b_n\}\) such that:
$$\lim_{n\to\infty}F^{n}(a_n x+ b_n)=G(x)$$
where \(G\) is a non-degenerate extreme value distribution.
Heavy Tails
A distribution function \(F\) is said to have a heavy tail if:
$$\bar{F}(x)=1-F(x)\in RV_{-\alpha},\quad \alpha\gt 0$$
This means that the survival function \(\bar{F}(x)\) is regularly varying with index \(-\alpha\), indicating a polynomial decay of the tail.
Example: Pareto Distribution
For \(x\geq x_0 \gt 0\):
$$F(x)=1-(\frac{x_0}{x})^{\alpha},\quad \alpha\gt0$$
The survival function \(\bar{F}(x)=(x_0/x)^{\alpha}\in RV_{-\alpha}\)
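As a side note not covered in the text above, the tail index \(\alpha\) of a heavy-tailed sample is commonly estimated with the Hill estimator, which averages log-excesses over the k largest order statistics. Below is a rough sketch on simulated Pareto data (the parameter values and choices of k are assumptions).
```python
import numpy as np

rng = np.random.default_rng(3)

alpha_true, x0 = 2.5, 1.0
n = 100_000

# Pareto(alpha, x0) via inverse-transform sampling: X = x0 * U^{-1/alpha}.
u = rng.uniform(size=n)
x = x0 * u ** (-1.0 / alpha_true)

# Hill estimator based on the k largest order statistics:
# 1/alpha_hat = (1/k) * sum_{i=1}^{k} log( X_(n-i+1) / X_(n-k) ).
xs = np.sort(x)
for k in (100, 1000, 5000):
    log_excesses = np.log(xs[-k:]) - np.log(xs[-k - 1])
    alpha_hat = 1.0 / log_excesses.mean()
    print(f"k = {k:5d}   Hill estimate of alpha: {alpha_hat:.3f}   (true {alpha_true})")
```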
Example: Student's t-Distribution
$$\bar{F}(x)\sim kx^{-\nu},\quad x\to\infty$$
where \(\nu\) is the degrees of freedom and \(k\) is a constant, so \(\bar{F}\in RV_{-\nu}\).
Tail Equivalence
Two distribution functions F and G are tail-equivalent if:
$$\lim_{x\to\infty}\frac{\bar{F}(x)}{\bar{G}(x)}=c\in(0,\infty)$$
If F and G are tail-equivalent and F is regularly varying, so is G.
Light Tails
A distribution is considered light-tailed if its tail probability decreases exponentially or faster as \(x\to\infty\).
Formally, a distribution F is light-tailed if its survival function satisfies:
$$\limsup_{x\to\infty}\frac{\log(\bar{F}(x))}{x}\lt 0$$
The distribution is said to have an exponentially decaying tail if the above holds.
Super-Exponential Decay
For distributions with tails that decay faster than any exponential function, such as the Gaussian distribution, we have:
$$\lim_{x\to\infty}\frac{\log(\bar{F}(x))}{x}= -\infty$$
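To illustrate the three regimes numerically (a sketch using scipy's survival functions; the specific distributions and evaluation points are assumptions): the ratio \(\log\bar{F}(x)/x\) stays at a negative constant for the exponential, heads to \(-\infty\) for the Gaussian, and approaches 0 for the heavy-tailed Pareto.
```python
import numpy as np
from scipy.stats import expon, norm, pareto

xs = np.array([2.0, 5.0, 10.0, 20.0, 50.0])

# log of the survival function divided by x, for three tail regimes.
for name, logsf in [("Exponential(1)", expon.logsf),
                    ("Standard normal", norm.logsf),
                    ("Pareto(alpha=2)", lambda x: pareto.logsf(x, b=2))]:
    print(f"{name:16s} log(Fbar(x))/x:", np.round(logsf(xs) / xs, 4))
```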
Rapidly Varying Functions
A measurable function \(f:(0,\infty)\to(0,\infty)\) is called rapidly varying at infinity if for all \(\lambda\gt 1\):
$$\lim_{x\to\infty}\frac{f(\lambda x)}{f(x)}=0$$
Contrast this with the earlier definitions regarding regular variation.
Cramer's Condition
A distribution satisfies Cramer's condition if there exists \(\theta\gt 0\) such that:
$$\mathbb{E}[e^{\theta X}]\lt \infty$$
Cramer's condition implies that the distribution is light-tailed.
Example: Exponential
$$\bar{F}(x)=e^{-\lambda x},\quad x\geq 0$$
For \(X\sim\text{Exp}(\lambda)\), \(\mathbb{E}[e^{\theta X}]=\frac{\lambda}{\lambda-\theta}\lt\infty\) for any \(0\lt\theta\lt\lambda\), so Cramer's condition holds.
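A small sketch checking this closed form by Monte Carlo (the rate \(\lambda=2\), the sample size, and the \(\theta\) values are illustrative assumptions):
```python
import numpy as np

rng = np.random.default_rng(11)

lam = 2.0
x = rng.exponential(1.0 / lam, size=1_000_000)   # numpy parameterizes by the scale 1/lambda

# For X ~ Exp(lambda), E[e^{theta X}] = lambda / (lambda - theta) when theta < lambda;
# the expectation is infinite when theta >= lambda, so Cramer's condition needs theta < lambda.
for theta in (0.25, 0.5, 0.75, 0.9):
    mc = np.exp(theta * x).mean()
    exact = lam / (lam - theta)
    print(f"theta = {theta:.2f}   Monte Carlo {mc:.4f}   exact {exact:.4f}")
```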