Probability Part II
More on Probability
Recommended Prerequisites
- Probability
Other Common Distribution Attributes
Quantile Function
Recall the CDF is
$$F_X(x)=P(X\leq x)$$
This tells us what fraction of the time we will see a value less than or equal to x.
But what if we want the opposite: for a given fraction of the time p, below what value do we expect the result to fall? The quantile function answers this:
$$Q(p)=\inf \{x\in \mathbb{R}:p\leq F(x)\}$$
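As a quick illustration, here is a minimal sketch using SciPy, assuming a standard normal distribution; SciPy exposes the quantile function as `ppf`, the inverse of `cdf`:

```python
from scipy.stats import norm

# For a standard normal, the quantile function is the inverse of the CDF.
p = 0.975
x = norm.ppf(p)        # Q(p): the value the result stays at or below 97.5% of the time
print(x)               # ~1.96
print(norm.cdf(x))     # recovers p, since F(Q(p)) = p for a continuous, strictly increasing CDF
```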
Moment (Raw)
The \(n^{th}\) (raw) moment, \(\mu_n'\), is given by
$$\mu_n'=\mathbb{E}[X^n]$$
We can extend this definition to the moment about a value c:
$$\mu'(n,c)=\mathbb{E}[(X-c)^n]$$
For the special case where c is the mean of X, we give this a special name, the
central moment.
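As a quick numeric check, here is a small NumPy sketch, assuming a fair six-sided die as the random variable, that computes raw moments and moments about a point:

```python
import numpy as np

# A fair six-sided die: support {1, ..., 6}, each value with probability 1/6.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

def raw_moment(n):
    """mu'_n = E[X^n]"""
    return np.sum(x**n * p)

def moment_about(n, c):
    """E[(X - c)^n], the nth moment about c"""
    return np.sum((x - c)**n * p)

print(raw_moment(1))                    # 3.5, the mean
print(raw_moment(2))                    # 15.1666..., i.e. 91/6
print(moment_about(2, raw_moment(1)))   # 2.9166..., i.e. 35/12 (the second central moment)
```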
Central Moment
We will denote the \(n^{th}\) central moment as \(\mu_n\)
$$\mu_n=\mathbb{E}[(X-\mathbb{E}[X])^n]=\mathbb{E}[(X-\mu_1')^n]$$
For n=2, you may recognize this formula as the variance:
$$\mu_2=\mathbb{E}[(X-\mathbb{E}[X])^2]=\sigma^2$$
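To see this numerically, here is a short sketch, assuming a normal sample drawn with NumPy, comparing the second central moment from `scipy.stats.moment` against the variance:

```python
import numpy as np
from scipy.stats import moment

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=100_000)

# The second central moment of the sample matches its variance (here, roughly sigma^2 = 9).
print(moment(sample, moment=2))
print(np.var(sample))
```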
Standardized Moment
Further building on the concept of the central moment is the standardized moment, which is a scaled version of the central moment.
The \(n^{th}\) standardized moment \(\tilde{\mu}_n\) is given by:
$$\tilde{\mu}_n=\mathbb{E}[\left(\frac{X-\mu}{\sigma}\right)^n]$$
where \(\mu\) is the mean (the first raw moment) and \(\sigma\) is the standard deviation (the square root of the variance).
The standardized moments are where the common definitions of skewness and kurtosis come from: they are \(\tilde{\mu}_3\) and \(\tilde{\mu}_4\), respectively.
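As a sketch, assuming an exponential sample (whose theoretical skewness and kurtosis are 2 and 9), the standardized moments computed by hand agree with SciPy's `skew` and `kurtosis`:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=200_000)   # a right-skewed distribution

# Standardize the sample, then take the third and fourth sample moments directly...
z = (sample - sample.mean()) / sample.std()
print(np.mean(z**3), np.mean(z**4))

# ...which match SciPy's skewness and kurtosis (fisher=False gives the plain fourth
# standardized moment rather than "excess" kurtosis, which subtracts 3).
print(skew(sample), kurtosis(sample, fisher=False))
```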
Moment-Generating Function (MGF)
Before, we talked about getting the
expected value of a function of a random variable.
We will now define the MGF as a specific case of this:
$$M_X(t)\equiv \mathbb{E}[e^{tX}]$$
For a discrete random variable, this is calculated as:
$$M_X(t)=\sum_{i=0}^{\infty}e^{tx_i}p(x_i)$$
For a continuous random variable, this is calculated as:
$$M_X(t)=\int_{-\infty}^{\infty}e^{tx}f_X(x)dx$$
Note that this is not a function of x, but rather of a new variable t, since we have summed or integrated over the support of X.
While this may seem like a strange choice, it ends up having useful properties.
Recall the \(n^{th}\) raw (non-central) moment, which we will write here as \(m_n\):
$$m_n\equiv \mathbb{E}[X^n]$$
We can obtain \(m_n\) by differentiating the MGF n times and then evaluating it at t=0:
$$m_n=\left.\frac{d^n M_X}{dt^n}\right\rvert_{t=0}$$
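For example, here is a minimal SymPy sketch assuming an Exponential(\(\lambda\)) random variable, whose MGF is \(\lambda/(\lambda-t)\) for \(t<\lambda\); differentiating recovers the raw moments \(n!/\lambda^n\):

```python
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)

# MGF of an Exponential(lambda) random variable: M_X(t) = lambda / (lambda - t), for t < lambda.
M = lam / (lam - t)

# Differentiating n times and evaluating at t = 0 yields the raw moments E[X^n] = n! / lambda^n.
for n in range(1, 4):
    print(n, sp.simplify(sp.diff(M, t, n).subs(t, 0)))   # 1/lambda, 2/lambda**2, 6/lambda**3
```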
Hence the name Moment Generating Function. Note that the MGF may not exist for every distribution. However, there is a closely related function that always exists, at the price of imaginary numbers and slightly more complicated calculations.
Laplace Transform
The Laplace transform of the density \(f_X\) takes a similar form to the MGF; it is simply \(M_X(-s)\):
$$\mathcal{L}\{f\}(s)=\mathbb{E}[e^{-sX}]$$
Characteristic Function
The characteristic function is defined similarly to the MGF:
$$\varphi_X(t)\equiv \mathbb{E}[e^{itX}]$$
Note, however, the inclusion of \(i\), the imaginary unit.
This can also be used to generate moments of a distribution (provided they exist):
$$\left.\frac{d^n \varphi_X}{dt^n}\right\rvert_{t=0}=i^n m_n$$
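As a sketch, assuming a standard normal random variable, whose characteristic function is \(e^{-t^2/2}\), we can recover its raw moments this way with SymPy:

```python
import sympy as sp

t = sp.symbols('t', real=True)

# Characteristic function of a standard normal: phi(t) = exp(-t^2 / 2).
phi = sp.exp(-t**2 / 2)

# The nth derivative at t = 0 equals i^n * m_n, so dividing by i^n recovers the raw moments.
for n in range(1, 5):
    print(n, sp.simplify(sp.diff(phi, t, n).subs(t, 0) / sp.I**n))
# Prints 0, 1, 0, 3: the first four raw moments of N(0, 1).
```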
One example of a distribution that possesses a CF but not an MGF is the Cauchy distribution:
$$f(x;x_0,\gamma)=\frac{1}{\pi \gamma \left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]}$$
You cannot use the CF to generate the moments of the Cauchy distribution, as they are undefined, but the Characteristic Function itself does exist. There are uses of the Characteristic Function beyond generating moments, which is why its existence matters, even for difficult distributions.
Cumulants and the Cumulant Generating Function
An alternative to the moments are the cumulants. These are derived from a generating function in the same way that the moments are derived from the MGF. In fact, the cumulant-generating function is just the log of the MGF:
$$K(t)=\log (\mathbb{E}[e^{tX}])$$
To get the cumulants, we differentiate K(t) n times and evaluate at t=0, just like with the MGF:
$$\kappa_n=K^{(n)}(0)$$
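For example, here is a small SymPy sketch assuming a Normal(\(\mu,\sigma^2\)) random variable: its cumulant-generating function is \(K(t)=\mu t+\sigma^2 t^2/2\), so the first two cumulants are the mean and variance, and all higher cumulants vanish:

```python
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# MGF of a Normal(mu, sigma^2) random variable and its cumulant-generating function.
M = sp.exp(mu * t + sigma**2 * t**2 / 2)
K = sp.log(M)

# kappa_1 = mu, kappa_2 = sigma^2, and every higher cumulant of a normal is zero.
for n in range(1, 5):
    print(n, sp.simplify(sp.diff(K, t, n).subs(t, 0)))
```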
Entropy
Continuing with the trend of looking at expectations related to a distribution, the entropy, H, is given by
$$H(X)=\mathbb{E}[-\log p(X)]$$
where p is the PMF of X. What if we have a continuous random variable? It turns out that some properties change. We will briefly cover that case, called Differential Entropy, later in the book.
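Returning to the discrete case, here is a small sketch, assuming a biased coin as the random variable, computing the entropy directly and with `scipy.stats.entropy`:

```python
import numpy as np
from scipy.stats import entropy

# Entropy of a biased coin with P(heads) = 0.9: H = -sum_x p(x) * log(p(x)).
p = np.array([0.9, 0.1])
print(-np.sum(p * np.log(p)))   # ~0.325 nats
print(entropy(p))               # SciPy computes the same sum (natural log by default)
```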