Compound Distributions
Introduction to Compound Distributions
Recommended Prerequisites
- Probability
- Probability 2
- Maximum Likelihood Estimation
- Mixture Distributions
Definition
In probability and statistics, a compound distribution arises when one random variable is dependent on another. This is often the case when there is an underlying variability in the parameters of a distribution.
For example, if we model the number of insurance claims using a Poisson distribution, but the rate of claims (the parameter \(\lambda\)) is itself a random variable, we have a compound Poisson distribution.
This is an extension of the idea of conditional distributions. Now, the conditional distribution is conditioned on a random variable rather than on a fixed value.
In a previous chapter, we covered mixture distributions, where a random variable selects one of several distributions and a value is then drawn from the selected one. A compound distribution works the same way, except that the random variable selects among many (possibly uncountably many) members of the same distribution family, each with a different value of a parameter.
From another perspective, we can view our primary distribution as being sampled. That sampled value is then plugged into another distribution as a parameter, and then we draw from that second distribution. In some sense, compound distributions are to integration what mixture distributions are to summation.
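The summation-versus-integration contrast can be sketched numerically. The example below is illustrative only: the exponential family, the two rates (1 and 3), the Uniform(1, 3) rate distribution, and the function names are all assumptions chosen for this sketch, not part of the original text.

```python
import math

def exp_pdf(x, rate):
    """Density of an Exponential(rate) distribution at x >= 0."""
    return rate * math.exp(-rate * x)

def mixture_pdf(x, rates=(1.0, 3.0), weights=(0.5, 0.5)):
    """Mixture: a weighted SUM over finitely many parameter values."""
    return sum(w * exp_pdf(x, r) for r, w in zip(rates, weights))

def compound_pdf(x, low=1.0, high=3.0, steps=1000):
    """Compound: an INTEGRAL over a continuum of parameter values.

    The rate is assumed Uniform(low, high), so its density is
    1 / (high - low). The integral is approximated with the midpoint rule.
    """
    width = (high - low) / steps
    density_of_rate = 1.0 / (high - low)
    total = 0.0
    for i in range(steps):
        rate = low + (i + 0.5) * width  # midpoint of the i-th slice
        total += exp_pdf(x, rate) * density_of_rate * width
    return total

print("mixture density at x=1: ", mixture_pdf(1.0))
print("compound density at x=1:", compound_pdf(1.0))
```

The two functions have the same shape: each weights a conditional density by how likely the parameter value is. Only the way the weights are accumulated differs.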
An Aside: Common Conditional Probability Rules
Law of Total Probability
$$P(A)=\sum_{n}P(A|B_n)P(B_n)$$
$$P(A)=\int_{-\infty}^{\infty}P(A|X=x)dF_X(x)$$
$$=\int_{-\infty}^{\infty}P(A|X=x)f_{X}(x)dx$$
Law of iterated expectations
A useful formula is the law of iterated expectations, which relates the unconditional expectation to the conditional expectations.
This is essentially an extension of the Law of Total Probability, which becomes apparent if you write out the expectations explicitly.
$$\mathbb{E}[X]=\mathbb{E}[\mathbb{E}[X|Y]]$$
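Written out for the continuous case, the identity can be sketched as follows. This assumes \(X\) and \(Y\) have a joint density and that the order of integration may be swapped; \(f_Y\) and \(f_{X|Y}\) denote the density of \(Y\) and the conditional density of \(X\) given \(Y\), notation chosen here for illustration.
$$\mathbb{E}[\mathbb{E}[X|Y]]=\int_{-\infty}^{\infty}\mathbb{E}[X|Y=y]f_{Y}(y)dy$$
$$=\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty}xf_{X|Y}(x|y)dx\right)f_{Y}(y)dy$$
$$=\int_{-\infty}^{\infty}x\left(\int_{-\infty}^{\infty}f_{X|Y}(x|y)f_{Y}(y)dy\right)dx$$
$$=\int_{-\infty}^{\infty}xf_{X}(x)dx=\mathbb{E}[X]$$
The inner integral in the third line is the Law of Total Probability applied to the density of \(X\).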
Law of Total Variance
$$\text{Var}(X)=\mathbb{E}[\text{Var}(X|Y)]+\text{Var}(\mathbb{E}[X|Y])$$
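Both laws can be checked by simulation. The sketch below assumes an illustrative model that is not from the original text: \(Y\sim U(1,3)\) and \(X|Y\sim\text{Exp}(Y)\) with \(Y\) as the rate, so that \(\mathbb{E}[X|Y]=1/Y\) and \(\text{Var}(X|Y)=1/Y^2\). The function name and sample size are also assumptions.

```python
import math
import random

def simulate(n=200_000, seed=0):
    """Return the sample mean and variance of X, where Y ~ U(1, 3)
    and X | Y ~ Exponential(rate=Y)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        rate = rng.uniform(1.0, 3.0)      # draw Y
        xs.append(rng.expovariate(rate))  # draw X given Y
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    return mean, variance

# Closed forms for Y ~ U(1, 3):
e_inv = math.log(3) / 2   # E[1/Y]
e_inv_sq = 1.0 / 3.0      # E[1/Y^2]

theory_mean = e_inv                             # E[E[X|Y]] = E[1/Y]
theory_var = e_inv_sq + (e_inv_sq - e_inv**2)   # E[Var(X|Y)] + Var(E[X|Y])

sample_mean, sample_var = simulate()
print("mean:    ", sample_mean, "vs", theory_mean)
print("variance:", sample_var, "vs", theory_var)
```

The simulated values should agree with the closed forms up to Monte Carlo error, which shrinks as `n` grows.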
Returning to Defining Compound Distributions
A compound distribution is a probability distribution of a random variable \(X\) where the distribution of \(X\) depends on another random variable \(Y\).
This can be expressed as
$$X|Y=y\sim f_{X|Y}(x|y)$$
where \(X\) has a conditional distribution \(f_{X|Y}(x|y)\), and \(Y\) follows a marginal distribution \(g_Y(y)\).
The unconditional (or marginal) distribution of \(X\) is found by integrating over the distribution of \(Y\):
$$f_{X}(x)=\int_{-\infty}^{\infty}f_{X|Y}(x|y)g_{Y}(y)dy$$
The distribution of \(X\) is "compounded" by the randomness of \(Y\).
This should look familiar: it is the Law of Total Probability, restated for probability densities, with the density of \(X\) at \(x\) taking the place of \(P(A)\).
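As a worked illustration, take the insurance example from the introduction and assume \(X|\lambda\sim\text{Poisson}(\lambda)\) with \(\lambda\sim\text{Gamma}(\alpha,\beta)\), where \(\alpha\) is the shape and \(\beta\) is the rate (a parametrization chosen for this sketch). Because \(X\) is discrete, the marginal is a probability mass function:
$$P(X=k)=\int_{0}^{\infty}\frac{\lambda^{k}e^{-\lambda}}{k!}\cdot\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}d\lambda$$
$$=\frac{\beta^{\alpha}}{k!\,\Gamma(\alpha)}\int_{0}^{\infty}\lambda^{k+\alpha-1}e^{-(\beta+1)\lambda}d\lambda$$
$$=\frac{\Gamma(k+\alpha)}{k!\,\Gamma(\alpha)}\left(\frac{\beta}{\beta+1}\right)^{\alpha}\left(\frac{1}{\beta+1}\right)^{k}$$
The last step uses \(\int_{0}^{\infty}\lambda^{a-1}e^{-b\lambda}d\lambda=\Gamma(a)/b^{a}\). The result is a negative binomial probability mass function.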
Other Characteristics
MGF
$$M_X(t)=\mathbb{E}_Y[M_{X|Y}(t|Y)]=\int M_{X|Y}(t|y)g_{Y}(y)dy$$
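For instance, assume \(X|\lambda\sim\text{Poisson}(\lambda)\) and \(\lambda\sim\text{Gamma}(\alpha,\beta)\) with shape \(\alpha\) and rate \(\beta\) (an illustrative choice, not from the original text). The conditional MGF is \(M_{X|\lambda}(t|\lambda)=e^{\lambda(e^{t}-1)}\), so averaging over \(\lambda\) amounts to evaluating the Gamma MGF at \(e^{t}-1\):
$$M_X(t)=\mathbb{E}_{\lambda}\left[e^{\lambda(e^{t}-1)}\right]=M_{\lambda}(e^{t}-1)=\left(\frac{\beta}{\beta+1-e^{t}}\right)^{\alpha},\quad t<\ln(1+\beta)$$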
Characteristic Function
$$\varphi_X(t)=\mathbb{E}_Y[\varphi_{X|Y}(t|Y)]=\int \varphi_{X|Y}(t|y)g_{Y}(y)dy$$
Sampling
Sampling from a compound distribution involves two steps:
Step 1: Draw Parameter from Primary Distribution
First, sample the value of the parameter from the primary distribution.
Step 2: Sampling the Outcome
Then, draw the final outcome from the secondary distribution, using the sampled value as its parameter.
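The two steps can be sketched in Python using only the standard library. The function names `sample_compound` and `sample_poisson`, the Gamma-Poisson pairing, and the parameter values are assumptions for illustration; the Poisson sampler uses Knuth's multiplication method, which is simple but slow for large rates.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) value with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_compound(draw_parameter, draw_outcome, n, seed=0):
    """Draw n values from a compound distribution.

    draw_parameter(rng) samples from the primary distribution;
    draw_outcome(parameter, rng) samples from the secondary one.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        parameter = draw_parameter(rng)               # Step 1
        samples.append(draw_outcome(parameter, rng))  # Step 2
    return samples

# Illustrative choice: rate ~ Gamma(shape=3, rate=1.5), outcome ~ Poisson(rate).
# random.gammavariate takes a SCALE as its second argument, hence 1 / rate.
shape, gamma_rate = 3.0, 1.5
samples = sample_compound(
    draw_parameter=lambda rng: rng.gammavariate(shape, 1.0 / gamma_rate),
    draw_outcome=sample_poisson,
    n=100_000,
)
print("sample mean:", sum(samples) / len(samples))
```

Passing the two draws as functions keeps the sampler independent of any particular pair of distributions.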
User Guide
This tool allows you to generate compound distributions, where a parameter is drawn from one distribution (the primary distribution) and then used as a parameter for a second distribution (the secondary distribution). Follow these steps to use the generator:
Step 1: Select a Primary Distribution
The primary distribution generates the parameter for the secondary distribution. You can select from the following:
- Gamma: Produces positive values. Used for rates or scales. Support: (0, ∞)
- Normal: Produces both positive and negative values. Support: (-∞, ∞)
- Uniform: Produces values within a user-specified range. Support: [min, max]
- Beta: Produces values between 0 and 1. Support: (0, 1)
- Chi-Squared: Produces positive values. Support: (0, ∞)
For each primary distribution, you need to enter specific parameters:
- Gamma: Enter the shape (k) and rate (θ).
- Normal: Enter the mean (μ) and variance (σ²).
- Uniform: Enter the minimum and maximum values.
- Beta: Enter alpha (α) and beta (β).
- Chi-Squared: Enter degrees of freedom (k).
After selecting the primary distribution, the support of that distribution will be shown, which indicates the range of possible output values.
Step 2: Select a Secondary Distribution
The secondary distribution uses the parameter generated by the primary distribution. Available options include:
- Poisson: Uses the primary output as the rate (λ). Requirement: λ must be > 0.
- Exponential: Uses the primary output as the rate (λ). Requirement: λ must be > 0.
- Normal: Uses the primary output as the variance. Requirement: Variance must be > 0.
- Binomial: Uses the primary output as the probability (p), with user input for the number of trials (n). Requirement: p must be between 0 and 1.
- Geometric: Uses the primary output as the probability (p). Requirement: p must be between 0 and 1.
When a secondary distribution is selected, any specific requirements for that distribution will be displayed.
The support of the primary distribution must lie within the valid parameter range of the secondary distribution. For example, a Normal primary can produce negative values, so it is not a valid source for a Poisson rate.
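The matching rule can be sketched as an interval check. This is not the tool's actual validation code: the dictionaries, the function `is_compatible`, and the decision to ignore whether endpoints are open or closed are all assumptions made for illustration.

```python
import math

# Supports of the primary distributions listed above (Uniform is handled
# separately because its bounds are user-specified).
PRIMARY_SUPPORT = {
    "gamma": (0.0, math.inf),
    "normal": (-math.inf, math.inf),
    "beta": (0.0, 1.0),
    "chi-squared": (0.0, math.inf),
}

# Valid ranges for the parameter each secondary distribution receives.
SECONDARY_PARAMETER_RANGE = {
    "poisson": (0.0, math.inf),      # rate
    "exponential": (0.0, math.inf),  # rate
    "normal": (0.0, math.inf),       # variance
    "binomial": (0.0, 1.0),          # probability
    "geometric": (0.0, 1.0),         # probability
}

def is_compatible(primary, secondary, uniform_bounds=None):
    """Return True if the primary support lies inside the secondary's
    valid parameter range (endpoints are not distinguished)."""
    if primary == "uniform":
        if uniform_bounds is None:
            raise ValueError("uniform primary needs (min, max) bounds")
        low, high = uniform_bounds
    else:
        low, high = PRIMARY_SUPPORT[primary]
    valid_low, valid_high = SECONDARY_PARAMETER_RANGE[secondary]
    return valid_low <= low and high <= valid_high

print(is_compatible("gamma", "poisson"))   # Gamma values are positive
print(is_compatible("normal", "poisson"))  # Normal values can be negative
```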
Step 3: Generate the Compound Distribution
After choosing both distributions and entering the parameters, click the "Generate Compound Distribution" button to generate and plot the compound distribution. The tool will automatically validate the inputs and display alerts if any parameter falls outside of its valid range.
Example Use Case
Suppose you choose a Gamma distribution as the primary with a shape (k) of 2 and a rate (θ) of 2. The tool will generate positive values for λ. If you then choose a Poisson distribution as the secondary, the tool will use the λ values generated from the Gamma distribution to produce Poisson samples. You can then visualize the resulting compound distribution.
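As a rough check on what the plot and statistics should show, the laws of iterated expectations and total variance give the theoretical mean and variance for this example. The calculation takes θ = 2 to be a rate, as stated above, so that \(\mathbb{E}[\lambda]=k/\theta\) and \(\text{Var}(\lambda)=k/\theta^{2}\):
$$\mathbb{E}[X]=\mathbb{E}[\mathbb{E}[X|\lambda]]=\mathbb{E}[\lambda]=\frac{2}{2}=1$$
$$\text{Var}(X)=\mathbb{E}[\text{Var}(X|\lambda)]+\text{Var}(\mathbb{E}[X|\lambda])=\mathbb{E}[\lambda]+\text{Var}(\lambda)=1+\frac{2}{2^{2}}=1.5$$
The variance exceeds the mean, unlike a plain Poisson distribution, where the two are equal.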
Step 5: View Summary Statistics
After the distribution is generated, you can view summary statistics such as:
- Mean: The average of the generated values.
- Variance: A measure of the variability in the data.
- Standard Deviation: The square root of the variance.
- Min and Max: The minimum and maximum values in the generated sample.
- Skewness: A measure of the asymmetry of the distribution.
- Kurtosis: A measure of the "tailedness" of the distribution (excess kurtosis, where normal distribution kurtosis = 0).
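These statistics can also be computed by hand from a sample. The sketch below assumes population moments (dividing by n); the tool's own formulas are not documented here and may use a different convention such as n - 1. The function name `summary_statistics` is hypothetical.

```python
import math

def summary_statistics(values):
    """Compute summary statistics using population (divide-by-n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "variance": m2,
        "std_dev": math.sqrt(m2),
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }

stats = summary_statistics([1, 2, 3, 4, 5])
for name, value in stats.items():
    print(f"{name}: {value}")
```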
Step 6: Export the Data
You can export the generated data as a CSV file using the Export Data button. This allows you to save the sample for further analysis or reporting.
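Once exported, the sample can be loaded for further analysis. The sketch below is hypothetical: the layout of the exported file is not described here, so it assumes a single column with a header named `value` and uses an in-memory string in place of a real file. Adjust the column name to match the actual export.

```python
import csv
import io
import statistics

# Stand-in for the exported file; the "value" header is an assumption.
exported_text = "value\n0\n2\n1\n3\n1\n"

def load_samples(file_obj, column="value"):
    """Read one numeric column from a CSV file object."""
    reader = csv.DictReader(file_obj)
    return [float(row[column]) for row in reader]

samples = load_samples(io.StringIO(exported_text))
print("mean:", statistics.mean(samples))
print("population variance:", statistics.pvariance(samples))
```

With a real file, `open("path/to/export.csv", newline="")` would replace the `io.StringIO` object.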
Compound Distribution Practice Problems
- Let \( \theta \sim \text{Beta}(\alpha, \beta) \), and given \( \theta \), \( X \sim \text{Binomial}(n, \theta) \) (i.e., the success probability \( \theta \) is drawn from a beta distribution).
  - Find the marginal distribution of \( X \).
  - Compute the mean and variance of \( X \).
- Suppose \( \theta \sim \text{Gamma}(\alpha, \beta) \), and given \( \theta \), \( X \sim \text{Exp}(\theta) \) (i.e., the rate of the exponential distribution is drawn from a gamma distribution).
  - Find the marginal distribution of \( X \).
  - Compute the mean and variance of \( X \).
- Let \( \mu \sim U(0,1) \) be drawn from a uniform distribution, and given \( \mu \), \( X \sim N(\mu, \sigma^2) \) (i.e., the mean of the normal distribution is selected from a uniform distribution).
  - Find the marginal distribution of \( X \).
  - Compute the mean and variance of \( X \).
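One way to check a derived mean and variance is to compare them against a simulation. The sketch below does this for the first problem; the function name, the parameter values in the usage line, and the sample size are illustrative assumptions. The same pattern can be adapted to the other two problems.

```python
import random

def simulate_beta_binomial(alpha, beta, n_trials, n_samples=50_000, seed=0):
    """Simulate X where theta ~ Beta(alpha, beta) and
    X | theta ~ Binomial(n_trials, theta).

    Returns the sample mean and population variance of X.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        theta = rng.betavariate(alpha, beta)  # draw the success probability
        # A Binomial draw is the number of successes in n_trials Bernoulli trials.
        successes = sum(rng.random() < theta for _ in range(n_trials))
        draws.append(successes)
    mean = sum(draws) / n_samples
    variance = sum((d - mean) ** 2 for d in draws) / n_samples
    return mean, variance

mean, variance = simulate_beta_binomial(alpha=2.0, beta=3.0, n_trials=10)
print("simulated mean:", mean)
print("simulated variance:", variance)
```

If the closed-form answers are correct, plugging in the same parameter values should give numbers close to the simulated ones, up to Monte Carlo error.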