Distribution Textbook (Work in Progress)

by John Della Rosa

Mixture Distributions

Introduction to Mixture Distributions

Recommended Prerequisites

  1. Probability
  2. Probability 2
  3. Maximum Likelihood Estimation

Explanation

In probability and statistics, a mixture distribution arises when the population being studied is composed of several distinct subpopulations, each of which follows a different probability distribution. Instead of modeling the population as a whole with a single distribution, we assume that each observation is generated by one of several component distributions, each with its own characteristics. Formally, a mixture distribution is a probability distribution formed as a weighted combination of other distributions. This is easily confused with, but different from, the distribution of a sum of random variables: a mixture draws each value from a single component, whereas a sum such as \(X+Y\) adds realizations of both variables, and its distribution (for independent \(X\) and \(Y\)) is given by the convolution of their distributions.
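To make the distinction concrete, the sketch below compares the density of an equal-weight mixture of N(0,1) and N(4,1) with the density of the sum X + Y of independent X ~ N(0,1) and Y ~ N(4,1). The example distributions and the helper names `normal_pdf`, `mixture_pdf`, and `sum_pdf` are illustrative choices, not taken from the text; the sum's density uses the standard fact that convolving two independent normals gives a normal whose mean and variance are the sums of the component means and variances.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float) -> float:
    """Equal-weight mixture: each value comes from N(0,1) OR from N(4,1)."""
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 4.0, 1.0)


def sum_pdf(x: float) -> float:
    """Density of X + Y for independent X ~ N(0,1) and Y ~ N(4,1).

    The convolution of the two normals is N(0 + 4, 1 + 1),
    so the standard deviation is sqrt(2).
    """
    return normal_pdf(x, 4.0, math.sqrt(2.0))


for x in [0.0, 2.0, 4.0]:
    print(f"x={x}: mixture={mixture_pdf(x):.4f}  sum={sum_pdf(x):.4f}")
```

The mixture has two peaks, near 0 and near 4, with a dip at 2, while the sum is a single bell curve centered at 4.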

Definition

A mixture distribution is defined as a weighted sum of two or more component distributions. Suppose there are k component distributions \(f_1(x),f_2(x),\dots,f_k(x)\), with weights \(w_1,w_2,\dots,w_k\), where \(w_i\geq 0\) and \(\sum_{i=1}^{k}w_i=1\). Then the PDF or PMF of the mixture distribution is given by: $$f(x)=\sum_{i=1}^{k}w_if_i(x)$$ These two conditions on the weights guarantee that \(f(x)\) is nonnegative and integrates (or sums) to 1, since each \(f_i\) does. One possible interpretation is that \(w_i\) is the probability of sampling from \(f_i(x)\).
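The definition translates directly into code. The following is a minimal sketch, assuming each component density is supplied as a Python callable; the names `make_mixture_pdf`, `normal`, and `exponential`, and the example mixture (N(-2,1) with weight 0.25 and an exponential with rate 1 and weight 0.75), are hypothetical choices for illustration. The function checks the weight conditions, returns \(f(x)=\sum_i w_i f_i(x)\), and a crude numerical integral confirms that the result behaves like a density.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]


def make_mixture_pdf(components: Sequence[Density],
                     weights: Sequence[float]) -> Density:
    """Return f(x) = sum_i w_i * f_i(x) after validating the weights."""
    if len(components) != len(weights):
        raise ValueError("need one weight per component")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if not math.isclose(sum(weights), 1.0, rel_tol=1e-9):
        raise ValueError("weights must sum to 1")

    def pdf(x: float) -> float:
        return sum(w * f(x) for w, f in zip(weights, components))

    return pdf


def normal(mean: float, sd: float) -> Density:
    """Density of N(mean, sd^2)."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def exponential(rate: float) -> Density:
    """Density of an exponential distribution with the given rate (zero for x < 0)."""
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0


# Hypothetical example: N(-2, 1) with weight 0.25, exponential(rate 1) with weight 0.75.
f = make_mixture_pdf([normal(-2.0, 1.0), exponential(1.0)], [0.25, 0.75])

# Midpoint-rule integral over [-10, 10]; it should be very close to 1.
n, a, b = 100_000, -10.0, 10.0
h = (b - a) / n
total = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
print(f"integral over [-10, 10]: {total:.4f}")
```

Passing weights that do not sum to 1, such as `[0.5]` for a single component, raises a `ValueError` instead of silently producing a function that is not a density.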

Sampling from a Mixture Distribution

Sampling from a mixture distribution is relatively simple, provided the component distributions are easy to sample from.

Steps

Step 1: Select a Component Based on Weights
The first step is to randomly select one of the component distributions according to their weights. The weights act as selection probabilities, so a component with a larger weight is more likely to be selected. This step amounts to a single draw from a categorical distribution in which each category corresponds to one of the component distributions (a weighted coin flip when there are only two components).

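Example: Suppose the components are N(0,1) with weight 0.3 and N(4,2) with weight 0.7. Draw \(u\) from U(0,1); if \(u\leq 0.3\), select N(0,1), and otherwise select N(4,2). A draw of \(u=0.5\), for instance, selects N(4,2).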
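Step 1 can be sketched in Python by building the running totals of the weights and locating \(u\) among them with a binary search. This is a minimal sketch; the function name `select_component` is hypothetical, and passing \(u\) in explicitly (rather than drawing it inside the function) is a choice made here so the selection rule is easy to check by hand. The standard library's `random.choices(range(k), weights=weights)` performs the same categorical draw in one call.

```python
import bisect
from itertools import accumulate
from typing import Sequence


def select_component(u: float, weights: Sequence[float]) -> int:
    """Return the 0-based index i of the component whose interval contains u.

    Component i owns the interval (w_1 + ... + w_{i-1}, w_1 + ... + w_i],
    so it is selected with probability w_i when u ~ U(0, 1).
    """
    cumulative = list(accumulate(weights))  # running totals of the weights
    # bisect_left finds the first running total >= u, i.e. the first i with u <= cumulative[i].
    i = bisect.bisect_left(cumulative, u)
    # Guard against rounding leaving u just above the final running total.
    return min(i, len(weights) - 1)


weights = [0.3, 0.7]  # N(0,1) and N(4,2)
print(select_component(0.5, weights))   # 1 -> N(4,2)
print(select_component(0.25, weights))  # 0 -> N(0,1)
```

The binary search makes each selection logarithmic in the number of components, which matters only when k is large; for two or three components a linear scan is just as good.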
Step 2: Sample from the Selected Component
After selecting a component distribution, the next step is to generate a sample from the chosen distribution.
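Example: If Step 1 selected N(4,2) from a mixture of N(0,1) and N(4,2), draw a single value from N(4,2). That value is the sample from the mixture; the weights play no further role once the component has been chosen.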
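Step 2 relies on whatever sampler each component distribution has. As an illustration, here are two classical methods: the Box-Muller transform for a normal component and inverse transform sampling for an exponential component. This is a sketch; the function names are hypothetical, and in practice Python's `random.gauss` and `random.expovariate` already provide these draws.

```python
import math
import random


def sample_normal(mean: float, sd: float, rng: random.Random) -> float:
    """Box-Muller transform: two independent U(0,1) draws -> one N(mean, sd^2) draw."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + sd * z


def sample_exponential(rate: float, rng: random.Random) -> float:
    """Inverse transform: solve F(x) = 1 - exp(-rate * x) = u for x."""
    u = rng.random()
    return -math.log(1.0 - u) / rate  # 1 - u lies in (0, 1], so the log is finite


rng = random.Random(42)
normal_draws = [sample_normal(4.0, 2.0, rng) for _ in range(100_000)]
exp_draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(sum(normal_draws) / len(normal_draws))  # close to 4
print(sum(exp_draws) / len(exp_draws))        # close to 1/2
```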

Pseudocode

Here is pseudocode for sampling from a mixture distribution:

    Input:
      - List of component distributions: D1, D2, ..., Dk
        (Each Di is a distribution such as a normal, exponential, or uniform. These are the components of the mixture.)
      - List of corresponding weights: w1, w2, ..., wk
        (The weights determine how likely each component is to be selected. They must be nonnegative and sum to 1 so that they form a valid probability distribution.)

    Output:
      - A single value sampled from the mixture distribution
        (The value is drawn from one of the component distributions, chosen according to the weights.)

    1. Generate a random number u from U(0, 1)
       (This uniform draw selects the component: component i is chosen exactly when u falls in an interval of length wi, so it is chosen with probability wi.)

    2. Use the weights to determine which component distribution to sample from:
       - Set cumulative_weight = 0
         (An accumulator holding the running sum of weights. The intervals (0, w1], (w1, w1 + w2], ... partition (0, 1], one interval per component.)

       - For each component i from 1 to k:
           a. cumulative_weight += wi
              (Add the weight of the current component to the running total.)

           b. If u <= cumulative_weight:
                - Select component Di
                  (u fell within the interval belonging to Di.)
                - Break the loop
                  (Once a component is selected, there is no need to check the remaining ones.)

       - If no component was selected, select Dk
         (This only happens when floating-point rounding makes the weights sum to slightly less than 1 and u lands above the final cumulative weight.)

    3. Sample a value from the selected component distribution
       (Each distribution has its own sampling method, such as the Box-Muller transform for a normal distribution or inverse transform sampling for an exponential distribution.)

    4. Return the sampled value
       (This value is the sample from the mixture distribution.)
                
                

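The pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming each component is represented as a zero-argument function that returns one draw; the name `sample_mixture` and the example mixture (N(-2,1) with weight 0.25 and an exponential with rate 1 and weight 0.75) are hypothetical choices for illustration, and `random.gauss` / `random.expovariate` stand in as the component samplers. A seeded `random.Random` instance keeps runs reproducible.

```python
import random
from typing import Callable, Sequence


def sample_mixture(samplers: Sequence[Callable[[], float]],
                   weights: Sequence[float],
                   rng: random.Random) -> float:
    """Draw one value from a mixture, following the pseudocode step by step."""
    # 1. Generate u ~ U(0, 1).
    u = rng.random()

    # 2. Walk the cumulative weights until u falls inside a component's interval.
    cumulative_weight = 0.0
    chosen = len(samplers) - 1  # fallback in case rounding leaves u above the total
    for i, w in enumerate(weights):
        cumulative_weight += w
        if u <= cumulative_weight:
            chosen = i
            break

    # 3-4. Sample from the selected component and return the value.
    return samplers[chosen]()


rng = random.Random(0)
# Hypothetical example: N(-2, 1) with weight 0.25, exponential(rate 1) with weight 0.75.
samplers = [lambda: rng.gauss(-2.0, 1.0), lambda: rng.expovariate(1.0)]
draws = [sample_mixture(samplers, [0.25, 0.75], rng) for _ in range(100_000)]

# The mixture mean is 0.25 * (-2) + 0.75 * 1 = 0.25.
print(sum(draws) / len(draws))
```

Representing components as callables keeps `sample_mixture` independent of the component families: swapping in a uniform or any other distribution only changes the `samplers` list.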
Mixture Distribution Practice Problems

  1. Write code in your preferred language which samples from a mixture distribution that is N(0,1) with weight 0.3 and N(4,2) with weight 0.7.
  2. Sampling from a Normal-Exponential Mixture Distribution

    Consider a mixture distribution with two components:

    • A normal distribution \( N(0, 1) \) with weight \( 0.6 \)
    • An exponential distribution with rate parameter \( \lambda = 2 \) and weight \( 0.4 \)

    1. Write the probability density function (PDF) of the mixture distribution.
    2. If you were to generate a sample from this mixture distribution, what is the probability that the sample is drawn from the normal distribution?
    3. Write a pseudocode algorithm to generate a sample from this mixture distribution.
  3. Mixture of Uniform Distributions

    You are given a mixture of two uniform distributions:

    • \( U(0, 1) \) with weight \( 0.3 \)
    • \( U(2, 4) \) with weight \( 0.7 \)
    1. Compute the probability that a randomly sampled value from the mixture is less than 1.
    2. Find the cumulative distribution function (CDF) of the mixture distribution.
    3. Write a Python function that samples from this mixture distribution.
  4. Mixture of Three Normal Distributions

    Consider a mixture of three normal distributions with the following parameters:

    • \( N(-1, 0.5^2) \) with weight \( 0.2 \)
    • \( N(2, 1^2) \) with weight \( 0.5 \)
    • \( N(4, 0.25^2) \) with weight \( 0.3 \)
    1. Derive the expected value \( E[X] \) of the mixture distribution.
    2. Derive the variance \( \text{Var}(X) \) of the mixture distribution.
    3. Describe the steps to sample from this mixture distribution.