Dependence
Introduction to Dependence Metrics
Recommended Prerequisites
- Probability
- Probability II
- Introduction to Multivariate Distributions
- Distribution Distance
General Types
Autocorrelation
Autocorrelation measures the correlation of a time series with a lagged version of itself.
Given a time series \(\left\{X_t\right\}\) where \(t=1,2,\dots,n\), the autocorrelation at lag k is defined as:
$$\rho(k)=\frac{\text{Cov}(X_t,X_{t-k})}{\text{Var}(X_t)}$$
Autocorrelation is used in time series analysis to detect trends, seasonality, and repeating patterns. It plays a key role in models like AR (Auto-Regressive) models, where current values are modeled as linear functions of past values.
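The lag-\(k\) formula above can be sketched directly in NumPy. This is a minimal illustration (the biased \(1/n\) estimator; the alternating series is just a convenient test signal):

```python
import numpy as np

def autocorrelation(x, k):
    """Sample autocorrelation of series x at lag k (biased 1/n estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    # Lag-k autocovariance divided by the lag-0 variance
    cov_k = np.sum((x[k:] - xbar) * (x[:n - k] - xbar)) / n
    var = np.sum((x - xbar) ** 2) / n
    return cov_k / var

# A perfectly periodic series is strongly correlated with itself at its period
x = np.tile([1.0, -1.0], 50)
print(autocorrelation(x, 2))   # close to 1
print(autocorrelation(x, 1))   # close to -1
```

Dividing by \(n\) rather than \(n-k\) keeps the estimated autocorrelation sequence positive semi-definite, which is why it is the common default in time series packages.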
Pairwise Independence
Pairwise independence is a weaker form of independence among a collection of random variables: every pair of variables is independent, so knowing the outcome of one provides no information about the outcome of any single other. However, pairwise independence does not imply mutual (full) independence of the whole collection.
Random variables \(X_1,X_2,\dots,X_n\) are pairwise independent if for all \(i\neq j\) and all values \(x_i,x_j\):
$$P(X_i=x_i,X_j=x_j)=P(X_i=x_i)P(X_j=x_j)$$
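The gap between pairwise and full independence can be shown with the classic XOR construction: two fair bits plus their XOR. Each pair is independent, yet the third bit is fully determined by the first two. A small exhaustive check:

```python
import itertools

# X1, X2 are fair coin flips; X3 = X1 XOR X2.
# The four equally likely outcomes enumerate the whole joint distribution.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in itertools.product([0, 1], repeat=2)]

def p(pred):
    """Probability of an event under the uniform distribution on outcomes."""
    return sum(1 for o in outcomes if pred(o)) / len(outcomes)

# Pairwise: P(Xi=1, Xj=1) = 1/4 = P(Xi=1) * P(Xj=1) for every pair
for i, j in [(0, 1), (0, 2), (1, 2)]:
    assert p(lambda o: o[i] == 1 and o[j] == 1) == p(lambda o: o[i] == 1) * p(lambda o: o[j] == 1)

# But not mutually independent: P(X1=1, X2=1, X3=1) = 0, not 1/8
print(p(lambda o: o[0] == 1 and o[1] == 1 and o[2] == 1))  # 0.0
```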
Conditional Independence
Conditional Independence refers to a situation where two random variables are independent, given the value of a third variable. Conditional independence is a key concept in Bayesian networks, graphical models, and statistical modeling, where it allows for simplified representations of complex dependencies.
Let \(X,Y,Z\) be random variables. X and Y are said to be conditionally independent given Z if:
$$P(X|Y,Z)=P(X|Z)$$
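A common-cause structure gives a concrete instance: if \(Z\) drives both \(X\) and \(Y\), the two are marginally dependent but conditionally independent given \(Z\). A simulation sketch (the probabilities 0.8/0.2 and 0.7/0.3 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Z is a fair bit; X and Y each depend on Z but not directly on each other
z = rng.integers(0, 2, n)
x = (rng.random(n) < np.where(z == 1, 0.8, 0.2)).astype(int)
y = (rng.random(n) < np.where(z == 1, 0.7, 0.3)).astype(int)

def cond_p(event, given):
    """Empirical conditional probability P(event | given)."""
    return event[given].mean()

# Conditioning on Y as well as Z should not change the probability of X
lhs = cond_p(x == 1, (y == 1) & (z == 1))   # P(X=1 | Y=1, Z=1)
rhs = cond_p(x == 1, z == 1)                # P(X=1 | Z=1)
print(lhs, rhs)  # both close to 0.8
```

Without the conditioning, \(P(X=1\mid Y=1)\) differs noticeably from \(P(X=1)\), since \(Y=1\) makes \(Z=1\) more likely.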
Metrics
Spearman Correlation
Spearman correlation assesses how well the relationship between two variables can be described using a monotonic function. It is the Pearson correlation of the ranks; when there are no ties it simplifies to:
$$\rho=1-\frac{6\sum_{i=1}^n(R_X(i)-R_Y(i))^2}{n(n^2-1)}$$
where \(R_X(i)\) and \(R_Y(i)\) are the ranks of \(X_i\) and \(Y_i\) within their respective samples.
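The rank-difference formula can be sketched in a few lines of NumPy (this double-`argsort` ranking assumes no ties, matching the formula's assumption):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (assumes no ties)."""
    n = len(x)
    rx = np.argsort(np.argsort(x)) + 1   # ranks 1..n
    ry = np.argsort(np.argsort(y)) + 1
    d2 = np.sum((rx - ry) ** 2)
    return 1 - 6 * d2 / (n * (n**2 - 1))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3          # monotonic but non-linear
print(spearman_rho(x, y))    # 1.0 -- perfect monotonic association
print(spearman_rho(x, -y))   # -1.0
```

Because only ranks enter the formula, any strictly increasing transform of \(X\) or \(Y\) leaves \(\rho\) unchanged; `scipy.stats.spearmanr` computes the same quantity with tie handling.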
Kendall's Tau
Let \((X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)\) be a set of paired observations; each pair might correspond to the same time point, for example.
For any two indices \(i\neq j\), compare the observations and label the pair as:
- Concordant if \((X_i-X_j)(Y_i-Y_j)\gt0\)
- Discordant if \((X_i-X_j)(Y_i-Y_j)\lt0\)
In plain English: if X is greater at time i than at time j, the pair is concordant when Y is also greater at time i than at time j. If the two variables move in opposite directions, the pair is discordant.
Now, let us define the counts for each type of pair; i.e.:
$$C=\sum_{i=1}^{n-1}\sum_{j=i+1}^n 1_{(X_i-X_j)(Y_i-Y_j)\gt0}$$
$$D=\sum_{i=1}^{n-1}\sum_{j=i+1}^n 1_{(X_i-X_j)(Y_i-Y_j)\lt0}$$
Then we define Kendall's Tau as
$$\tau=\frac{C-D}{\binom{n}{2}}$$
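The concordant/discordant counting above translates directly into code. A minimal \(O(n^2)\) sketch (this is tau-a, which ignores ties, consistent with the definition above):

```python
import itertools
from math import comb

def kendall_tau(x, y):
    """Kendall's tau from concordant/discordant pair counts (no tie handling)."""
    c = d = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1      # concordant: X and Y move the same way
        elif s < 0:
            d += 1      # discordant: X and Y move in opposite directions
    return (c - d) / comb(len(x), 2)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]                 # one swapped pair
print(kendall_tau(x, y))         # (5 - 1) / 6, about 0.667
```

Production implementations such as `scipy.stats.kendalltau` use an \(O(n\log n)\) merge-sort-based count instead of this quadratic loop.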
Mutual Information
Mutual information (MI) measures how much observing one variable reduces uncertainty about the other. Unlike Kendall's Tau and Spearman's Rho, which assess specific types of dependence (such as monotonic or rank-based), MI captures all forms of dependence, including non-linear and complex relationships. For continuous variables:
$$I(X;Y)=\int\!\!\int p_{XY}(x,y)\log\left(\frac{p_{XY}(x,y)}{p_X(x)p_Y(y)}\right)\,dx\,dy$$
where \(p_{XY}(x,y)\) is the joint probability density function of X and Y; \(p_X(x)\) and \(p_Y(y)\) are the marginal densities of X and Y, respectively.
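For discrete variables the integral becomes a sum over the joint probability table, which makes MI easy to compute exactly. A sketch (using natural log, so MI is in nats):

```python
import numpy as np

def mutual_information(pxy):
    """MI in nats for a discrete joint distribution given as a 2-D array."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = pxy > 0                        # 0 * log(0) terms contribute nothing
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

# Independent variables: joint = product of marginals, so MI = 0
indep = np.outer([0.5, 0.5], [0.3, 0.7])
print(mutual_information(indep))      # 0.0

# X = Y (fair bit): MI equals the entropy of X, log 2 nats
equal = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(equal))      # ~0.693
```

Note that MI is zero if and only if the variables are independent, whereas a zero Pearson, Spearman, or Kendall coefficient does not rule out dependence.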
Tail Dependence
Tail dependence refers to the behavior of extreme values in multivariate distributions. It captures the likelihood that extreme outcomes in one variable will be accompanied by extreme outcomes in another.
For two random variables \(X\) and \(Y\), the upper tail dependence coefficient \(\lambda_U\) is defined as the limit of the conditional probability that Y exceeds a high threshold, given that X also exceeds that threshold:
$$\lambda_U=\lim_{u\rightarrow 1^-}P(Y\gt F_Y^{-1}(u)|X>F_X^{-1}(u))$$
where \(F_X\) and \(F_Y\) are the marginal cumulative distribution functions of X and Y respectively, and \(F_X^{-1}(u)\) and \(F_Y^{-1}(u)\) are the quantiles corresponding to the probability level u.
Similarly, the lower tail dependence coefficient \(\lambda_L\) is defined as:
$$\lambda_L=\lim_{u\rightarrow 0^+}P(Y\leq F_Y^{-1}(u)|X\leq F_{X}^{-1}(u))$$
$$\lambda_U,\lambda_L\in [0,1]$$
A value of 0 indicates no tail dependence. A value of 1 implies perfect tail dependence.
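In practice \(\lambda_U\) is often estimated empirically by fixing a quantile level \(u\) near 1 and computing the conditional exceedance frequency from the definition. A simulation sketch (the choice \(u=0.95\) and the lognormal transform are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def upper_tail_dep(x, y, u=0.95):
    """Empirical estimate of lambda_U at quantile level u."""
    qx, qy = np.quantile(x, u), np.quantile(y, u)
    exceed_x = x > qx
    # P(Y > q_Y(u) | X > q_X(u))
    return np.mean(y[exceed_x] > qy)

x = rng.normal(size=n)
y_dep = np.exp(x)                 # comonotonic: Y is a monotone function of X
y_ind = rng.normal(size=n)        # independent of X

lam_dep = upper_tail_dep(x, y_dep)
lam_ind = upper_tail_dep(x, y_ind)
print(lam_dep)   # close to 1: extremes of X and Y coincide
print(lam_ind)   # close to 1 - u = 0.05: X's extremes say nothing about Y
```

The independent case does not estimate exactly 0 at a finite threshold; it converges to 0 only as \(u\rightarrow 1\), which is why the coefficient is defined as a limit.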