The Interactive World: Joint Distributions

The real world rarely involves just one variable in isolation. To understand a patient's health, a doctor looks at both Blood Pressure ($X$) and Cholesterol ($Y$). To predict a stock's behavior, an analyst looks at the Asset Price ($X$) and its Volatility ($Y$).

When variables interact, we use a Joint Distribution to describe their combined behavior.

Intuition
The 3D Probability Surface

For two continuous variables, the Joint PDF $f(x,y)$ is like a mountain range on a map. The height of the mountain at a point $(x,y)$ is the probability density there: the higher the surface, the more likely outcomes near that combination are. The volume under the mountain over a region $A$ is the probability of landing in that region:

$$P((X,Y) \in A) = \iint_A f(x,y)\,dx\,dy$$
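As a rough numerical check of the "volume equals probability" idea, the sketch below approximates the double integral with a midpoint Riemann sum. The density it uses (two uncorrelated standard normal variables) and the helper names `joint_pdf` and `prob_in_rectangle` are illustrative assumptions, not something defined in this section.

```python
import math

def joint_pdf(x, y):
    """ASSUMED example density: two uncorrelated standard normal variables."""
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

def prob_in_rectangle(pdf, x_lo, x_hi, y_lo, y_hi, n=200):
    """Approximate the volume under pdf over a rectangle (midpoint rule)."""
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += pdf(x, y) * dx * dy
    return total

# Probability that both variables land within one unit of zero.
p_square = prob_in_rectangle(joint_pdf, -1.0, 1.0, -1.0, 1.0)

# The volume over (almost) the whole plane should be close to 1.
p_total = prob_in_rectangle(joint_pdf, -8.0, 8.0, -8.0, 8.0, n=400)

print(f"P(-1 < X < 1, -1 < Y < 1) ~ {p_square:.4f}")
print(f"Total volume ~ {p_total:.4f}")
```

For this particular density the first value should come out near 0.466 and the second near 1; a different joint PDF would only require swapping out `joint_pdf`.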

Independence in Higher Dimensions

Two random variables $X$ and $Y$ are independent if and only if their joint distribution is simply the product of their individual (marginal) distributions:

$$f(x,y) = f_X(x) \cdot f_Y(y)$$

Marginal Distributions: Zooming In

If you have a joint distribution of Height and Weight, but you only care about Height, you compute the Marginal Distribution. You "integrate out" (or sum up) all the information about Weight to see only the Height distribution.

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$$

| Weight Class (Y) \ Height Class (X) | Short | Avg | Tall | P(Weight Class (Y)) |
|---|---|---|---|---|
| Under | 0.15 | 0.05 | 0.01 | 0.21 |
| Avg | 0.05 | 0.4 | 0.05 | 0.5 |
| Over | 0.01 | 0.05 | 0.23 | 0.29 |
| P(Height Class (X)) | 0.21 | 0.5 | 0.29 | 1 |

Total probability = 1

Example
Reading the Table

In the table above, the probability of being Tall and Overweight is $P(X=\text{Tall}, Y=\text{Over}) = 0.23$. To find the marginal probability of being Tall (regardless of weight), we sum the 'Tall' column: $0.01 + 0.05 + 0.23 = 0.29$.
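The same bookkeeping can be sketched in a few lines of Python. The nested list below simply copies the nine joint probabilities from the height/weight table; the variable names are illustrative. As a side check, it also compares each cell with the product of its two marginals, which is what independence would require.

```python
# Joint probabilities from the table: rows are weight classes (Y),
# columns are height classes (X).
heights = ["Short", "Avg", "Tall"]
weights = ["Under", "Avg", "Over"]
joint = [
    [0.15, 0.05, 0.01],  # Under
    [0.05, 0.40, 0.05],  # Avg
    [0.01, 0.05, 0.23],  # Over
]

# Marginal of X: sum each column (add up over all weight classes).
p_height = [sum(row[j] for row in joint) for j in range(len(heights))]

# Marginal of Y: sum each row (add up over all height classes).
p_weight = [sum(row) for row in joint]

# Independence would require every cell to equal the product of its marginals.
max_gap = max(
    abs(joint[i][j] - p_weight[i] * p_height[j])
    for i in range(len(weights))
    for j in range(len(heights))
)

print(dict(zip(heights, (round(p, 2) for p in p_height))))
print(dict(zip(weights, (round(p, 2) for p in p_weight))))
print(f"Largest gap between joint and product of marginals: {max_gap:.4f}")
```

The gap is far from zero (for instance, the Tall/Over cell is 0.23 while the product of its marginals is 0.29 × 0.29 ≈ 0.084), so in this table height class and weight class are not independent.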

Covariance & Correlation: Moving Together

How do we measure if two variables move together? If $X$ goes up, does $Y$ usually go up too?

Intuition
The Sign of Covariance
  • Positive Covariance: When $X$ is above its mean, $Y$ tends to be above its mean too (e.g., Height and Weight).
  • Negative Covariance: When $X$ is above its mean, $Y$ tends to be below its mean (e.g., Time spent gaming and Exam scores).
  • Zero Covariance: No linear relationship exists between the two.

$$\operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - E[X]E[Y]$$

Because Covariance is hard to interpret (its units are 'Height $\times$ Weight'), we use Pearson Correlation ($\rho$), which divides out the units and rescales the result to a fixed range between $-1$ and $1$.

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}$$
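To make the two formulas concrete, here is a minimal sketch that evaluates them on the height/weight joint table from this section. The numeric codes 0, 1, 2 assigned to the classes are an assumption made purely so that means and products can be computed; a different coding would give different numbers.

```python
import math

# Joint probabilities from the height/weight table:
# rows are weight classes (Y), columns are height classes (X).
joint = [
    [0.15, 0.05, 0.01],
    [0.05, 0.40, 0.05],
    [0.01, 0.05, 0.23],
]
# ASSUMPTION: numeric codes for the classes (0 = lowest, 2 = highest).
codes = [0, 1, 2]

def expect(g):
    """Expected value of g(x, y) under the joint table."""
    return sum(
        g(codes[j], codes[i]) * joint[i][j]
        for i in range(3)
        for j in range(3)
    )

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Two equivalent forms of the covariance.
cov_centered = expect(lambda x, y: (x - mu_x) * (y - mu_y))
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y

# Standard deviations, then the Pearson correlation.
sigma_x = math.sqrt(expect(lambda x, y: (x - mu_x) ** 2))
sigma_y = math.sqrt(expect(lambda x, y: (y - mu_y) ** 2))
rho = cov_shortcut / (sigma_x * sigma_y)

print(f"Cov (centered form) = {cov_centered:.4f}")
print(f"Cov (shortcut form) = {cov_shortcut:.4f}")
print(f"rho = {rho:.4f}")
```

Both covariance forms agree (about 0.35 under this coding), and the correlation of roughly 0.72 reflects the heavy diagonal of the table: taller classes tend to go with heavier classes.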

Correlation Strength

  • $\rho = 1$: Perfect positive linear relationship.
  • $\rho = -1$: Perfect negative linear relationship.
  • $\rho = 0$: No linear relationship (but watch out!).
1. If $X$ and $Y$ are independent, their correlation is guaranteed to be 0 (whenever it is defined). This is easy to prove: since $E[XY] = E[X]E[Y]$ for independent variables, $\operatorname{Cov}(X,Y) = E[X]E[Y] - E[X]E[Y] = 0$.
2. The Trap: If correlation is 0, it does NOT necessarily mean the variables are independent! Correlation only measures linear relationships.
3. Counter-Example: Let $X$ have any non-constant distribution that is symmetric around 0 (with finite fourth moment, so the correlation exists), and let $Y = X^2$. $Y$ is completely determined by $X$: if you know $X$, you know $Y$ exactly. Yet their correlation is 0. By symmetry $E[X] = 0$ and $E[X^3] = 0$, so $\operatorname{Cov}(X,Y) = E[X^3] - E[X]E[X^2] = 0$. The relationship is a parabola, not a line.
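The counter-example can be checked exactly with a small discrete case. The sketch below assumes $X$ is uniform on $\{-2, -1, 0, 1, 2\}$, which is just one convenient symmetric choice, and uses exact fractions so that the zero is not a rounding artifact.

```python
from fractions import Fraction

# ASSUMPTION: X is uniform on {-2, -1, 0, 1, 2}, one simple symmetric choice.
support = [-2, -1, 0, 1, 2]
p = Fraction(1, len(support))

def expect(g):
    """Exact expected value of g(X)."""
    return sum(g(x) * p for x in support)

e_x = expect(lambda x: x)            # E[X]
e_y = expect(lambda x: x ** 2)       # E[Y] with Y = X^2
e_xy = expect(lambda x: x * x ** 2)  # E[XY] = E[X^3]

cov = e_xy - e_x * e_y

# Dependence check: compare P(Y = 4) with P(Y = 4 | X = 2).
p_y4 = sum(p for x in support if x ** 2 == 4)
p_x2 = sum(p for x in support if x == 2)
p_y4_and_x2 = sum(p for x in support if x == 2 and x ** 2 == 4)
p_y4_given_x2 = p_y4_and_x2 / p_x2

print(f"Cov(X, Y) = {cov}")
print(f"P(Y = 4) = {p_y4}, but P(Y = 4 | X = 2) = {p_y4_given_x2}")
```

The covariance is exactly 0, yet learning that $X = 2$ raises the probability of $Y = 4$ from 2/5 to 1, so the two variables are clearly not independent.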