The Multivariate Normal Distribution

The N-Dimensional Bell Curve

In previous chapters, we looked at the 1D bell curve. But the real world is multi-dimensional. When you're modeling a robot's position $(x,y,z)$, or the returns of the 500 stocks in the S&P 500, you need the Multivariate Normal (MVN) distribution.

The MVN underpins much of modern machine learning, from Gaussian Processes to Variational Autoencoders (VAEs).

Intuition

The Geometry of Probability

In 1D, the variance $\sigma^2$ tells you how wide the bell is. In $N$ dimensions (written $d$ in the formulas below), we have a mean vector ($\boldsymbol{\mu}$) and a covariance matrix ($\Sigma$). This matrix doesn't just tell you how wide the distribution is; it tells you its shape (is it a circle? a stretched ellipse?) and its orientation (is it tilted?).

The Covariance Matrix (Σ)

Quantity	Definition	Shape	Note
Random vector	X	[d]	d is the number of random variables in the vector
Covariance	Σ = E[(X-μ)(X-μ)ᵀ]	[d, d]	Symmetric, positive semi-definite matrix

When $\Sigma$ is positive definite (so that $\Sigma^{-1}$ exists), the density is:

f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)
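The density formula can be evaluated numerically along the following lines. This is a minimal sketch using NumPy; the helper name `mvn_logpdf` and the test values are illustrative assumptions, not part of the course material. It works with the log-density and solves a linear system instead of inverting $\Sigma$, which tends to be more stable numerically.

```python
import numpy as np

def mvn_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x (hypothetical helper; assumes sigma is positive definite)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]

    # log|sigma|; a non-positive determinant rules out positive definiteness
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be positive definite")

    diff = x - mu
    # Solve sigma @ z = diff rather than forming the inverse explicitly
    z = np.linalg.solve(sigma, diff)
    maha_sq = diff @ z  # (x - mu)^T sigma^{-1} (x - mu)

    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha_sq)

# Illustrative check: a standard normal in 2D has density 1 / (2*pi) at its mean
mu = np.zeros(2)
sigma = np.eye(2)
density_at_mean = np.exp(mvn_logpdf(mu, mu, sigma))
```

At the mean, the exponent vanishes, so the density reduces to the normalising constant $1/\sqrt{(2\pi)^d |\Sigma|}$.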

Visualizing the Covariance Matrix

The diagonal elements ($\Sigma_{ii}$) are the variances of each individual variable. The off-diagonal elements ($\Sigma_{ij}$) are the covariances between variable $i$ and variable $j$.

Visualizing Correlation

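Large positive off-diagonal values (here 0.8) mean the variables tend to move together. In a 2D contour plot, the distribution looks like a thin, tilted ellipse rather than a circle. In the matrix below, both variances are 1, so the off-diagonal entry is also the correlation coefficient.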

Σ	Stock A	Stock B
Stock A	1	0.8
Stock B	0.8	1
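To see where the "thin, tilted ellipse" comes from, one can eigendecompose the matrix in the table. The sketch below assumes NumPy; the variable names are illustrative. The eigenvectors give the directions of the ellipse's axes, and the square roots of the eigenvalues give the semi-axis lengths of the one-standard-deviation contour.

```python
import numpy as np

# Matrix from the table above (Stock A vs. Stock B)
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(sigma)

# Semi-axis lengths of the contour at Mahalanobis distance 1
semi_axes = np.sqrt(eigvals)

# The long axis points along the eigenvector of the largest eigenvalue
major = eigvecs[:, -1]
tilt_deg = np.degrees(np.arctan2(major[1], major[0])) % 180.0
```

For this matrix the eigenvalues are 0.2 and 1.8, so the long axis is three times the short one, and it lies along the 45° diagonal where both stocks rise or fall together.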

Why is MVN so Special?

The MVN is beloved by mathematicians and engineers because the family is closed under several of the most important operations; the result of each is again a Normal distribution:

Marginals are Normal: If you have a 100-dimensional Normal distribution and you ignore 98 of the variables, the remaining 2 variables follow a 2D Normal distribution.
Conditionals are Normal: If you have a joint distribution of Temperature and Humidity, and you observe that it's exactly 30°C, the distribution of Humidity (the conditional distribution) is still a Normal distribution.
Linear Combinations are Normal: If you multiply a Normal vector by a matrix and add a constant vector, the result is guaranteed to be Normal. The same holds for the sum of two Normal vectors, provided they are jointly Normal (for example, independent). This makes it easy to "propagate" uncertainty through linear systems.
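The marginal and conditional properties have simple closed forms. If the vector is split into blocks $(\mathbf{x}_1, \mathbf{x}_2)$, the conditional of $\mathbf{x}_1$ given $\mathbf{x}_2 = \mathbf{a}$ is Normal with mean $\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{a} - \boldsymbol{\mu}_2)$ and covariance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The sketch below applies this to the Temperature and Humidity example; the means and covariances are made-up numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical joint model of (temperature in degrees C, relative humidity in %);
# these numbers are invented for illustration.
mu = np.array([25.0, 60.0])
sigma = np.array([[16.0, -12.0],
                  [-12.0, 25.0]])

# Marginal of humidity: read off its own mean and variance
mu_h, var_h = mu[1], sigma[1, 1]

# Conditional of humidity given an observed temperature of 30 degrees C
t_obs = 30.0
gain = sigma[1, 0] / sigma[0, 0]              # Sigma_HT / Sigma_TT
cond_mean = mu[1] + gain * (t_obs - mu[0])    # shifts with the observation
cond_var = sigma[1, 1] - gain * sigma[0, 1]   # does not depend on t_obs
```

With these numbers the conditional mean of humidity drops from 60 to 56.25 and its variance shrinks from 25 to 16: observing the temperature both shifts and sharpens the belief about humidity.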

1. Let $\mathbf{X}$ be a Gaussian random vector:

\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)

2. Consider a linear transformation:

\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}

3. By linearity of expectation, the new mean vector is:

E[\mathbf{Y}] = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}

4. Since $\mathbf{Y} - E[\mathbf{Y}] = \mathbf{A}(\mathbf{X} - \boldsymbol{\mu})$, the new covariance matrix is:

\mathrm{Cov}(\mathbf{Y}) = \mathbf{A}\Sigma\mathbf{A}^T

5. Result (a linear map of a Gaussian vector is again Gaussian):

\mathbf{Y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\Sigma \mathbf{A}^T)

This property is the mathematical foundation of the Kalman Filter, which was used for navigation in NASA's Apollo program.

∎
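The result of this derivation can be checked by simulation. The following is a sketch assuming NumPy; the particular $\boldsymbol{\mu}$, $\Sigma$, $\mathbf{A}$ and $\mathbf{b}$ are arbitrary illustrative values. It compares the closed-form mean and covariance of $\mathbf{Y}$ with the empirical moments of transformed samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the text)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 1.0],
              [0.5, -1.0]])
b = np.array([0.0, 1.0])

# Closed-form result: Y ~ N(A mu + b, A Sigma A^T)
mean_y = A @ mu + b
cov_y = A @ sigma @ A.T

# Monte Carlo check: sample X, apply the map row by row, compare moments
x = rng.multivariate_normal(mu, sigma, size=200_000)
y = x @ A.T + b
emp_mean = y.mean(axis=0)
emp_cov = np.cov(y, rowvar=False)
```

Here the closed form gives a mean of $(-1, 3.5)$ and a covariance with entries 4.2 and 0.9 on the diagonal and -0.3 off it; the empirical estimates should agree up to sampling noise.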

Example

Portfolio Optimization

In finance, investors use the MVN to model the returns of multiple assets. The covariance matrix tells them which stocks move together (bad for diversification) and which move in opposite directions (good for hedging). Because a portfolio's return is a linear combination of asset returns, its variance is $\mathbf{w}^T \Sigma \mathbf{w}$ for a weight vector $\mathbf{w}$, so finding the "Minimum Variance Portfolio" reduces to an exercise in linear algebra.
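As a rough illustration, the fully invested minimum variance portfolio (weights summing to 1, short positions allowed) has the closed form $\mathbf{w} = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$, which minimises the portfolio variance $\mathbf{w}^T \Sigma \mathbf{w}$. The sketch below assumes NumPy and a made-up three-asset covariance matrix; real portfolios usually add constraints (for example, no short selling) that this formula ignores.

```python
import numpy as np

# Hypothetical covariance matrix of annual returns for three assets
sigma = np.array([[0.040, 0.006, 0.000],
                  [0.006, 0.090, 0.012],
                  [0.000, 0.012, 0.160]])

ones = np.ones(sigma.shape[0])

# Minimise w^T Sigma w subject to sum(w) = 1:
#   w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
z = np.linalg.solve(sigma, ones)
weights = z / (ones @ z)

# Variance of the resulting portfolio return
portfolio_var = weights @ sigma @ weights

# Baseline for comparison: equal weights
equal = ones / ones.size
equal_var = equal @ sigma @ equal
```

The resulting variance is never larger than that of any single asset or of the equal-weight baseline, since each of those is itself a feasible portfolio.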
