In previous chapters, we looked at the 1D Bell Curve. But the real world is multi-dimensional. When you're modeling a robot's position, or the prices of the 500 stocks in the S&P 500, you need the Multivariate Normal (MVN) distribution.
The MVN is the bedrock of modern Machine Learning, from Gaussian Processes to Variational Autoencoders (VAEs).
In 1D, the variance tells you how wide the bell is. In $d$ dimensions, we have a Mean Vector ($\mu$) and a Covariance Matrix ($\Sigma$). This matrix doesn't just tell you how wide the distribution is; it tells you its shape (is it a circle? a stretched ellipse?) and its orientation (is it tilted?).
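To make the roles of the mean vector and covariance matrix concrete, here is a minimal NumPy sketch that draws samples from a 2D MVN and checks that the sample statistics recover the parameters. The values of `mu` and `Sigma` are illustrative assumptions, not numbers from this chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Illustrative parameters (assumed for this sketch)
mu = np.array([1.0, -2.0])            # mean vector, shape [d]
Sigma = np.array([[2.0, 0.6],
                  [0.6, 0.5]])        # covariance matrix, shape [d, d]

# Draw n samples; each row is one d-dimensional observation
samples = rng.multivariate_normal(mu, Sigma, size=100_000)  # shape [n, d]

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)  # rowvar=False: rows are observations

print("sample mean:\n", sample_mean)
print("sample covariance:\n", sample_cov)
```

With this many samples, the printed mean and covariance should land close to `mu` and `Sigma`, though not exactly on them.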
The Covariance Matrix (Σ)
| Quantity | Definition | Shape | Note |
|---|---|---|---|
| Random vector | X | [d] | d is the number of random variables in the vector |
| Covariance | Σ = E[(X-μ)(X-μ)ᵀ] | [d, d] | Symmetric, positive semi-definite matrix |
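The definition in the table can be estimated directly from data by replacing the expectation with a sample average. The sketch below uses synthetic data (an assumption made purely for illustration) and checks the two properties listed in the Note column: symmetry and positive semi-definiteness.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: n = 5000 observations of a d = 3 random vector
X = rng.normal(size=(5_000, 3))
X[:, 2] += 0.5 * X[:, 0]              # make variable 2 depend on variable 0

mu_hat = X.mean(axis=0)               # estimated mean vector, shape [d]
centered = X - mu_hat                 # X - mu, shape [n, d]

# Sample version of E[(X - mu)(X - mu)^T], with the usual n - 1 denominator
Sigma_hat = centered.T @ centered / (len(X) - 1)   # shape [d, d]

is_symmetric = np.allclose(Sigma_hat, Sigma_hat.T)
eigenvalues = np.linalg.eigvalsh(Sigma_hat)        # eigvalsh: symmetric input
is_psd = bool(np.all(eigenvalues >= -1e-12))       # tolerate round-off
matches_numpy = np.allclose(Sigma_hat, np.cov(X, rowvar=False))

print(Sigma_hat.shape, is_symmetric, is_psd, matches_numpy)
```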
Visualizing the Covariance Matrix
The diagonal elements ($\Sigma_{ii}$) are the variances of each individual variable. The off-diagonal elements ($\Sigma_{ij}$) are the covariances between variable $i$ and variable $j$.
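As a small sketch of how to read these entries, the snippet below pulls the variances off the diagonal and rescales the off-diagonal covariances into correlations. The matrix values are assumed for illustration.

```python
import numpy as np

# Illustrative covariance matrix (assumed values)
Sigma = np.array([[4.00, 2.40],
                  [2.40, 2.25]])

variances = np.diag(Sigma)            # Sigma_ii: variance of each variable
std_devs = np.sqrt(variances)         # standard deviations

# Correlation_ij = Sigma_ij / (std_i * std_j)
correlation = Sigma / np.outer(std_devs, std_devs)

print("variances:", variances)
print("correlation matrix:\n", correlation)
```

Dividing out the standard deviations puts every off-diagonal entry on a common scale between -1 and 1, which makes the strength of the relationship easier to compare across pairs of variables.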
Visualizing Correlation
Off-diagonal values close to 1 (here 0.8, with both variances equal to 1) mean the variables tend to move together. In a 2D contour plot, this looks like a thin ellipse tilted along the diagonal rather than a circle.
| | Stock A | Stock B |
|---|---|---|
| Stock A | 1 | 0.8 |
| Stock B | 0.8 | 1 |
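The "thin, tilted ellipse" can be read off the matrix itself: the eigenvectors of the covariance matrix give the directions of the ellipse's axes, and the square roots of the eigenvalues give their relative lengths. A minimal sketch using the 2x2 matrix from the table:

```python
import numpy as np

# The Stock A / Stock B matrix from the table above
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# eigh is meant for symmetric matrices; eigenvalues are returned in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)

major_axis = eigenvectors[:, -1]      # direction of largest variance
# Eigenvector sign is arbitrary, so fold the angle into [0, 180)
angle_deg = np.degrees(np.arctan2(major_axis[1], major_axis[0])) % 180.0
# Ratio of long axis to short axis of any contour ellipse
axis_ratio = np.sqrt(eigenvalues[-1] / eigenvalues[0])

print("eigenvalues:", eigenvalues)
print("tilt (degrees):", angle_deg)
print("axis ratio:", axis_ratio)
```

For this matrix the eigenvalues are 0.2 and 1.8, so the contours are ellipses tilted at 45 degrees whose long axis is three times the short one. With a correlation of 0, both eigenvalues would equal 1 and the contours would be circles.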
The MVN is beloved by mathematicians and engineers because the family is closed under the operations that matter most in practice:
- Marginals are Normal: If you have a 100-dimensional Normal distribution and you ignore 98 of the variables, the remaining 2 variables follow a 2D Normal distribution.
- Conditionals are Normal: If Temperature and Humidity are jointly Normal, and you observe that the temperature is exactly 30°C, the conditional distribution of Humidity is still a Normal distribution.
- Linear Combinations are Normal: If you add two independent Normal vectors together (or, more generally, two vectors that are jointly Normal), or multiply a Normal vector by a matrix, the result is guaranteed to be Normal. This makes it incredibly easy to "propagate" uncertainty through linear systems.
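Each of the three properties above comes with a closed-form rule for the new mean and covariance. The sketch below applies the standard formulas to a made-up joint distribution of temperature and humidity; all numbers, and the matrix `A` and offset `b`, are assumptions chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical joint over [temperature (deg C), humidity (%)]
mu = np.array([25.0, 60.0])
Sigma = np.array([[ 9.0, -6.0],
                  [-6.0, 16.0]])

# 1) Marginal: keep only the entries of mu and Sigma for the variables of interest
keep = [1]                                    # humidity only
mu_marg = mu[keep]
Sigma_marg = Sigma[np.ix_(keep, keep)]

# 2) Conditional of humidity given temperature = 30:
#    mean = mu_h + (S_ht / S_tt) * (t - mu_t),  var = S_hh - S_ht^2 / S_tt
t_obs = 30.0
gain = Sigma[1, 0] / Sigma[0, 0]
mu_cond = mu[1] + gain * (t_obs - mu[0])
var_cond = Sigma[1, 1] - gain * Sigma[0, 1]

# 3) Affine map Y = A X + b is Normal with mean A mu + b and covariance A Sigma A^T
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b = np.array([0.0, 5.0])
mu_y = A @ mu + b
Sigma_y = A @ Sigma @ A.T

print("marginal:", mu_marg, Sigma_marg)
print("conditional mean and variance:", mu_cond, var_cond)
print("transformed mean:", mu_y)
print("transformed covariance:\n", Sigma_y)
```

Note that the conditional variance (12) is smaller than the marginal variance of humidity (16): observing the temperature removes some of the uncertainty, and the amount removed does not depend on the value observed.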
In finance, investors use the MVN to model the returns of multiple assets. The Covariance Matrix tells them which stocks move together (bad for diversification) and which move in opposite directions (good for hedging). Finding the "Minimum Variance Portfolio" then reduces to a linear algebra problem: choose the weights $w$ that minimize the portfolio variance $w^\top \Sigma w$.
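As one possible sketch of that calculation: if the only constraint is that the weights sum to 1 (so short positions are allowed), the minimum variance weights have the well-known closed form $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$. The covariance matrix below is hypothetical, and real portfolios usually add further constraints that this sketch ignores.

```python
import numpy as np

# Hypothetical covariance of returns for three assets (made-up numbers)
Sigma = np.array([[0.040, 0.006, 0.000],
                  [0.006, 0.090, 0.010],
                  [0.000, 0.010, 0.160]])

ones = np.ones(len(Sigma))

# Solve Sigma z = 1 instead of forming the inverse explicitly
z = np.linalg.solve(Sigma, ones)
weights = z / z.sum()                          # normalize so weights sum to 1

portfolio_variance = weights @ Sigma @ weights

# Reference point: an equally weighted portfolio
equal_weights = ones / len(ones)
equal_variance = equal_weights @ Sigma @ equal_weights

print("weights:", weights)
print("minimum variance:", portfolio_variance)
print("equal-weight variance:", equal_variance)
```

The resulting variance is no larger than that of any single asset or of the equally weighted portfolio, since both of those are feasible choices under the same constraint.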