Boundaries on Uncertainty

Sometimes we know very little about our data: perhaps only its average, \(E[X]\), or its variance, \(\mathrm{Var}(X)\). In these "uncertain" cases, we use inequalities to put a guaranteed "worst-case" bound on a probability.

Theorem
Markov's Inequality: A Ceiling on Probability

For any non-negative random variable \(X\) and any \(a > 0\): \(P(X \ge a) \le \frac{E[X]}{a}\).

Intuition: If the average salary in a company is $50,000, no more than 1/4 of the employees can make $200,000 or more, because $50,000 / $200,000 = 1/4. If more than a quarter of them did, those salaries alone would push the average above $50k!
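The salary example can be checked numerically. The sketch below is only an illustration: the salary data is hypothetical (drawn from an exponential distribution with mean $50,000, chosen because it is non-negative and skewed), and `markov_bound` is a helper name introduced here.

```python
import random

def markov_bound(mean, a):
    """Markov's upper bound on P(X >= a) for a non-negative X with the given mean."""
    if a <= 0:
        raise ValueError("a must be positive")
    return min(1.0, mean / a)

# Salary example from the text: mean $50,000, threshold $200,000.
bound = markov_bound(50_000, 200_000)

# Hypothetical salaries (assumption): exponential with mean 50,000.
rng = random.Random(0)
salaries = [rng.expovariate(1 / 50_000) for _ in range(100_000)]
sample_mean = sum(salaries) / len(salaries)
tail_fraction = sum(s >= 200_000 for s in salaries) / len(salaries)

print(f"Markov bound on P(X >= 200k):    {bound:.2f}")
print(f"Observed share earning >= 200k:  {tail_fraction:.4f}")
```

For this particular distribution the true tail probability is \(e^{-4} \approx 0.018\), far below the bound of 0.25. Markov's inequality is always safe, but it is often loose because it uses nothing except the mean.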

Chebyshev's Inequality: The Power of Variance

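Chebyshev's inequality usually gives a tighter bound than Markov's because it uses both the mean (\(\mu\)) and the standard deviation (\(\sigma\)). For any random variable \(X\) with finite variance and any \(k > 0\):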

\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]

Intuition: This tells us that for any distribution with finite variance, at least 75% of the data MUST fall within 2 standard deviations (\(k = 2\)) of the mean, and at least 88.9% must fall within 3 standard deviations (\(k = 3\)). Unlike the Normal distribution's 95%/99.7% rules, this works even for weird, non-Normal data.
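To see the bound hold on clearly non-Normal data, here is a small sketch. The exponential data set is an assumption made for illustration, and `chebyshev_bound` and `outside_fraction` are names introduced here.

```python
import random
import statistics

def chebyshev_bound(k):
    """Chebyshev's upper bound on P(|X - mu| >= k * sigma)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)

# Hypothetical skewed data (assumption): Exponential(rate=1) samples.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

outside_fraction = {}
for k in (2, 3):
    # Share of points at least k standard deviations away from the mean.
    outside_fraction[k] = sum(abs(x - mu) >= k * sigma for x in data) / len(data)
    print(f"k = {k}: observed {outside_fraction[k]:.4f} <= bound {chebyshev_bound(k):.4f}")
```

The observed shares sit well under 1/4 and 1/9. Chebyshev's bound has to cover every distribution with that mean and variance, so for any one specific distribution it tends to be conservative.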

The Law of Large Numbers (LLN)

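The LLN is the "mathematical engine" of the world. It states that as the sample size \(n\) goes to infinity, the sample mean (\(\bar{X}_n\)) of i.i.d. draws converges to the true expected value (\(\mu\)). For any finite \(n\) the sample mean need not equal \(\mu\) exactly, but large deviations become increasingly unlikely as \(n\) grows.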

Example
Why the Casino Always Wins

If you play one hand of blackjack, anything could happen: you could win or lose. But the house plays millions of hands. Because of the LLN, the house's average profit per hand across those millions of hands is almost certain to land very close to its mathematical expectation (usually a 1-2% edge). This is why a casino doesn't need "luck": it just needs the Law of Large Numbers and a lot of players.
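A toy simulation makes the point concrete. This is a sketch, not a model of real blackjack: it assumes every hand is a 1-unit, even-money bet that the house wins with probability 0.505 (a 1% edge), and `HOUSE_WIN_PROB` and `average_house_profit` are names made up for the illustration.

```python
import random

# Assumption: a 1-unit, even-money bet that the house wins 50.5% of the time.
HOUSE_WIN_PROB = 0.505

def average_house_profit(n_hands, seed=0):
    """Average house profit per hand over n_hands simulated bets."""
    rng = random.Random(seed)
    profit = 0
    for _ in range(n_hands):
        # The house gains 1 unit when it wins and pays out 1 unit when it loses.
        profit += 1 if rng.random() < HOUSE_WIN_PROB else -1
    return profit / n_hands

expected_edge = 2 * HOUSE_WIN_PROB - 1  # expected profit per hand, about 0.01

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} hands: average profit per hand = {average_house_profit(n):+.4f}")
print(f"expected edge per hand: {expected_edge:+.4f}")
```

Over a hundred hands the average can easily be negative. Over a million hands it sits close to the 1% edge, because the typical deviation of the average shrinks like \(1/\sqrt{n}\).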

Convergence of Sample Mean

Watch how the running average of rolling a 6-sided die eventually settles at the theoretical mean of 3.5. Early randomness is 'washed out' by the sheer volume of trials.

[Figure: Running Sample Mean versus number of rolls. Plotted points (rolls, running mean): (10, 2.8), (50, 3.2), (100, 3.7), (500, 3.45), (1000, 3.52), (5000, 3.501).]

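The die-rolling experiment is easy to reproduce. The sketch below uses only Python's standard library; `running_means` is a name introduced here, and the seed is arbitrary, so the individual values will differ from the figure's.

```python
import random

def running_means(n_rolls, seed=0):
    """Return the running average after each of n_rolls fair six-sided die rolls."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: running mean = {means[n - 1]:.4f}")
```

Early values can wander well away from 3.5, while later ones cluster tightly around it. Changing `seed` changes the path but not where it settles.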
The Magic: The Central Limit Theorem (CLT)

The CLT is one of the most profound theorems in probability. It states that if you take the sum (or average) of \(n\) independent and identically distributed (i.i.d.) random variables with finite variance, the distribution of the result, once centered and scaled, approaches a Normal distribution as \(n\) grows, regardless of the original distribution's shape!

Intuition
The Miracle of Universality

It doesn't matter what distribution you start with (dice rolls, coin flips, or even weird, skewed distributions), as long as its variance is finite. When you add many of them together, the randomness "evens out" into the familiar Bell Curve. This is why the Normal distribution shows up so often in nature: many measured quantities are sums of many small, independent effects.

The Mathematical Statement

If \(X_1, \ldots, X_n\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\), then:

\[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty \]

CLT Simulation Pipeline

How the sample means of any 'raw' distribution become approximately Gaussian through the process of averaging many samples.

[Diagram: Any distribution (uniform / skewed / discrete) → draw n samples (e.g., n = 30) → compute the sample mean, mean = (1/n) sum Xi → repeat 10,000x to build a distribution → bell curve (Normal distribution).]

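The pipeline can be sketched in a few lines of Python. The choice of an Exponential(rate=1) source distribution is an assumption made here (it is strongly skewed, with mean 1 and standard deviation 1), and `standardized_mean` is an illustrative helper name. The code standardizes each sample mean exactly as in the mathematical statement and compares a few summary numbers with those of \(\mathcal{N}(0, 1)\).

```python
import math
import random
import statistics

N = 30            # samples per mean, as in the pipeline
REPEATS = 10_000  # number of sample means to collect
MU = 1.0          # mean of Exponential(rate=1)
SIGMA = 1.0       # standard deviation of Exponential(rate=1)

rng = random.Random(42)

def standardized_mean(n):
    """Draw n exponential samples and return (mean - MU) / (SIGMA / sqrt(n))."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_mean = sum(sample) / n
    return (sample_mean - MU) / (SIGMA / math.sqrt(n))

z_scores = [standardized_mean(N) for _ in range(REPEATS)]

z_mean = statistics.fmean(z_scores)
z_stdev = statistics.pstdev(z_scores)
within_1 = sum(abs(z) <= 1.0 for z in z_scores) / REPEATS
within_196 = sum(abs(z) <= 1.96 for z in z_scores) / REPEATS

print(f"mean of z-scores:   {z_mean:+.3f}  (N(0,1): 0)")
print(f"stdev of z-scores:  {z_stdev:.3f}  (N(0,1): 1)")
print(f"share within 1:     {within_1:.3f}  (N(0,1): about 0.683)")
print(f"share within 1.96:  {within_196:.3f}  (N(0,1): about 0.950)")
```

With n = 30 the match is close but not perfect, since some of the source distribution's skew survives in the sample means. Raising `N` tightens the agreement.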
Example
The Galton Board

Have you ever seen a board where balls drop through a grid of pins and collect in bins at the bottom? Each ball makes a "random" choice (left or right) at each pin. The final bin it lands in is determined by the sum of all those random choices. This is why the balls pile up in an approximate Bell Curve at the bottom: it's a physical demonstration of the CLT!
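A Galton board is simple to simulate. In the sketch below the number of pin rows and balls are arbitrary assumptions, each bounce is taken to be a fair 50/50 choice, and `drop_ball` is an illustrative name. A ball's bin is the number of times it bounced right, so the bin counts follow a Binomial(ROWS, 1/2) distribution, which the Normal curve approximates.

```python
import random
from collections import Counter

ROWS = 12       # rows of pins (assumed)
BALLS = 10_000  # balls dropped (assumed)

rng = random.Random(7)

def drop_ball(rows):
    """Return the bin index: how many times the ball bounced right."""
    return sum(rng.random() < 0.5 for _ in range(rows))

bins = Counter(drop_ball(ROWS) for _ in range(BALLS))
mean_bin = sum(b * count for b, count in bins.items()) / BALLS

# Text histogram: one '#' per 50 balls in the bin.
for b in range(ROWS + 1):
    print(f"bin {b:>2}: {bins[b]:>5} {'#' * (bins[b] // 50)}")
print(f"average bin: {mean_bin:.2f} (binomial mean is ROWS / 2 = {ROWS / 2})")
```

The histogram is tallest near the middle bin and falls off symmetrically toward the edges. With more rows of pins the outline gets closer to a smooth bell curve.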