Random Variables & Expectations

The Bridge: Random Variables

In previous chapters, we talked about abstract events like "Heads" or "Rainy." But mathematics works best with numbers. A Random Variable (RV) is the bridge that maps these abstract outcomes to the real number line.

Contrary to its name, a Random Variable is neither random nor a variable: it is a deterministic function that assigns a numerical value to every possible outcome in our sample space. The randomness lies in which outcome occurs, not in the function itself.

From Outcomes to Numbers

A random variable is a function from outcomes to real numbers.

Example

Why do we need this?

Imagine you flip two coins. The sample space is $\Omega = \{HH, HT, TH, TT\}$. If we define $X$ as the number of heads, these outcomes map to the values $2, 1, 1, 0$ respectively, so $X$ takes values in $\{0, 1, 2\}$. Now, instead of talking about "the event that we got one head and one tail," we can simply ask for $P(X=1)$. This transformation allows us to use algebra, calculus, and statistics to analyze uncertainty.
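The two-coin mapping can be sketched in a few lines of Python. This is a minimal illustration that assumes fair coins (so each outcome has probability 1/4, which the text does not state); the names `number_of_heads` and `pmf_of_X` are chosen here for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

# Sample space for two coin flips. Equal probabilities are an assumption of this sketch.
omega = ["HH", "HT", "TH", "TT"]
prob = {outcome: Fraction(1, 4) for outcome in omega}

def number_of_heads(outcome: str) -> int:
    """The random variable X: a deterministic function from outcomes to numbers."""
    return outcome.count("H")

# Build the PMF of X by adding up the probability of every outcome mapped to each value.
pmf_of_X = defaultdict(Fraction)
for outcome in omega:
    pmf_of_X[number_of_heads(outcome)] += prob[outcome]

print(dict(pmf_of_X))
```

Under the fair-coin assumption, the two outcomes HT and TH both map to 1, so $P(X=1)$ collects the probability of both.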

The Two Domains of Randomness

RVs come in two primary flavors, and the math we use depends entirely on which one we're dealing with:

Discrete Random Variables: Outcomes you can count (e.g., number of students in a class, number of heads in 10 flips). These are described by a Probability Mass Function (PMF), $P(X=x)$.
Continuous Random Variables: Outcomes that can take any value in a range (e.g., your exact height, the time until the next bus arrives). These are described by a Probability Density Function (PDF), $f(x)$, where the probability of any specific point is zero, but the area under the curve over an interval is the probability of landing in that interval.
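To make the "area under the curve" idea concrete, here is a rough sketch for the bus example. It assumes, purely for illustration, that the waiting time follows an exponential density with a made-up rate of 0.1 per minute; neither the model nor the names `pdf` and `prob_between` come from the text.

```python
import math

RATE = 0.1  # hypothetical rate: on average one bus every 10 minutes

def pdf(t: float) -> float:
    """Assumed exponential density f(t) = RATE * exp(-RATE * t) for t >= 0."""
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0

def prob_between(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= T <= b) as the area under the PDF (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

p_interval = prob_between(5.0, 10.0)  # area over an interval of positive width
p_point = prob_between(5.0, 5.0)      # zero-width interval, so zero area
exact = math.exp(-RATE * 5.0) - math.exp(-RATE * 10.0)  # closed form for this density

print(p_interval, exact, p_point)
```

The density at a single point, `pdf(5.0)`, is positive, yet the probability of hitting exactly 5 minutes is zero because an interval of zero width has no area.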

Expectation: The Long-Run Average

The Expected Value ($E[X]$ or $\mu$) is the "center of mass" of a distribution. If you played a game based on $X$ millions of times, your average result would be very close to $E[X]$.

$$E[X] = \sum_{x} x \cdot P(X=x) \quad \text{(Discrete Weighting)}$$
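As a quick illustration of the discrete formula, the sketch below computes the exact expectation of a fair six-sided die (an example chosen here, not taken from the text) and compares it with the average of many simulated rolls. The helper name `expectation` and the seed are arbitrary choices.

```python
import random
from fractions import Fraction

def expectation(pmf: dict) -> Fraction:
    """E[X] = sum over x of x * P(X = x), for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# A fair die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
exact_mean = expectation(die_pmf)

# Long-run average: simulate many rolls with a fixed seed for reproducibility.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(exact_mean, sample_mean)
```

The exact value is 7/2, a number the die can never actually show; the simulated average should land close to it without matching it exactly.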

Intuition

The Center of Gravity

Think of the probability distribution as a physical object. The Expected Value is the point where you could balance that object on your finger. It's the "pivot point" of the distribution. If a distribution is perfectly symmetric and its expected value exists, the expected value is the center of symmetry.
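One way to see why the mean is the balance point, sketched here for the discrete case: the probability-weighted signed distances from $\mu$ cancel exactly, just as torques cancel around a pivot.

```latex
\sum_{x} (x - \mu)\, P(X=x)
  = \sum_{x} x\, P(X=x) - \mu \sum_{x} P(X=x)
  = \mu - \mu \cdot 1
  = 0
```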

Linear Properties

Expectation is a linear operator, which is one of the most useful properties in all of probability:

Scaling: $E[cX] = cE[X]$
Addition: $E[X + Y] = E[X] + E[Y]$ (Even if $X$ and $Y$ are dependent!)
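The claim about dependent variables can be checked on a small example. The sketch below uses a single fair die where $X$ is the top face and $Y = 7 - X$ is the bottom face, so $Y$ is fully determined by $X$. The setup and the helper name `expect` are illustrative choices, not part of the text.

```python
from fractions import Fraction

# One fair die roll; X is the top face and Y = 7 - X is the bottom face.
# Y is completely determined by X, so the two are strongly dependent.
outcomes = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(g) -> Fraction:
    """Expected value of g(face) over the die's sample space."""
    return sum(g(face) * p for face, p in outcomes.items())

E_X = expect(lambda face: face)
E_Y = expect(lambda face: 7 - face)
E_sum = expect(lambda face: face + (7 - face))  # E[X + Y], computed directly
E_3X = expect(lambda face: 3 * face)            # E[cX] with c = 3

print(E_sum == E_X + E_Y, E_3X == 3 * E_X)
```

Both comparisons hold: the sum is always 7, and the two individual expectations of 7/2 add up to the same number despite the dependence.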

Theorem

LOTUS: Law of the Unconscious Statistician

If you want to find the expected value of a function of $X$ (like $X^2$ or $\log(X)$), you don't need to find the distribution of $g(X)$ first. You can just "pass through" the original probabilities. For a discrete $X$: $E[g(X)] = \sum_{x} g(x) P(X=x)$.
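A small sketch can show that LOTUS agrees with the long route of first building the distribution of $g(X)$. The PMF below (uniform on five integers) and the choice $g(x) = x^2$ are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical PMF: X is uniform on {-2, -1, 0, 1, 2}.
pmf_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def g(x):
    return x ** 2

# Route 1 (LOTUS): weight g(x) by the original probabilities of X.
lotus = sum(g(x) * p for x, p in pmf_X.items())

# Route 2 (the long way): first build the PMF of Y = g(X), then take E[Y].
pmf_Y = defaultdict(Fraction)
for x, p in pmf_X.items():
    pmf_Y[g(x)] += p  # x = -2 and x = 2 both land on y = 4
long_way = sum(y * p for y, p in pmf_Y.items())

print(lotus, long_way)
```

Because $g$ is not one-to-one here, the long way has to merge probabilities for values that collide; LOTUS skips that bookkeeping and gives the same answer.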

Variance: The Measure of Risk

If Expectation tells us where the center is, Variance ( $Var(X)$ or $\sigma^2$ ) tells us how much the outcomes "swing" around that center. It measures the average squared distance from the mean.

$$Var(X) = E[(X - \mu)^2]$$

Intuition

Spread vs. Center

Two investments might have the exact same expected return (say, 5%), but one is a "Safe" savings account (low variance) and the other is a "Volatile" crypto-asset (high variance). Variance quantifies that risk. Because we square the distances, variance is always non-negative.
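The savings-versus-crypto comparison can be sketched with two toy distributions that share a 5% mean. The specific returns and probabilities below are invented for illustration and are not taken from the text.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], computed straight from the definition."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Annual returns in percent; both PMFs are made-up numbers.
safe = {4: Fraction(1, 2), 6: Fraction(1, 2)}
volatile = {-45: Fraction(1, 2), 55: Fraction(1, 2)}

print(mean(safe), variance(safe))
print(mean(volatile), variance(volatile))
```

Both toy investments average 5, but the squared deviations are 1 for the first and 2500 for the second, which is the gap in risk that the mean alone hides.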

1. The definition $Var(X) = E[(X - \mu)^2]$ is often messy to calculate. We can derive a much cleaner formula.

2. Expand the square: $E[X^2 - 2\mu X + \mu^2]$

3. Use linearity of expectation: $E[X^2] - E[2\mu X] + E[\mu^2]$

4. Pull out constants: $E[X^2] - 2\mu E[X] + \mu^2$

5. Since $E[X] = \mu$: $E[X^2] - 2\mu^2 + \mu^2 = \mathbf{E[X^2] - (E[X])^2}$. This is the "Mean of the Squares minus the Square of the Mean."

∎
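As a sanity check on the derivation, the sketch below computes the variance of a fair six-sided die (an example chosen here, not from the text) both from the definition and from the shortcut, using exact fractions so the two results can be compared without rounding.

```python
from fractions import Fraction

# A fair die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mu = sum(x * p for x, p in die_pmf.items())                     # E[X]
mean_of_squares = sum(x ** 2 * p for x, p in die_pmf.items())   # E[X^2]

# Definition: average squared distance from the mean.
var_definition = sum((x - mu) ** 2 * p for x, p in die_pmf.items())

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = mean_of_squares - mu ** 2

print(var_definition, var_shortcut)
```

Both routes give 35/12: the mean of the squares is 91/6 and the square of the mean is 49/4.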

Variance Visualized

[Figure: High vs. Low Variance. Both distributions are centered at 0, but the 'Risky' one (green) spreads its probability across a much wider range. Legend: Low Variance (Safe), High Variance (Risky).]
