Sampling Distributions

Distributions of Statistics

In statistics, we don't just care about how individual observations are distributed; we also care about how quantities computed from whole samples are distributed. If you take a sample of 10 people and calculate their average height, that "average" ( $\bar{X}$ ) is itself a random variable, because a different sample would give a different value. The distribution of such a statistic is called a Sampling Distribution.

The Chi-Squared (χ²) Distribution

Imagine taking a standard normal variable $Z \sim \mathcal{N}(0,1)$ and squaring it. Since $Z^2$ can never be negative, the distribution is "pushed" to the right. If you sum $k$ of these squared variables, you get a Chi-Squared distribution with $k$ degrees of freedom.

$$\text{If } Z_1, \ldots, Z_k \sim \mathcal{N}(0,1) \text{ independently, then } Q = \sum_{i=1}^k Z_i^2 \sim \chi^2_k$$
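One way to check this construction is by simulation. The sketch below uses only Python's standard library; the function name `chi_squared_draw`, the seed, and the number of draws are illustrative choices, not part of the lesson. It relies on the standard facts that a $\chi^2_k$ variable has mean $k$ and variance $2k$.

```python
import random
import statistics

def chi_squared_draw(k, rng):
    """Return one draw of Q = Z_1^2 + ... + Z_k^2 with Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
k = 5                   # degrees of freedom (illustrative)
draws = [chi_squared_draw(k, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Compare the simulated moments with the theoretical ones.
print(f"mean     ~ {sample_mean:.2f} (theory: {k})")
print(f"variance ~ {sample_var:.2f} (theory: {2 * k})")
```

Because every term is a square, no draw can be negative, which is the "pushed to the right" behaviour described above.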

✦Intuition

The Distribution of Variance

Why do we care about Chi-Squared? Because, for a sample from a Normal population, the Sample Variance ( $S^2$ ), once rescaled, follows a Chi-Squared distribution: $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ . If you want to test whether a new manufacturing process has reduced the "spread" of defects, you are performing a Chi-Squared test for a variance.
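The rescaling can also be checked numerically. This is a minimal sketch, assuming Normal data with made-up values for `n`, `mu`, and `sigma`; the helper `scaled_sample_variance` is a hypothetical name. If $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, the simulated values should have mean close to $n-1$ and variance close to $2(n-1)$.

```python
import random
import statistics

rng = random.Random(1)          # arbitrary fixed seed
n, mu, sigma = 10, 50.0, 4.0    # illustrative sample size and population parameters

def scaled_sample_variance(rng):
    """Draw one Normal sample of size n and return (n - 1) * S^2 / sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, divides by n - 1
    return (n - 1) * s2 / sigma ** 2

scaled = [scaled_sample_variance(rng) for _ in range(20_000)]
mean_scaled = statistics.fmean(scaled)
var_scaled = statistics.variance(scaled)

print(f"mean     ~ {mean_scaled:.2f} (theory: {n - 1})")
print(f"variance ~ {var_scaled:.2f} (theory: {2 * (n - 1)})")
```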

Chi-Squared Distributions

As the degrees of freedom (k) increase, the distribution moves to the right and becomes more 'Normal' (thanks to the Central Limit Theorem!).

[Figure legend: k=2 (highly skewed), k=5]

Student's t-Distribution: The Guinness Secret

The t-distribution was introduced in 1908 by William Sealy Gosset, a chemist at the Guinness Brewery. He needed a way to monitor the quality of stout using only very small samples. Since Guinness forbade its employees from publishing secrets, he wrote under the pseudonym "Student."

✦Intuition

Accounting for Uncertainty

When we don't know the true population variance (which is almost always the case), we have to estimate it from our data. This extra layer of estimation adds "noise." The t-distribution looks like a Normal curve but with Fat Tails—it effectively says, "With a small sample, extreme outcomes are more likely than the Normal distribution would predict."

Standardization with Estimated Variance

If $X_1, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ independently, with sample mean $\bar{X}$ and sample standard deviation $S$ , but $\sigma^2$ is unknown, then:

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$$
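To see the fat tails in practice, the sketch below computes $T$ for many small Normal samples and counts how often $|T|$ exceeds 1.96, the cutoff that a standard Normal variable exceeds only about 5% of the time. The function name `t_statistic`, the seed, and the choice $n = 4$ (so 3 degrees of freedom) are assumptions made for illustration.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """Return T = (x_bar - mu) / (S / sqrt(n)) for one sample."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    return (x_bar - mu) / (s / math.sqrt(n))

rng = random.Random(2)       # arbitrary fixed seed
mu, sigma, n = 0.0, 1.0, 4   # illustrative values; n = 4 gives t with 3 degrees of freedom

t_values = [
    t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(20_000)
]

# Fraction of simulated T values beyond the Normal 5% cutoff.
tail_fraction = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(f"P(|T| > 1.96) ~ {tail_fraction:.3f} (Normal would give about 0.05)")
```

With a sample this small, the simulated fraction comes out well above 5%, which is the extra tail area the t-distribution accounts for.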

Standard Normal vs. t-Distributions

Notice how the t-distribution (green) has much more area in the tails than the Normal (blue). This is the 'uncertainty penalty' for having a small sample size.

[Figure legend: Normal (perfect knowledge), t(3) (small sample)]

Order Statistics: Min, Max, and Median

What is the probability that the fastest person in a race breaks a record? Or that the median house price in a city stays below $500k? These are questions about Order Statistics.

If we sort a sample from smallest to largest: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ , then $X_{(k)}$ is the $k$ -th order statistic.
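In code, an order statistic is just an index into the sorted sample. The sketch below defines a hypothetical helper `order_statistic` and applies it to made-up race times. It then checks, by simulation, a standard result for independent draws with CDF $F$: $P(X_{(n)} \le x) = F(x)^n$. The check uses Uniform(0, 1) draws, for which $F(x) = x$; that distribution, the seed, and the values of `n` and `x` are illustrative choices.

```python
import random

def order_statistic(sample, k):
    """Return X_(k), the k-th smallest value (k is 1-based, as in the text)."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must be between 1 and len(sample)")
    return sorted(sample)[k - 1]

times = [12.4, 11.9, 13.1, 12.0, 12.7]        # made-up race times in seconds
fastest = order_statistic(times, 1)           # X_(1), the minimum
slowest = order_statistic(times, len(times))  # X_(n), the maximum
print(fastest, slowest)

# Simulation check of P(X_(n) <= x) = F(x)^n for Uniform(0, 1) draws.
rng = random.Random(3)  # arbitrary fixed seed
n, x = 5, 0.8
trials = 20_000
hits = sum(max(rng.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials
theoretical = x ** n
print(f"empirical ~ {empirical:.3f}, theoretical = {theoretical:.3f}")
```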

Example

Extreme Events in Hydrology

Order statistics are used to estimate the "100-year flood," the level that has about a 1% chance of being exceeded in any given year. The key quantity is $X_{(n)}$ (the sample maximum), for example the largest annual rainfall in the historical record. By studying the distribution of the maximum, engineers can build bridges strong enough to survive rare, extreme storms.

Understanding the Median

Input: A random sample of size n

Output: The middle value

1. Sort: Arrange the data from smallest to largest.
2. Pick: Select the value at position (n+1)/2 (the median). When n is even, this position falls between two values, and the usual convention is to average them.
3. Interpret: For odd n, the median is the (n+1)/2-th Order Statistic. Unlike the mean, it is robust to outliers: one billionaire in a room of poor people won't change the median income much, but they will skyrocket the mean.
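The sort-and-pick steps can be sketched in a few lines of Python. The function name `sample_median` and the income figures are made up for illustration, and the even-n branch follows the common convention of averaging the two middle values.

```python
def sample_median(sample):
    """Return the median: sort, then pick the middle order statistic."""
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)   # Step 1: sort
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # Step 2 (odd n): position (n + 1) / 2, 1-based
        return ordered[mid]
    # Step 2 (even n): average the two middle values (a convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

incomes = [28_000, 31_000, 35_000, 40_000, 42_000]  # made-up incomes
with_billionaire = incomes + [1_000_000_000]

median_before = sample_median(incomes)
median_after = sample_median(with_billionaire)
mean_before = sum(incomes) / len(incomes)
mean_after = sum(with_billionaire) / len(with_billionaire)

print(f"median: {median_before} -> {median_after}")
print(f"mean:   {mean_before} -> {mean_after}")
```

Adding the single extreme value barely moves the median, while the mean jumps by several orders of magnitude.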
