Hypothesis Testing & Confidence Intervals

Confidence Intervals: The Fishing Net

When we estimate a parameter like the population mean ( $\mu$ ), we don't just want a single "best guess"—we want a range of plausible values. A Confidence Interval (CI) is like a fishing net we throw into the ocean to catch the "true" parameter.

✦Intuition

What '95% Confidence' Really Means

Imagine you repeat an experiment 100 times, and each time you calculate a 95% Confidence Interval. On average, about 95 of those intervals will contain the true parameter and about 5 will miss it; the exact count varies from one batch of 100 experiments to the next.

Crucially, the "true" parameter ( $\mu$ ) is a fixed constant; it doesn't move. The interval is what is random, because it depends on the specific sample you drew. You are essentially saying: "I used a process that is reliable 95% of the time."
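The "reliable 95% of the time" claim can be checked by simulation. The sketch below is only an illustration: the population values (`mu=50`, `sigma=10`), the sample size, and the function name `coverage_rate` are assumptions chosen for the demo, not part of the lesson. It draws many samples from a normal population, builds a known-sigma 95% interval from each, and counts how often the fixed true mean lands inside.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=50.0, sigma=10.0, n=30, trials=10_000, level=0.95, seed=0):
    """Fraction of simulated known-sigma intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    margin = z * sigma / n ** 0.5              # same margin for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)                   # the random part: the interval's center
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # should land close to 0.95
```

The true mean never changes inside the loop; only the interval's center moves from sample to sample, which is the point of the fishing-net picture.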

The Formula for the Mean

When the population standard deviation ( $\sigma$ ) is known:

$$\text{95\% CI: } \quad \bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

The term $1.96 \cdot \sigma/\sqrt{n}$ is called the Margin of Error.
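As a minimal sketch of the formula, the helper below computes the interval for a known $\sigma$. The function name `mean_ci_known_sigma` and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    margin = z * sigma / n ** 0.5              # the margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 100, known sigma 15, n = 36, so sigma/sqrt(n) = 2.5
low, high = mean_ci_known_sigma(xbar=100.0, sigma=15.0, n=36)
print(f"{low:.2f} to {high:.2f}")  # 95.10 to 104.90
```

Quadrupling the sample size to 144 would halve the margin, because $n$ enters through $\sqrt{n}$.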

Example

Political Polling

When a poll says a candidate has 48% support with a "margin of error of ±3%", it is giving you a confidence interval: $[45\%, 51\%]$ . By convention the margin is reported at the 95% level, so the pollster is 95% confident, in the sense described above, that the true population support falls within that range.
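To see where a figure like ±3% can come from, the sketch below applies the usual normal-approximation margin for a proportion, $z \cdot \sqrt{\hat{p}(1-\hat{p})/n}$. The sample size of 1,000 respondents is an assumption for illustration (the example above does not state one), and the calculation assumes a simple random sample.

```python
from math import sqrt
from statistics import NormalDist

def proportion_margin(p_hat, n, level=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Assumed poll size: 1,000 respondents, 48% of whom support the candidate.
margin = proportion_margin(p_hat=0.48, n=1000)
print(f"margin of error: {margin:.3f}")  # about 0.031, i.e. roughly ±3 points
```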

Hypothesis Testing: The Courtroom of Science

Hypothesis testing is a formal way of asking: "Is the effect I'm seeing real, or just a lucky coincidence?"

✦Intuition

The Courtroom Analogy

In a trial, the defendant is "Innocent until proven guilty."

Null Hypothesis ( $H_0$ ): The status quo. The defendant is innocent (No effect).
Alternative Hypothesis ( $H_A$ ): The claim we want to prove. The defendant is guilty (There is an effect).
Evidence: The data we collected.
Verdict: If the evidence is "beyond a reasonable doubt" (a low p-value), we reject $H_0$ , which corresponds to convicting the defendant. Otherwise we fail to reject $H_0$ , which is an acquittal, not a proof of innocence.

The Decision Matrix

Every decision in statistics carries a risk of being wrong:

Decision	$H_0$ is Actually True	$H_0$ is Actually False
Reject $H_0$	Type I Error (False Alarm)	Correct Decision (Power)
Fail to Reject $H_0$	Correct Decision	Type II Error (Missed Opportunity)

Example

A/B Testing in Tech

Imagine Netflix changes its 'Play' button from blue to red.

$H_0$ : The color change has no effect on click rates.
$H_A$ : The red button gets more clicks.
Type I Error: Netflix thinks red is better when it's actually not (wasted design effort).
Type II Error: Red is actually better, but the test was too small to notice it (lost revenue).
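One common way to test hypotheses like these is a one-sided two-proportion z-test. The sketch below is not a description of how Netflix actually runs experiments; the click counts and the function name are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic and p-value for H_A: variant B has a higher click rate than A."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Under H_0 both buttons share one click rate, so pool the data to estimate it.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)  # upper tail only, because H_A is one-sided
    return z, p_value

# Hypothetical data: blue gets 1,000 clicks from 10,000 views, red gets 1,100 from 10,000.
z, p_value = one_sided_two_proportion_z(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts the statistic is a little above 2.3 and the p-value is close to 0.01, so the test would reject $H_0$ at the 5% level. If red were in truth no better, that rejection would be a Type I error.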

The Infamous P-Value

The p-value is the most misunderstood number in statistics. It is simply the answer to: "If the Null Hypothesis were true, how likely is it that I'd see a result at least this extreme?"

!Common pitfall

What a P-Value is NOT

It is not the probability that $H_0$ is true.
It is not the probability that your result was "due to chance."
A p-value of 0.05 does not mean there is a 5% chance the effect isn't real.

✦Intuition

The 'Surprise' Meter

Think of the p-value as a "Surprise Meter" on a scale from 0 to 1:

p = 0.50: "Not surprising at all. This happens all the time." (Fail to reject $H_0$ ).
p = 0.01: "Wait, that's very weird if there's truly no effect. Maybe something is going on." (Reject $H_0$ ).
p = 0.0001: "A result like this would almost never happen under the null. The evidence against it is very strong." (Strongly reject $H_0$ ).
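The surprise meter can be made concrete with a coin. This is a sketch under an assumed setup that is not part of the lesson: $H_0$ says the coin is fair, we flip it 100 times, and the p-value is the exact binomial probability of seeing at least the observed number of heads.

```python
from math import comb

def upper_tail_p_value(heads, flips, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(flips, p_null), i.e. a one-sided p-value."""
    return sum(
        comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical outcomes of 100 flips, from unsurprising to very surprising.
for heads in (55, 60, 70):
    print(heads, upper_tail_p_value(heads, 100))
```

Under a fair coin, 55 heads gives a p-value near 0.18, 60 heads gives about 0.028, and 70 heads gives less than 0.0001. Each number is the probability of data at least that extreme if the coin is fair, not the probability that the coin is fair.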

Statistical Power: The Magnifying Glass

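Power ( $1-\beta$ ) is the probability that your test will correctly find an effect if one actually exists. Here $\beta$ is the probability of a Type II Error, so high power means a low chance of missing a real effect.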

✦Intuition

The Sensitivity of your Test

If you're looking for a tiny needle in a large haystack, you need a powerful magnifying glass. In statistics, "Power" is that magnifying glass. You get more power by:

Increasing sample size ( $n$ ): A bigger test is more sensitive.
Looking for a bigger effect: It's easier to find a giant needle than a tiny one.
Reducing noise: High-quality, consistent data makes the signal clearer.
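The three levers above can be seen in a small calculation. The sketch below assumes a one-sided z-test with known $\sigma$, for which power is $\Phi\left(\frac{\delta\sqrt{n}}{\sigma} - z_{\alpha}\right)$, where $\delta$ is the true effect. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true shift of size `effect`."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # rejection cutoff under H_0
    shift = effect * n ** 0.5 / sigma        # how far the truth moves the test statistic
    return std_normal.cdf(shift - z_alpha)

# Assumed scenario: true effect of 5 units, noise sigma of 20 units.
for n in (25, 100, 400):
    print(n, round(z_test_power(effect=5.0, sigma=20.0, n=n), 3))
```

In this scenario power rises from roughly 0.35 at $n = 25$ to about 0.80 at $n = 100$. Doubling the effect or halving $\sigma$ raises the `shift` term in the same way that quadrupling $n$ does.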

The Coffee Shop Claim

medium

A coffee shop claims their 'Large' cups contain 500ml. You measure 25 cups and get an average of 490ml with a sample standard deviation $s=20$ ml. At a 5% significance level, are they under-filling?
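One way to check an answer, sketched below, is a one-sided one-sample t-test with $H_0$ : $\mu = 500$ and $H_A$ : $\mu < 500$ . The t-test is used because only the sample standard deviation is available, and it assumes the fill volumes are roughly normal. The critical value 1.711 (one-sided, $\alpha = 0.05$ , 24 degrees of freedom) is taken from a standard t-table rather than computed.

```python
from math import sqrt

# Values from the exercise.
claimed_ml = 500.0
sample_mean_ml = 490.0
sample_sd_ml = 20.0
n = 25

standard_error = sample_sd_ml / sqrt(n)                  # 20 / 5 = 4 ml
t_stat = (sample_mean_ml - claimed_ml) / standard_error  # -10 / 4 = -2.5

# Assumption: critical value copied from a t-table
# (one-sided, alpha = 0.05, df = n - 1 = 24), not computed here.
T_CRITICAL = 1.711

reject_h0 = t_stat < -T_CRITICAL  # reject only if the mean is far enough BELOW 500
print(t_stat, reject_h0)  # -2.5 True
```

Since $-2.5 < -1.711$ , the sample falls in the rejection region: at the 5% level the data support the claim that the cups are under-filled.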
