Confidence Intervals: The Fishing Net

When we estimate a parameter like the population mean ($\mu$), we don't just want a single "best guess"; we want a range of plausible values. A Confidence Interval (CI) is like a fishing net we throw into the ocean to catch the "true" parameter.

Intuition
What '95% Confidence' Really Means

Imagine you repeat an experiment 100 times, and each time you calculate a 95% Confidence Interval. On average, about 95 of those intervals will contain the true parameter and about 5 will miss it; the exact count varies from one batch of 100 experiments to the next.

Crucially, the "true" parameter ($\mu$) is a fixed constant; it doesn't move. The interval is what is random, because it depends on the specific sample you drew. You are essentially saying: "I used a process that is reliable 95% of the time."
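The repeated-experiment picture can be checked with a small simulation. The sketch below assumes a normal population with a hypothetical true mean of 50 and a known standard deviation of 10 (both invented for illustration). It draws many samples, builds the known-$\sigma$ interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ for each one, and counts how often the fixed true mean lands inside.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=30, trials=10_000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the fixed true mean mu.

    mu, sigma, n and trials are illustrative values, not taken from the text.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5          # identical for every sample: sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)                  # the interval's centre is the random part
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Intervals that caught the true mean: {rate:.1%}")
```

With these settings the printed rate should land close to 95%, though any single run of the simulation will differ slightly from that figure.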

The Formula for the Mean

When the population standard deviation ($\sigma$) is known:

$$\text{95\% CI: } \quad \bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

The term $1.96 \cdot \sigma/\sqrt{n}$ is called the Margin of Error.
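As a minimal sketch of the formula in use, the function below computes the interval and its margin of error for a list of measurements. The function name, the sample values and the choice of $\sigma = 3$ are all hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, level=0.95):
    """Return (low, high, margin) for the mean when sigma is known."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sigma / n ** 0.5              # the margin of error
    return x_bar - margin, x_bar + margin, margin


# Hypothetical measurements with sample mean 500 and n = 9
sample = [498, 502, 495, 505, 500, 497, 503, 499, 501]
low, high, margin = z_confidence_interval(sample, sigma=3.0)
print(f"95% CI: [{low:.2f}, {high:.2f}], margin of error = {margin:.2f}")
```

Because $\sqrt{9} = 3$ cancels the assumed $\sigma = 3$, the margin here is just the multiplier itself, roughly 1.96.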

Example
Political Polling

When a poll says a candidate has 48% support with a "margin of error of ±3%", they are giving you a confidence interval: $[45\%, 51\%]$. They are 95% confident that the true population support falls within that range.
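To see where a figure like ±3% can come from, here is a rough sketch using the standard normal-approximation margin for a proportion, $1.96\sqrt{\hat{p}(1-\hat{p})/n}$. The sample size of 1,000 respondents is an assumption (the poll above does not state one), and the calculation assumes a simple random sample.

```python
from statistics import NormalDist


def proportion_margin(p_hat, n, level=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * (p_hat * (1 - p_hat) / n) ** 0.5


# 48% support; n = 1000 is a hypothetical sample size, not from the poll
poll_margin = proportion_margin(0.48, 1000)
print(f"Margin of error: ±{poll_margin:.1%}")
print(f"Interval: [{0.48 - poll_margin:.1%}, {0.48 + poll_margin:.1%}]")
```

Under these assumptions the margin comes out at about 3.1 percentage points, which is why polls of roughly a thousand people are so often quoted with "±3%".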

Hypothesis Testing: The Courtroom of Science

Hypothesis testing is a formal way of asking: "Is the effect I'm seeing real, or just a lucky coincidence?"

Intuition
The Courtroom Analogy

In a trial, the defendant is "Innocent until proven guilty."

  • Null Hypothesis ($H_0$): The status quo. The defendant is innocent (No effect).
  • Alternative Hypothesis ($H_A$): The claim we want to prove. The defendant is guilty (There is an effect).
  • Evidence: The data we collected.
  • Verdict: If the evidence is "beyond a reasonable doubt" (a low p-value), we reject $H_0$ and "convict" the null hypothesis.

The Decision Matrix

Every decision in statistics carries a risk of being wrong:

| Decision | $H_0$ is Actually True | $H_0$ is Actually False |
|---|---|---|
| Reject $H_0$ | **Type I Error (False Alarm)** | Correct Decision (**Power**) |
| Fail to Reject $H_0$ | Correct Decision | **Type II Error (Missed Opportunity)** |

Example
A/B Testing in Tech

Imagine Netflix changes its 'Play' button from blue to red.

  • $H_0$: The color change has no effect on click rates.
  • $H_A$: The red button gets more clicks.
  • Type I Error: Netflix thinks red is better when it's actually not (wasted design effort).
  • Type II Error: Red is actually better, but the test was too small to notice it (lost revenue).
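One common way to run a test like this is a one-sided two-proportion z-test, sketched below. The click counts are invented for illustration, and the pooled-variance z-test is an assumed choice of method, not something the example above prescribes.

```python
from statistics import NormalDist


def one_sided_two_proportion_test(clicks_a, n_a, clicks_b, n_b):
    """Test H0: rates are equal against HA: variant B has the higher rate.

    Returns the z statistic and the one-sided p-value.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # shared rate assumed under H0
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)             # upper tail only: HA is "B is better"
    return z, p_value


# Hypothetical data: blue gets 1000 clicks in 10,000 views, red gets 1080 in 10,000
z_ab, p_ab = one_sided_two_proportion_test(1000, 10_000, 1080, 10_000)
alpha = 0.05
print(f"z = {z_ab:.2f}, p = {p_ab:.4f}")
print("Reject H0" if p_ab < alpha else "Fail to reject H0")
```

With these made-up counts the p-value falls a little above 0.03, so the test rejects $H_0$ at the 5% level. If red were in fact no better, that rejection would be a Type I error.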
The Infamous P-Value

The p-value is the most misunderstood number in statistics. It is simply the answer to: "If the Null Hypothesis were true, how likely is it that I'd see a result at least this extreme?"

Common pitfall
What a P-Value is NOT
  • It is not the probability that $H_0$ is true.
  • It is not the probability that your result was "due to chance."
  • A p-value of 0.05 does not mean there is a 5% chance the effect isn't real.

Intuition
The 'Surprise' Meter

Think of the p-value as a "Surprise Meter" on a scale from 0 to 1:

  • p = 0.50: "Not surprising at all. This happens all the time." (Fail to reject $H_0$).
  • p = 0.01: "Wait, that's very weird if there's truly no effect. Maybe something is going on." (Reject $H_0$).
  • p = 0.0001: "This would almost never happen under the null. That is very strong evidence of an effect." (Strongly reject $H_0$).
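The meter readings above can be tied to a concrete test statistic. As an illustrative sketch, assume the statistic is a z-score that follows a standard normal distribution under $H_0$ and that only large positive values count as "extreme" (a one-sided test). The three z-values below are chosen by hand to land near the three readings.

```python
from statistics import NormalDist


def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z: how surprising z is under H0."""
    return 1 - NormalDist().cdf(z)


# Hand-picked z-scores, roughly matching p = 0.50, 0.01 and 0.0001
for z in (0.0, 2.33, 3.72):
    print(f"z = {z:4.2f}  ->  p = {one_sided_p_value(z):.4f}")
```

A statistic sitting exactly at the null value gives p = 0.5, while one about 2.3 standard errors above it gives p near 0.01, and one about 3.7 standard errors above it gives p near 0.0001.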
Statistical Power: The Magnifying Glass

Power ($1-\beta$) is the probability that your test will correctly find an effect if one actually exists.

Intuition
The Sensitivity of your Test

If you're looking for a tiny needle in a large haystack, you need a powerful magnifying glass. In statistics, "Power" is that magnifying glass. You get more power by:

  1. Increasing sample size ($n$): A bigger test is more sensitive.
  2. Looking for a bigger effect: It's easier to find a giant needle than a tiny one.
  3. Reducing noise: High-quality, consistent data makes the signal clearer.
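These three levers can be seen in a closed-form power calculation. The sketch below assumes a one-sided z-test for a mean with known $\sigma$, for which power is $\Phi(\delta\sqrt{n}/\sigma - z_{1-\alpha})$, where $\delta$ is the true effect. The function name and every numeric value are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test when the true shift in the mean is `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    return std_normal.cdf(effect * n ** 0.5 / sigma - z_crit)


baseline = z_test_power(effect=2.0, sigma=10.0, n=50)
more_data = z_test_power(effect=2.0, sigma=10.0, n=200)     # lever 1: larger n
bigger_effect = z_test_power(effect=4.0, sigma=10.0, n=50)  # lever 2: larger effect
less_noise = z_test_power(effect=2.0, sigma=5.0, n=50)      # lever 3: smaller sigma

for label, power in [("baseline", baseline), ("4x sample size", more_data),
                     ("2x effect", bigger_effect), ("half the noise", less_noise)]:
    print(f"{label:>15}: power = {power:.2f}")
```

In this setup, quadrupling the sample size, doubling the effect and halving the noise all raise power by the same amount, because each one doubles the ratio $\delta\sqrt{n}/\sigma$.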

The Coffee Shop Claim

medium

A coffee shop claims their 'Large' cups contain 500ml. You measure 25 cups and get an average of 490ml with a sample standard deviation $s = 20$ ml. At a 5% significance level, are they under-filling?
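One possible way to work this problem is a left-tailed one-sample t-test, since $\sigma$ is unknown and the question is specifically about under-filling. The sketch below assumes fill volumes are roughly normal and takes the critical value for 24 degrees of freedom (about 1.711 for a one-sided 5% test) from a standard t-table rather than computing it.

```python
def coffee_shop_test(n=25, x_bar=490.0, mu_0=500.0, s=20.0, t_crit=1.711):
    """Left-tailed one-sample t-test of H0: mu = 500 against HA: mu < 500.

    t_crit is the one-sided 5% critical value for n - 1 = 24 degrees of
    freedom, read from a standard t-table (an outside input, not computed here).
    """
    standard_error = s / n ** 0.5             # 20 / 5 = 4 ml
    t_stat = (x_bar - mu_0) / standard_error  # (490 - 500) / 4 = -2.5
    reject = t_stat < -t_crit                 # reject only for large negative t
    return t_stat, reject


t_stat, reject = coffee_shop_test()
print(f"t = {t_stat:.2f}")
print("Reject H0: evidence of under-filling" if reject else "Fail to reject H0")
```

Under these assumptions the statistic is $t = -2.5$, which lies below $-1.711$, so the test rejects $H_0$ at the 5% level and the data support the under-filling claim.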