When we estimate a parameter like the population mean ($\mu$), we don't just want a single "best guess"; we want a range of plausible values. A Confidence Interval (CI) is like a fishing net we throw into the ocean to catch the "true" parameter.
Imagine you repeat an experiment 100 times, and each time you calculate a 95% Confidence Interval. On average, about 95 of those intervals will contain the true parameter, and about 5 will miss it.
Crucially, the "true" parameter ($\mu$) is a fixed constant; it doesn't move. The interval is what is random, because it depends on the specific sample you drew. You are essentially saying: "I used a process that is reliable 95% of the time."
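The repeated-experiment idea can be checked with a small simulation. This is a minimal sketch, not a definitive implementation: the true mean, standard deviation, sample size, and number of trials are all made-up values, and it uses the standard known-$\sigma$ interval $\bar{x} \pm z\,\sigma/\sqrt{n}$.

```python
import random
from statistics import NormalDist

def coverage_simulation(mu=50.0, sigma=10.0, n=30, trials=10_000,
                        level=0.95, seed=0):
    """Fraction of simulated intervals that contain the fixed true mean mu.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # The interval moves from sample to sample; mu stays put.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(f"Share of intervals that caught mu: {coverage_simulation():.3f}")
```

The printed share should land close to 0.95, but rarely exactly on it, which is why "95 out of 100" is a long-run average rather than a guarantee for any particular batch of experiments.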
The Formula for the Mean
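When the population standard deviation ($\sigma$) is known:

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Here $\bar{x}$ is the sample mean, $n$ is the sample size, and $z_{\alpha/2}$ is the standard normal critical value (about 1.96 for 95% confidence).

The term $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the Margin of Error.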
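As a rough illustration of the known-$\sigma$ case, the sketch below computes the margin of error and the interval using only Python's standard library. The function name and the input numbers (a sample mean of 100, $\sigma = 15$, $n = 36$) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known.

    Returns (lower, upper, margin_of_error).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value
    margin = z * sigma / n ** 0.5                  # margin of error
    return xbar - margin, xbar + margin, margin

# Hypothetical inputs, for illustration only.
lo, hi, moe = z_confidence_interval(xbar=100.0, sigma=15.0, n=36)
print(f"margin of error = {moe:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# margin of error = 4.90, 95% CI = (95.10, 104.90)
```

Because the margin shrinks with $\sqrt{n}$, quadrupling the sample size only halves the width of the interval.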
When a poll says a candidate has 48% support with a "margin of error of ±3%", they are giving you a confidence interval: $48\% \pm 3\%$, or $[45\%, 51\%]$. Polls conventionally report this at the 95% level, so the pollsters are 95% confident that the true population support falls within that range.
Hypothesis testing is a formal way of asking: "Is the effect I'm seeing real, or just a lucky coincidence?"
In a trial, the defendant is "Innocent until proven guilty."
- Null Hypothesis ($H_0$): The status quo. The defendant is innocent (No effect).
- Alternative Hypothesis ($H_1$): The claim we want to prove. The defendant is guilty (There is an effect).
- Evidence: The data we collected.
- Verdict: If the evidence is "beyond a reasonable doubt" (a low p-value), we reject $H_0$ and "convict". Otherwise we fail to reject $H_0$, which is a "not guilty" verdict rather than proof of innocence.
The Decision Matrix
Every decision in statistics carries a risk of being wrong:
| Decision | $H_0$ is Actually True | $H_0$ is Actually False |
|---|---|---|
| Reject $H_0$ | **Type I Error (False Alarm)** | Correct Decision (**Power**) |
| Fail to Reject $H_0$ | Correct Decision | **Type II Error (Missed Opportunity)** |
Imagine Netflix changes its 'Play' button from blue to red.
- $H_0$: The color change has no effect on click rates.
- $H_1$: The red button gets more clicks.
- Type I Error: Netflix thinks red is better when it's actually not (wasted design effort).
- Type II Error: Red is actually better, but the test was too small to notice it (lost revenue).
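Both error types can be estimated by simulating the button test many times. The sketch below is only an illustration under stated assumptions: the click rates (10% and 13%), the 1,000 users per arm, and the choice of a one-sided two-proportion z-test are all hypothetical and say nothing about how Netflix actually runs experiments.

```python
import random
from statistics import NormalDist

def red_beats_blue(clicks_blue, clicks_red, n, alpha=0.05):
    """One-sided two-proportion z-test with n users per arm.

    Returns True when H0 (no difference) is rejected in favor of red.
    """
    p_blue, p_red = clicks_blue / n, clicks_red / n
    pooled = (clicks_blue + clicks_red) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    if se == 0:
        return False  # no variation at all, so nothing to test
    z = (p_red - p_blue) / se
    return z > NormalDist().inv_cdf(1 - alpha)

def rejection_rate(rate_blue, rate_red, n=1000, trials=1000, seed=1):
    """Share of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        blue = sum(rng.random() < rate_blue for _ in range(n))
        red = sum(rng.random() < rate_red for _ in range(n))
        if red_beats_blue(blue, red, n):
            rejections += 1
    return rejections / trials

# H0 true (both buttons click at 10%): every rejection is a Type I error.
type_1_rate = rejection_rate(0.10, 0.10)
# H0 false (red truly clicks at 13%): every non-rejection is a Type II error.
power = rejection_rate(0.10, 0.13)
print(f"Type I error rate ~ {type_1_rate:.3f}")
print(f"Power ~ {power:.3f}, Type II error rate ~ {1 - power:.3f}")
```

With these made-up numbers the Type I rate should sit near the chosen 5% level, while the Type II rate depends heavily on how many users are in each arm.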
The p-value is the most misunderstood number in statistics. It is simply the answer to: "If the Null Hypothesis were true, how likely is it that I'd see a result this extreme?"
- It is not the probability that $H_0$ is true.
- It is not the probability that your result was "due to chance."
- A p-value of 0.05 does not mean there is a 5% chance the effect isn't real.
Think of the p-value as a "Surprise Meter" on a scale from 0 to 1:
- p = 0.50: "Not surprising at all. This happens all the time." (Fail to reject $H_0$).
- p = 0.01: "Wait, that's very weird if there's truly no effect. Maybe something is going on." (Reject $H_0$).
- p = 0.0001: "This is nearly impossible under the null. I'm almost certain there's an effect." (Strongly reject $H_0$).
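To connect the "Surprise Meter" to a calculation, here is a minimal sketch that converts a z test statistic into a p-value using the standard normal distribution. The function name and the three z values are assumptions picked so that the two-sided p-values come out near 0.50, 0.01, and 0.0001.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Probability, assuming H0 is true, of a z statistic at least this extreme."""
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Illustrative z values only.
for z in (0.67, 2.58, 3.89):
    print(f"z = {z:.2f} -> two-sided p = {p_value_from_z(z):.4f}")
```

Note that the calculation starts from the assumption that $H_0$ is true, which is why the result cannot be read as the probability that $H_0$ is true.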
Power ($1 - \beta$, where $\beta$ is the probability of a Type II error) is the probability that your test will correctly find an effect if one actually exists.
If you're looking for a tiny needle in a large haystack, you need a powerful magnifying glass. In statistics, "Power" is that magnifying glass. You get more power by:
- Increasing sample size ($n$): A bigger test is more sensitive.
- Looking for a bigger effect: It's easier to find a giant needle than a tiny one.
- Reducing noise: High-quality, consistent data makes the signal clearer.
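The three levers above can be seen in a small power calculation. This is a sketch under simplifying assumptions: a one-sided z-test with known $\sigma$, and made-up numbers for the effect size, noise, and sample size.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect / (sigma / n ** 0.5)     # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical baseline: effect of 2 units, sigma of 10, 50 observations.
print(f"baseline      : {z_test_power(effect=2, sigma=10, n=50):.2f}")
print(f"larger sample : {z_test_power(effect=2, sigma=10, n=200):.2f}")
print(f"bigger effect : {z_test_power(effect=4, sigma=10, n=50):.2f}")
print(f"less noise    : {z_test_power(effect=2, sigma=5, n=50):.2f}")
```

In this setup, doubling the effect and halving the noise give the same power, because only the ratio of effect to standard error enters the formula.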
The Coffee Shop Claim
A coffee shop claims their 'Large' cups contain 500ml. You measure 25 cups and get an average of 490ml with a sample standard deviation ml. At a 5% significance level, are they under-filling?
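One way to set this up is a one-sided, one-sample t-test of $H_0: \mu = 500$ against $H_1: \mu < 500$, with $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ and $n - 1 = 24$ degrees of freedom. In the sketch below the sample standard deviation $s$ is treated as an input, and the three values tried (10, 20, and 40 ml) are purely illustrative assumptions, not figures from the problem. The critical value 1.711 is the standard t-table entry for 24 degrees of freedom at a one-sided 5% level.

```python
def one_sample_t(xbar, mu0, s, n):
    """t statistic for testing whether the population mean equals mu0."""
    return (xbar - mu0) / (s / n ** 0.5)

# t-table critical value: df = 24, one-sided alpha = 0.05.
T_CRIT_DF24_ONE_SIDED_05 = 1.711

def underfilling_test(s, xbar=490.0, mu0=500.0, n=25):
    """Return (t, reject_h0) for H1: mean fill is below mu0."""
    t = one_sample_t(xbar, mu0, s, n)
    return t, t < -T_CRIT_DF24_ONE_SIDED_05

# Hypothetical values of s (in ml), for illustration only.
for s in (10.0, 20.0, 40.0):
    t, reject = underfilling_test(s)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"s = {s:>4.0f} ml -> t = {t:.2f} -> {verdict}")
```

The same 10 ml shortfall can be convincing or unconvincing depending on how much the individual cups vary, which is why the verdict cannot be reached without $s$.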