The Hypergeometric Distribution is the cousin of the Binomial distribution. It applies when we are sampling without replacement from a finite population. Because the population is finite and we don't replace the items, each draw changes the probability for the next draw. The trials are dependent.
Parameters
- $N$: Total population size.
- $K$: Number of success states in the population.
- $n$: Number of draws (sample size).
- $k$: Number of observed successes.
The probability of observing exactly $k$ successes is
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
The denominator is the total number of ways to pick $n$ items from the population of $N$. The numerator breaks this into two parts: picking $k$ successes out of the $K$ available successes, and picking the remaining $n-k$ failures from the $N-K$ available failures.
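The counting argument above translates almost directly into code. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `hypergeom_pmf` and the example numbers (10 items, 4 successes, 3 draws) are illustrative choices, not part of the original text.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement."""
    # Outside the feasible range of k the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # (ways to pick k successes) * (ways to pick n-k failures) / (ways to pick n items)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical population: 10 items, 4 of them successes, 3 drawn.
pmf = [hypergeom_pmf(k, N=10, K=4, n=3) for k in range(4)]
print(pmf)
print(sum(pmf))  # the probabilities over all feasible k add up to 1 (up to rounding)
```

The guard at the top is there because `math.comb` raises an error for negative arguments, which would otherwise occur when `k` exceeds `n`.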
Core Properties
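- Mean ($\mu$): $\mu = n\frac{K}{N}$
- Variance ($\sigma^2$): $\sigma^2 = n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}$
Notice that the mean is identical to the Binomial mean $np$ if we define $p = K/N$. The variance is also similar to the Binomial variance $np(1-p)$, but multiplied by a finite population correction factor: $\frac{N-n}{N-1}$.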
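One way to build confidence in these formulas is to compute the mean and variance directly from the probability mass function and compare. The sketch below does that for a hypothetical population (50 items, 10 successes, 10 draws); the numbers and variable names are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 10, 10  # hypothetical values
p = K / N

# Mean and variance computed directly from the definition.
ks = range(0, min(n, K) + 1)
mean_direct = sum(k * hypergeom_pmf(k, N, K, n) for k in ks)
var_direct = sum((k - mean_direct) ** 2 * hypergeom_pmf(k, N, K, n) for k in ks)

# Closed-form expressions, including the finite population correction.
mean_formula = n * p
fpc = (N - n) / (N - 1)
var_formula = n * p * (1 - p) * fpc
var_binomial = n * p * (1 - p)  # what the Binomial would report

print(mean_direct, mean_formula)
print(var_direct, var_formula, var_binomial)
```

Because the correction factor is below 1 whenever more than one item is drawn, the Hypergeometric variance comes out smaller than the Binomial variance for the same $n$ and $p$.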
- Poker Hands: Drawing 5 cards from a 52-card deck and counting the number of hearts. The deck is finite, and drawn cards aren't replaced.
- Ecology: Capture-recapture methods for estimating animal populations.
- Quality Assurance: Testing 10 lightbulbs from a shipment of 50 to see if any are broken.
Why does sampling without replacement matter? Imagine an urn with 5 Red and 5 Blue marbles. The chance of drawing Red first is 5/10 (50%). But if you do draw Red and don't put it back, the urn now has 4 Red and 5 Blue. The chance of Red on the second draw drops to 4/9 (~44%). The Hypergeometric distribution naturally accounts for this shifting probability landscape, whereas the Binomial distribution assumes the urn is magically refilled after every draw.
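The urn arithmetic can be checked with exact fractions. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

red, blue = 5, 5

# First draw: 5 red out of 10 marbles.
p_first_red = Fraction(red, red + blue)

# Second draw after removing one red: 4 red out of 9 marbles.
p_second_red_given_first = Fraction(red - 1, red + blue - 1)

# Probability that both draws are red.
p_both_without_replacement = p_first_red * p_second_red_given_first
p_both_with_replacement = p_first_red * p_first_red  # the "refilled urn" assumption

print(p_second_red_given_first)    # 4/9
print(p_both_without_replacement)  # 2/9
print(p_both_with_replacement)     # 1/4
```

The gap between 2/9 and 1/4 is the error you would make by treating this small urn as if it were refilled after each draw.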
Advanced Practice
Example 1: The Royal Flush
You draw 5 cards from a standard shuffled deck of 52 cards. What is the probability that you draw exactly 2 Aces?
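A possible worked solution: treat the 4 Aces as the successes, so $N = 52$, $K = 4$, $n = 5$, $k = 2$. The sketch below simply evaluates the counting formula with `math.comb`; the variable names are illustrative.

```python
from math import comb

N, K, n, k = 52, 4, 5, 2  # deck size, Aces in the deck, cards drawn, Aces wanted

ways_aces = comb(K, k)            # choose 2 of the 4 Aces: 6
ways_others = comb(N - K, n - k)  # choose 3 of the 48 non-Aces: 17296
ways_total = comb(N, n)           # all 5-card hands: 2598960

p_two_aces = ways_aces * ways_others / ways_total
print(round(p_two_aces, 4))  # 0.0399
```

So roughly 4% of five-card hands contain exactly two Aces.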
Example 2: Audit Pass Rate
An auditor selects 4 tax returns randomly from a batch of 20 to check for errors. Unknown to the auditor, 6 of the returns in the batch contain errors. What is the probability that the auditor finds at least one return with an error?
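A possible worked solution uses the complement: "at least one" is 1 minus the probability of finding none. Here $N = 20$, $K = 6$, $n = 4$, and we need $P(X = 0)$. The sketch below assumes nothing beyond `math.comb`; the variable names are illustrative.

```python
from math import comb

N, K, n = 20, 6, 4  # returns in the batch, returns with errors, returns audited

# P(X = 0): all 4 selected returns come from the 14 clean ones.
p_none = comb(K, 0) * comb(N - K, n) / comb(N, n)  # 1001 / 4845

p_at_least_one = 1 - p_none
print(round(p_at_least_one, 4))  # 0.7934
```

The auditor therefore has about a 79% chance of catching at least one erroneous return.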
The 5% Rule of Independence
When sampling without replacement, the trials are technically dependent, and the Hypergeometric distribution must be used. However, if the sample size ($n$) is less than or equal to 5% of the total population size ($N$), the change in probabilities from one draw to the next is small enough to ignore in practice.
In this case, we can treat the selections as independent and use the much simpler Binomial distribution as an approximation. (This is mathematically identical to saying $N \geq 20n$.)
Rule: If $n \leq 0.05N$, you can safely treat the trials as independent.
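To see how the approximation behaves, one can compare the two distributions while the population grows and the sample stays fixed. The sketch below is an illustration under assumed values (10 draws, a 20% success share, and a handful of population sizes); it reports the largest pointwise gap between the Hypergeometric and Binomial probabilities.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10  # fixed sample size (hypothetical)
max_gap = {}
for N in (20, 50, 200, 1000, 10000):
    K = N // 5  # keep the success share at 20%
    p = K / N
    # Largest absolute difference between the two PMFs over k = 0..n.
    max_gap[N] = max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )
    print(f"N={N:>6}  n/N={n / N:.3f}  max gap={max_gap[N]:.5f}")
```

The row with `N=200` sits exactly at the 5% threshold; rows with larger populations show the gap shrinking further.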
Example 3: The 5% Shortcut
A factory produces a batch of 10,000 microchips, and 200 of them are defective. If you randomly sample 50 chips without replacement, what is the approximate probability of finding exactly 2 defective chips?
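A possible worked solution: the sample is $50 / 10{,}000 = 0.5\%$ of the population, well under 5%, so the Binomial with $n = 50$ and $p = 200 / 10{,}000 = 0.02$ is a reasonable approximation. The sketch below evaluates that approximation and, for comparison, the exact Hypergeometric value; the variable names are illustrative.

```python
from math import comb

N, K, n, k = 10_000, 200, 50, 2
p = K / N  # 0.02

# The sampling fraction is far below the 5% threshold.
assert n <= 0.05 * N

# Binomial approximation: C(50, 2) * 0.02^2 * 0.98^48
p_binomial = comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact Hypergeometric value (Python integers handle the large counts).
p_exact = comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(round(p_binomial, 4))  # 0.1858
print(p_exact)
```

The two values should agree closely, which is the practical point of the shortcut.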