Sampling Without Replacement

The Hypergeometric Distribution is the cousin of the Binomial distribution. It applies when we are sampling without replacement from a finite population. Because the population is finite and we don't replace the items, each draw changes the probability for the next draw. The trials are dependent.

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

Parameters

  • $N$: Total population size.
  • $K$: Number of success states in the population.
  • $n$: Number of draws (sample size).
  • $k$: Number of observed successes.
Intuition
Understanding the Formula

The denominator $\binom{N}{n}$ is the total number of ways to pick $n$ items from the population of $N$. The numerator breaks this into two parts: picking $k$ successes out of the available $K$ successes, and picking the remaining $n-k$ failures from the available $N-K$ failures.
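The counting argument above translates almost directly into code. The following is a minimal sketch in Python; the function name `hypergeom_pmf` and its argument order are choices made for this illustration, not part of any library.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement
    from a population of N items that contains K successes."""
    # Values of k outside the support have probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # (ways to pick k successes) * (ways to pick n-k failures) / (ways to pick n items)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustration: hearts in a 5-card hand (N = 52 cards, K = 13 hearts, n = 5 draws).
for k in range(6):
    print(k, round(hypergeom_pmf(k, 52, 13, 5), 4))
```

Summing the printed probabilities over all possible values of k should give 1, which is a quick sanity check on the formula.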

Core Properties

  • Mean ($E[X]$): $n \frac{K}{N}$
  • Variance ($\mathrm{Var}(X)$): $n \frac{K}{N} \frac{N-K}{N} \frac{N-n}{N-1}$

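Notice that the mean is identical to the Binomial mean $np$ if we define $p = K/N$. The variance is the Binomial variance $np(1-p)$ multiplied by a finite population correction factor: $\frac{N-n}{N-1}$. Because this factor is less than 1 whenever $n > 1$, sampling without replacement always has a smaller variance than the matching Binomial.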
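The closed-form mean and variance can be checked against a brute-force sum over the distribution. The sketch below uses exact fractions; the values N = 50, K = 10, n = 10 are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb

N, K, n = 50, 10, 10  # hypothetical population, successes, and sample size

# Exact probability of each possible number of successes k = 0..n.
pmf = {k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
       for k in range(n + 1)}

# Mean and variance computed directly from the definition.
mean = sum(k * prob for k, prob in pmf.items())
var = sum(k * k * prob for k, prob in pmf.items()) - mean ** 2

# Closed forms: Binomial-style terms with p = K/N, times the correction factor.
p = Fraction(K, N)
fpc = Fraction(N - n, N - 1)          # finite population correction
closed_mean = n * p
closed_var = n * p * (1 - p) * fpc
binomial_var = n * p * (1 - p)        # what the Binomial would give

print(mean, closed_mean)              # both are 2
print(var, closed_var, binomial_var)  # 64/49, 64/49, 8/5
```

With these numbers the sample is a fifth of the population, so the correction factor of 40/49 shrinks the variance noticeably relative to the Binomial value.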

Example
Real-World Examples
  • Poker Hands: Drawing 5 cards from a 52-card deck and counting the number of hearts. The deck is finite, and drawn cards aren't replaced.
  • Ecology: Capture-recapture methods for estimating animal populations.
  • Quality Assurance: Testing 10 lightbulbs from a shipment of 50 to see if any are broken.
Intuition
The Impact of Depletion

Why does sampling without replacement matter? Imagine an urn with 5 Red and 5 Blue marbles. The chance of drawing Red first is 5/10 (50%). But if you do draw Red and don't put it back, the urn now has 4 Red and 5 Blue. The chance of Red on the second draw drops to 4/9 (~44%). The Hypergeometric distribution naturally accounts for this shifting probability landscape, whereas the Binomial distribution assumes the urn is magically refilled after every draw.
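The urn arithmetic can be reproduced with exact fractions. This is a small illustrative sketch; the variable names are made up for the example.

```python
from fractions import Fraction

red, blue = 5, 5  # the urn from the paragraph above

# First draw: 5 red marbles out of 10.
p_first_red = Fraction(red, red + blue)

# Second draw, given the first was red and was NOT put back: 4 red out of 9.
p_second_red_given_first_red = Fraction(red - 1, red + blue - 1)

# Probability of two reds in a row under each sampling scheme.
p_two_reds_without_replacement = p_first_red * p_second_red_given_first_red
p_two_reds_with_replacement = p_first_red * p_first_red

print(p_second_red_given_first_red)     # 4/9
print(p_two_reds_without_replacement)   # 2/9 (Hypergeometric setting)
print(p_two_reds_with_replacement)      # 1/4 (Binomial setting)
```

Two reds in a row are less likely without replacement (2/9, about 22%) than with replacement (1/4, or 25%), because the first red depletes the supply.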

Advanced Practice

Example 1: Drawing Aces

medium

You draw 5 cards from a standard shuffled deck of 52 cards. What is the probability that you draw exactly 2 Aces?
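One way to work this out, offered as a sketch rather than an official solution: treat the 4 Aces as the successes ($K = 4$) in a population of $N = 52$ cards, with $n = 5$ draws and $k = 2$ observed successes.

```python
from math import comb

N, K, n, k = 52, 4, 5, 2  # deck size, Aces in the deck, cards drawn, Aces wanted

ways_aces = comb(K, k)            # choose 2 of the 4 Aces
ways_others = comb(N - K, n - k)  # choose 3 of the 48 non-Aces
ways_total = comb(N, n)           # choose any 5 of the 52 cards

p_two_aces = ways_aces * ways_others / ways_total
print(f"{p_two_aces:.4f}")  # 0.0399
```

The counts are 6 × 17,296 = 103,776 favorable hands out of 2,598,960, or roughly a 4% chance.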

Example 2: Audit Pass Rate

hard

An auditor selects 4 tax returns randomly from a batch of 20 to check for errors. Unknown to the auditor, 6 of the returns in the batch contain errors. What is the probability that the auditor finds at least one return with an error?
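A possible approach, sketched here under the usual assumption that every set of 4 returns is equally likely: compute the complement, the probability that none of the 4 selected returns contains an error, and subtract it from 1.

```python
from math import comb

N, K, n = 20, 6, 4  # returns in the batch, returns with errors, returns audited

# P(X = 0): all 4 selected returns come from the 14 error-free ones.
p_no_errors = comb(K, 0) * comb(N - K, n) / comb(N, n)

# P(X >= 1) is the complement of finding no errors at all.
p_at_least_one = 1 - p_no_errors
print(f"{p_at_least_one:.4f}")  # 0.7934
```

Here $\binom{14}{4} = 1001$ and $\binom{20}{4} = 4845$, so the auditor misses every error with probability 1001/4845 (about 21%) and catches at least one with probability of about 79%.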

The 5% Rule of Independence

When sampling without replacement, the trials are technically dependent, and the Hypergeometric distribution must be used. However, if the sample size ($n$) is less than or equal to 5% of the total population size ($N$), the change in probabilities from one draw to the next is small enough to ignore in practice.

In this case, we can treat the selections as being independent and use the much simpler Binomial distribution as an approximation. (The 5% condition is equivalent to $N \ge 20n$.)

Rule: If $n \le 0.05N$, you can safely treat the trials as independent.

Example 3: The 5% Shortcut

medium

A factory produces a batch of 10,000 microchips, and 200 of them are defective. If you randomly sample 50 chips without replacement, what is the approximate probability of finding exactly 2 defective chips?
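As a sketch of how the shortcut plays out here: the sample of 50 is 0.5% of the 10,000 chips, well under the 5% threshold, so the Binomial with $p = 200/10000 = 0.02$ is a reasonable approximation. The code below computes both the approximation and the exact Hypergeometric value so the two can be compared.

```python
from math import comb

N, K, n, k = 10_000, 200, 50, 2  # batch size, defective chips, sample size, target count

# Check the 5% rule before relying on the approximation.
assert n <= 0.05 * N

# Binomial approximation: treat each draw as independent with p = K/N.
p = K / N
binomial_approx = comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact Hypergeometric probability for comparison.
hypergeom_exact = comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(f"Binomial approximation: {binomial_approx:.4f}")  # 0.1858
print(f"Exact Hypergeometric:   {hypergeom_exact:.4f}")
print(f"Absolute difference:    {abs(binomial_approx - hypergeom_exact):.6f}")
```

The approximation gives about 18.6%, and the printed difference shows how little is lost by ignoring the dependence between draws at this sampling fraction.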