Beyond Success and Failure

The Binomial distribution assumes exactly two possible outcomes: Success or Failure. But what if there are $k$ distinct, mutually exclusive outcomes? Examples include rolling a 6-sided die or sorting voters by political party.

This is where the Multinomial Distribution comes in. It is a multivariate generalization of the Binomial distribution.

$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \dots x_k!} \, p_1^{x_1} p_2^{x_2} \dots p_k^{x_k}$$

Parameters

  • $n$: Total number of independent trials.
  • $k$: Number of mutually exclusive categories.
  • $p_i$: Probability of outcome $i$ occurring in a single trial, where $\sum p_i = 1$.
  • $X_i$: Random variable representing the number of times outcome $i$ occurs.
  • $x_i$: Observed count for category $i$, where $\sum x_i = n$.
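
The PMF above translates fairly directly into code. The following is a minimal sketch using only the Python standard library; the function name `multinomial_pmf` and the fair-die usage example are illustrative assumptions, not part of the original lesson.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` across the categories."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)  # total number of trials
    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), kept exact with integers
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    # Product of p_i ** x_i over all categories
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# A fair six-sided die rolled 6 times, each face appearing exactly once
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))
```

The fair-die call evaluates $6!/6^6$, which is roughly 0.0154. With only two categories the function reduces to the Binomial PMF.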
Example
Real-World Examples
  • Genetics: Counting the number of offspring with different blood types (A, B, AB, O) given parental genetics.
  • Text Analysis: Modeling the frequency of words in a document (the "Bag of Words" model in NLP often assumes a multinomial distribution over the vocabulary).
  • Surveying: Asking 100 people their favorite season (Spring, Summer, Fall, Winter).

Core Properties

Unlike previous distributions, the Multinomial is a joint distribution of multiple random variables $(X_1, X_2, \dots, X_k)$.

  • Marginal Mean ($E[X_i]$): $n p_i$
  • Marginal Variance ($\text{Var}(X_i)$): $n p_i (1 - p_i)$
  • Covariance ($\text{Cov}(X_i, X_j)$): $-n p_i p_j$ (for $i \neq j$)
Intuition
Negative Covariance

Why is the covariance negative? Because the total number of trials $n$ is fixed. If category A occurs more often, fewer trials are left for the other categories, so on average category B occurs less often. The categories "compete" for the fixed $n$ spots.
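
The mean, variance, and covariance formulas can be checked exactly for a small case by enumerating every possible count vector. This is only a sketch: the values `n = 4` and `probs = [0.5, 0.3, 0.2]` are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial PMF for one count vector."""
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

n = 4                      # illustrative number of trials
probs = [0.5, 0.3, 0.2]    # illustrative category probabilities

# Every count vector (x_1, x_2, x_3) whose entries sum to n
outcomes = [c for c in product(range(n + 1), repeat=len(probs)) if sum(c) == n]

def expectation(f):
    """Exact expected value of f(counts) under the multinomial PMF."""
    return sum(f(c) * multinomial_pmf(c, probs) for c in outcomes)

mean_0 = expectation(lambda c: c[0])
mean_1 = expectation(lambda c: c[1])
var_0 = expectation(lambda c: c[0] ** 2) - mean_0 ** 2
cov_01 = expectation(lambda c: c[0] * c[1]) - mean_0 * mean_1

# Each line prints the enumerated value next to the closed-form formula
print(mean_0, n * probs[0])                  # both close to 2.0
print(var_0, n * probs[0] * (1 - probs[0]))  # both close to 1.0
print(cov_01, -n * probs[0] * probs[1])      # both close to -0.6
```

The enumerated covariance comes out negative and matches $-n p_i p_j$ up to floating-point rounding, which is the "competition" for the fixed $n$ trials in numerical form.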

Marginal Distributions

If you only care about one specific category and lump all the others into "everything else," the marginal distribution of $X_i$ is simply a Binomial Distribution: $X_i \sim \text{Binomial}(n, p_i)$.
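
As a quick numerical check of this claim, the sketch below sums the joint PMF over all values of the other counts and compares the result with the Binomial PMF. The parameters (`n = 5`, three categories) and the helper names `marginal_first` and `binomial_pmf` are assumptions chosen for illustration.

```python
from itertools import product
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial PMF for one count vector."""
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

n = 5
probs = [0.5, 0.3, 0.2]

def marginal_first(x):
    """P(X_1 = x), obtained by summing the joint PMF over the other counts."""
    return sum(
        multinomial_pmf((x,) + rest, probs)
        for rest in product(range(n + 1), repeat=len(probs) - 1)
        if x + sum(rest) == n
    )

def binomial_pmf(x, n, p):
    """Binomial PMF: x successes in n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# The two columns should agree up to floating-point rounding
for x in range(n + 1):
    print(x, marginal_first(x), binomial_pmf(x, n, probs[0]))
```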

Intuition
The Joint Nature of Multinomials

It's crucial to remember that the Multinomial distribution doesn't output a single number, but a vector of counts: $(X_1, X_2, \dots, X_k)$. The counts in any valid outcome must sum to the total number of trials $n$. This strict constraint is what creates the negative covariance between different outcomes: a trial that lands in category A cannot land in any other category, so it counts as a 'not-A' miss for every one of them.
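
To see the vector-valued nature concretely, one draw can be simulated by running $n$ independent categorical trials and tallying the results. This is a minimal standard-library sketch; the function name `sample_multinomial`, the seed, and the probabilities are illustrative assumptions.

```python
import random
from collections import Counter

def sample_multinomial(n, probs, rng):
    """Draw one count vector by simulating n independent categorical trials."""
    # Each trial picks exactly one category index, weighted by probs
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(draws)
    # Categories that never occurred get a count of 0
    return [tally[i] for i in range(len(probs))]

rng = random.Random(0)   # fixed seed so the run is repeatable
probs = [0.1, 0.2, 0.3, 0.4]
counts = sample_multinomial(10, probs, rng)
print(counts, sum(counts))   # the counts always sum to 10
```

Whatever vector is drawn, its entries are non-negative and sum to $n$, because each trial contributes to exactly one category.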

Advanced Practice

Example 1: Loaded Die

medium

You roll a loaded 4-sided die 10 times. The probabilities of rolling 1, 2, 3, and 4 are $p_1 = 0.1$, $p_2 = 0.2$, $p_3 = 0.3$, and $p_4 = 0.4$, respectively. What is the probability of rolling exactly two 1s, three 2s, one 3, and four 4s?
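
One way to work this out, assuming the rolls are independent, is to substitute $n = 10$ and the counts $(2, 3, 1, 4)$ into the multinomial PMF:

```latex
\begin{aligned}
P(X_1 = 2, X_2 = 3, X_3 = 1, X_4 = 4)
  &= \frac{10!}{2!\,3!\,1!\,4!} \, (0.1)^2 (0.2)^3 (0.3)^1 (0.4)^4 \\
  &= 12600 \times (0.01)(0.008)(0.3)(0.0256) \\
  &= 12600 \times 6.144 \times 10^{-7} \\
  &\approx 0.00774
\end{aligned}
```

So this exact combination of faces turns up in a little under 1% of 10-roll sequences.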

Example 2: Election Polling

hard

In a local election, 50% of voters favor Candidate A, 30% favor Candidate B, and 20% are undecided. If you survey 5 random voters, what is the probability that Candidate A and Candidate B tie in your sample, with exactly 2 votes each, and 1 voter being undecided?
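
A possible solution sketch, assuming the 5 voters respond independently and that "undecided" is treated as a third category with probability 0.2 (the symbols $X_A$, $X_B$, $X_U$ are labels introduced here for the three counts):

```latex
\begin{aligned}
P(X_A = 2, X_B = 2, X_U = 1)
  &= \frac{5!}{2!\,2!\,1!} \, (0.5)^2 (0.3)^2 (0.2)^1 \\
  &= 30 \times (0.25)(0.09)(0.2) \\
  &= 30 \times 0.0045 \\
  &= 0.135
\end{aligned}
```

Under these assumptions, a 2-2 tie with one undecided voter occurs in about 13.5% of samples.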