In the Frequentist world, we assume there is one "True" fixed parameter ($\theta$) and we try to find it. In the Bayesian world, we accept that we have uncertainty about the parameter itself. We treat $\theta$ as a random variable with its own probability distribution.
Bayesian inference is a continuous cycle of learning:
- Prior ($P(\theta)$): What you believe about $\theta$ before seeing any new data.
- Likelihood ($P(D \mid \theta)$): How much the new data $D$ supports different possible values of $\theta$.
- Posterior ($P(\theta \mid D)$): What you believe now, having mathematically combined your old beliefs with the new evidence.
The Master Formula
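Everything in Bayesian statistics flows from Bayes' Theorem, where $D$ is the observed data:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

Or more simply: $\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$.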
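One way to see the proportionality in action is a grid approximation: evaluate likelihood times prior at many candidate parameter values, then divide by the total so the result sums to 1. The sketch below is illustrative only; the function name `grid_posterior`, the 101-point grid, and the coin data (8 heads in 10 flips) are assumptions made for the example.

```python
import math

def grid_posterior(prior, likelihood, grid):
    """Approximate a posterior over a discrete grid of parameter values."""
    # Unnormalized posterior: likelihood times prior at each grid point.
    unnormalized = [likelihood(theta) * prior(theta) for theta in grid]
    # On a discrete grid, this sum plays the role of the denominator P(D).
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

def flat_prior(theta):
    # Every candidate bias gets the same weight.
    return 1.0

def binomial_likelihood(theta, heads=8, flips=10):
    # Probability of the (hypothetical) data given a coin bias theta.
    return math.comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)

grid = [i / 100 for i in range(101)]  # candidate biases 0.00, 0.01, ..., 1.00
posterior = grid_posterior(flat_prior, binomial_likelihood, grid)
best = grid[posterior.index(max(posterior))]
print(f"Most probable grid value: {best:.2f}")
```

With a flat prior, the posterior peak sits where the likelihood peaks; swapping in a different `prior` function shifts it.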
Bayesian Updating
The Prior (blue) is flat and uncertain. The Likelihood (green) shows where the data peaks. The Posterior (red) is the 'compromise': it shifts our belief toward the data while still honoring our initial uncertainty.
Calculating the posterior usually requires solving a difficult integral (the denominator of Bayes' rule). However, for certain pairs of prior and likelihood, the posterior has a closed form and belongs to the same family as the prior. Such priors are called Conjugate Priors.
If you are flipping a coin (Binomial likelihood) and you use a Beta distribution as your prior, your posterior is guaranteed to be another Beta distribution. You simply add the number of heads to the $\alpha$ parameter and the number of tails to the $\beta$ parameter. No calculus required!
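A short derivation sketch shows why the update is just addition. The symbols $h$ (number of heads), $t$ (number of tails), and $D$ (the observed flips) are notation assumed here for illustration, and constants that do not depend on $\theta$ are dropped.

$$
\begin{aligned}
P(\theta) &\propto \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
P(D \mid \theta) &\propto \theta^{h}(1-\theta)^{t} \\
P(\theta \mid D) &\propto P(D \mid \theta)\,P(\theta) \propto \theta^{\alpha+h-1}(1-\theta)^{\beta+t-1}
\end{aligned}
$$

The last line has the same functional form as the prior, so the posterior is $\text{Beta}(\alpha + h,\ \beta + t)$ and its normalizing constant is already known.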
| If your Likelihood is... | Use this Prior Family... | To get this Posterior |
|---|---|---|
| Binomial (Coin Flips, Clicks) | Beta | Beta (Just update $\alpha$ and $\beta$) |
| Poisson (Arrival Rates) | Gamma | Gamma (Just update count and time) |
| Normal (with known variance) | Normal | Normal (Update mean and precision) |
| Exponential (Wait times) | Gamma | Gamma (Update count and total wait time) |
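The rows of the table can be sketched as small update functions. This is a minimal illustration, not a library API: the function names are hypothetical, and the Gamma distribution is assumed to use the shape/rate parameterization.

```python
def update_beta_binomial(alpha, beta, heads, tails):
    # Beta prior + Binomial data: add successes to alpha, failures to beta.
    return alpha + heads, beta + tails

def update_gamma_poisson(shape, rate, total_count, exposure):
    # Gamma prior on a Poisson rate: add the event count to the shape
    # and the observation time to the rate.
    return shape + total_count, rate + exposure

def update_normal_known_variance(prior_mean, prior_var, data, noise_var):
    # Normal prior on the mean, Normal data with known variance:
    # precisions (1 / variance) add, and the posterior mean is a
    # precision-weighted average of the prior mean and the data.
    prior_precision = 1.0 / prior_var
    data_precision = len(data) / noise_var
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + sum(data) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision

def update_gamma_exponential(shape, rate, waits):
    # Gamma prior on an Exponential rate: add the number of waits to the
    # shape and the total waiting time to the rate.
    return shape + len(waits), rate + sum(waits)

# Hypothetical numbers, chosen only to exercise each function.
print(update_beta_binomial(1, 1, heads=8, tails=2))
print(update_gamma_poisson(2, 1, total_count=12, exposure=4))
print(update_normal_known_variance(0.0, 1.0, [2.0, 2.0], noise_var=1.0))
print(update_gamma_exponential(1, 1, [0.5, 1.5]))
```

In every case the update is arithmetic on the prior's parameters, which is what makes conjugate pairs convenient.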
If you must summarize your posterior distribution with a single number, a common choice is the Mode (the peak). This is the Maximum A Posteriori (MAP) estimate. The posterior mean and median are alternatives.
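MAP is essentially MLE with a "penalty" from the prior: maximizing the posterior is the same as maximizing the log-likelihood plus the log-prior. In Machine Learning, when we use Weight Decay or L2 Regularization, we are secretly being Bayesians! An L2 penalty corresponds to a zero-mean Gaussian prior on the weights. We are saying: "I want to fit the data, but my prior belief is that the weights shouldn't be too large."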
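The link between MAP and L2 regularization can be written out in a few lines. This is a sketch under one assumption: the prior on the parameters is a zero-mean Gaussian with variance $\sigma^2$. The symbols $D$, $\sigma^2$, and $\lambda$ are notation introduced here for illustration.

$$
\begin{aligned}
\hat{\theta}_{\text{MAP}} &= \arg\max_{\theta}\, P(\theta \mid D) = \arg\max_{\theta}\, \big[\log P(D \mid \theta) + \log P(\theta)\big] \\
\log P(\theta) &= -\frac{1}{2\sigma^2}\lVert\theta\rVert_2^2 + \text{const} \qquad \text{for } \theta \sim \mathcal{N}(0, \sigma^2 I) \\
\hat{\theta}_{\text{MAP}} &= \arg\min_{\theta}\, \big[-\log P(D \mid \theta) + \lambda \lVert\theta\rVert_2^2\big], \qquad \lambda = \frac{1}{2\sigma^2}
\end{aligned}
$$

A tighter prior (smaller $\sigma^2$) means a larger $\lambda$, that is, a stronger penalty. With a flat prior the penalty term vanishes and MAP reduces to MLE.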
Frequentist "Confidence Intervals" are notoriously hard to explain. Bayesian Credible Intervals are exactly what they sound like.
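A 95% Credible Interval means: "Based on my prior and the data, there is a 95% probability that the true parameter falls in this range." This is much more intuitive than the frequentist version. There, the parameter is fixed, so any single computed interval either contains it or does not (a probability of 0 or 1), and the 95% describes how often the procedure captures the true value across repeated experiments.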
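As a rough illustration, an equal-tailed credible interval can be read off a posterior by accumulating probability mass until 2.5% and 97.5% are reached. The sketch below does this numerically for a Beta posterior using only the standard library. The function names, the midpoint-rule integration, and the example posterior Beta(9, 3) are assumptions made for the example.

```python
import math

def beta_pdf(theta, alpha, beta):
    # Density of Beta(alpha, beta) for theta strictly between 0 and 1.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta)
    )

def equal_tailed_interval(alpha, beta, mass=0.95, steps=10_000):
    """Numerically find the interval that leaves (1 - mass) / 2 in each tail."""
    width = 1.0 / steps
    # Midpoints avoid evaluating the density exactly at 0 or 1.
    midpoints = [(i + 0.5) * width for i in range(steps)]
    weights = [beta_pdf(t, alpha, beta) * width for t in midpoints]
    total = sum(weights)  # close to 1; dividing by it absorbs integration error
    lower_target = (1 - mass) / 2
    upper_target = 1 - lower_target
    cumulative = 0.0
    lower = upper = None
    for t, w in zip(midpoints, weights):
        cumulative += w / total
        if lower is None and cumulative >= lower_target:
            lower = t
        if upper is None and cumulative >= upper_target:
            upper = t
            break
    return lower, upper

lower, upper = equal_tailed_interval(9, 3)
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

For Beta(9, 3) this gives an interval of roughly 0.48 to 0.94. In practice a library quantile function, such as `scipy.stats.beta.ppf`, would usually replace the hand-rolled integration.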
Updating Beliefs about a Coin
You start with a 'Flat' (Uniform) prior, Beta(1,1), meaning all biases are equally likely. You flip a coin 10 times and get 8 heads. What is your new belief?
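A minimal worked sketch of this update follows. The helper name `update` and the particular order of flips are hypothetical; only the counts (8 heads, 2 tails) come from the exercise.

```python
def update(alpha, beta, heads, tails):
    # Conjugate Beta-Binomial update: add heads to alpha, tails to beta.
    return alpha + heads, beta + tails

# Start from the flat prior Beta(1, 1) and observe 8 heads, 2 tails.
alpha, beta = update(1, 1, heads=8, tails=2)

posterior_mean = alpha / (alpha + beta)
# Mode of a Beta distribution; this formula needs alpha > 1 and beta > 1.
posterior_mode = (alpha - 1) / (alpha + beta - 2)

print(f"Posterior: Beta({alpha}, {beta})")
print(f"Posterior mean: {posterior_mean:.3f}")
print(f"Posterior mode (MAP): {posterior_mode:.3f}")

# Updating one flip at a time reaches the same posterior.
flips = "HHTHHHHTHH"  # hypothetical order with 8 heads and 2 tails
a, b = 1, 1
for flip in flips:
    a, b = update(a, b, heads=int(flip == "H"), tails=int(flip == "T"))
print(f"Sequential result: Beta({a}, {b})")
```

The new belief is Beta(9, 3): the mean is 0.75 and the peak is at 0.8. The peak matches the observed fraction of heads because the prior was flat, while the mean is pulled slightly toward 0.5.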
The Bayesian Workflow
How beliefs evolve as evidence arrives.