The Bayesian Revolution

In the Frequentist world, we assume there is one "True" fixed parameter ($\theta$) and we try to find it. In the Bayesian world, we accept that we have uncertainty about the parameter itself. We treat $\theta$ as a random variable with its own probability distribution.

Intuition
The Learning Loop

Bayesian inference is a continuous cycle of learning:

  1. Prior ($\pi(\theta)$): What you believe about $\theta$ before seeing any new data.
  2. Likelihood ($f(x \mid \theta)$): How much the new data supports different possible values of $\theta$.
  3. Posterior ($\pi(\theta \mid x)$): What you believe now, having mathematically combined your old beliefs with the new evidence.

The Master Formula

Everything in Bayesian statistics flows from Bayes' Theorem:

$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta')\, \pi(\theta')\, d\theta'}$$

Or more simply: $\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$.
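The formula can be made concrete with a grid approximation. The following is a minimal sketch, assuming a coin-flip setting with a flat prior and hypothetical data of 6 heads in 10 flips; the function name `grid_posterior` and the grid size are illustrative choices, not part of the lesson. A sum over the grid stands in for the integral in the denominator.

```python
from math import comb

def grid_posterior(heads, flips, n_points=101):
    """Approximate the posterior over a coin's bias on an evenly spaced grid."""
    thetas = [i / (n_points - 1) for i in range(n_points)]
    # Flat prior (assumption): every candidate theta gets the same weight.
    prior = [1.0 for _ in thetas]
    # Binomial likelihood f(x | theta) for the observed number of heads.
    likelihood = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in thetas
    ]
    # Numerator of Bayes' theorem: likelihood times prior.
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    # Denominator: a sum over the grid stands in for the integral.
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior

# Hypothetical data: 6 heads in 10 flips.
thetas, posterior = grid_posterior(heads=6, flips=10)
peak = thetas[posterior.index(max(posterior))]
print(f"posterior peak at theta = {peak:.2f}")
```

Because the prior here is flat, the posterior has the same shape as the likelihood; swapping in a non-uniform `prior` list would pull the peak toward the prior.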

Bayesian Updating

The Prior (blue) is flat and uncertain. The Likelihood (green) shows where the data peaks. The Posterior (red) is the 'compromise': it shifts our belief toward the data while still honoring our initial uncertainty.

[Figure: Bayesian updating. Curves: Prior (I don't know), Likelihood (Data Peaks at 0.6), Posterior (New Belief). Axes: x, y.]

Conjugate Priors: Mathematical Shortcuts

Calculating the posterior usually requires solving a difficult integral (the denominator of Bayes' rule). However, for certain pairings of prior and likelihood, the posterior lands in the same family as the prior, so the update has a closed form. Such a prior is called a Conjugate Prior for that likelihood.

Example
The Beta-Binomial Shortcut

If you are flipping a coin (Binomial likelihood) and you use a Beta distribution as your prior, your posterior is guaranteed to be another Beta distribution. You simply add the number of heads to the $\alpha$ parameter and the number of tails to the $\beta$ parameter. No calculus required!
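The shortcut above can be sketched in a few lines. This is an illustrative example, not a library API: `update_beta` is a hypothetical helper, and the Beta(2, 2) prior and the flip counts are made-up numbers.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus coin flips gives a Beta posterior."""
    return alpha + heads, beta + tails

# Hypothetical prior: Beta(2, 2), a mild belief that the coin is near fair.
prior = (2, 2)
posterior = update_beta(*prior, heads=7, tails=3)
print(posterior)  # (9, 5)

# Updating in two batches lands on the same posterior as one big batch.
step1 = update_beta(*prior, heads=4, tails=1)
step2 = update_beta(*step1, heads=3, tails=2)
print(step2 == posterior)  # True
```

The second half shows why this fits the learning loop: yesterday's posterior can serve as today's prior, and the order in which the data arrives does not change the result.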

| If your Likelihood is... | Use this Prior Family... | To get this Posterior |
| --- | --- | --- |
| Binomial (Coin Flips, Clicks) | Beta | Beta (Just update $\alpha$ and $\beta$) |
| Poisson (Arrival Rates) | Gamma | Gamma (Just update count and time) |
| Normal (with known variance) | Normal | Normal (Update mean and precision) |
| Exponential (Wait times) | Gamma | Gamma |

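For reference, the update rules behind these pairings can be written out explicitly. The notation below is an assumption, since the lesson does not fix one: $\mathrm{Gamma}(a, b)$ uses the shape-rate parameterization, $\tau_0$ is the prior precision (one over the prior variance), $n$ is the number of observations, and $\bar{x}$ is their sample mean.

```latex
\[
\begin{aligned}
\text{Binomial, } k \text{ successes in } n \text{ trials:}\quad
  & \theta \sim \mathrm{Beta}(\alpha, \beta)
  && \Rightarrow\quad \theta \mid x \sim \mathrm{Beta}(\alpha + k,\ \beta + n - k) \\
\text{Poisson, counts } x_1, \dots, x_n\text{:}\quad
  & \lambda \sim \mathrm{Gamma}(a, b)
  && \Rightarrow\quad \lambda \mid x \sim \mathrm{Gamma}\Big(a + \sum_{i=1}^{n} x_i,\ b + n\Big) \\
\text{Exponential, waits } x_1, \dots, x_n\text{:}\quad
  & \lambda \sim \mathrm{Gamma}(a, b)
  && \Rightarrow\quad \lambda \mid x \sim \mathrm{Gamma}\Big(a + n,\ b + \sum_{i=1}^{n} x_i\Big) \\
\text{Normal, known variance } \sigma^2\text{:}\quad
  & \mu \sim \mathcal{N}(\mu_0,\ 1/\tau_0)
  && \Rightarrow\quad \mu \mid x \sim \mathcal{N}\!\left(
       \frac{\tau_0 \mu_0 + (n/\sigma^2)\,\bar{x}}{\tau_0 + n/\sigma^2},\
       \frac{1}{\tau_0 + n/\sigma^2}\right)
\end{aligned}
\]
```

In the Normal case the precisions simply add, and the posterior mean is a precision-weighted average of the prior mean and the sample mean.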
MAP Estimation: The Bayesian Point Estimate

If you must pick a single number from your posterior distribution, one standard choice is the Mode (the peak). This is the Maximum A Posteriori (MAP) estimate. The posterior mean and median are other common point estimates.
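For a Beta posterior the mode has a closed form, which makes MAP easy to compare with the maximum likelihood estimate (MLE). The sketch below is illustrative: `beta_mode` is a hypothetical helper, and the Beta(5, 5) prior and the 7-heads, 3-tails data are made-up numbers.

```python
def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); this closed form holds only for alpha > 1 and beta > 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("closed-form mode requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)

# Hypothetical data: 7 heads and 3 tails.
heads, tails = 7, 3
mle = heads / (heads + tails)

# Hypothetical prior Beta(5, 5): a belief that the coin is probably close to fair.
map_estimate = beta_mode(5 + heads, 5 + tails)

print(f"MLE = {mle:.3f}, MAP = {map_estimate:.3f}")  # MLE = 0.700, MAP = 0.611
```

The prior pulls the MAP estimate from 0.7 toward 0.5. With a flat Beta(1, 1) prior the posterior is Beta(8, 4), whose mode is 7/10, so MAP and MLE coincide.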

Intuition
Regularized Fit

MAP is essentially MLE but with a "penalty" from the prior. In Machine Learning, when we use Weight Decay or L2 Regularization, we are secretly being Bayesians! We are saying: "I want to fit the data, but my prior belief is that the weights shouldn't be too large."
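The link to L2 regularization can be made explicit with a short derivation. This is a sketch under stated assumptions: the weights are written as $w$ (a symbol not used elsewhere in this lesson), and the prior is taken to be a zero-mean Gaussian with variance $\sigma_w^2$ on every weight.

```latex
\[
\begin{aligned}
\hat{w}_{\text{MAP}}
  &= \arg\max_{w}\ \pi(w \mid x)
   = \arg\max_{w}\ \big[\log f(x \mid w) + \log \pi(w)\big] \\
\log \pi(w)
  &= -\frac{1}{2\sigma_w^2}\,\lVert w \rVert_2^2 + \text{const}
  \qquad \text{for } w \sim \mathcal{N}(0,\ \sigma_w^2 I) \\
\hat{w}_{\text{MAP}}
  &= \arg\min_{w}\ \big[-\log f(x \mid w) + \lambda\,\lVert w \rVert_2^2\big],
  \qquad \lambda = \frac{1}{2\sigma_w^2}
\end{aligned}
\]
```

The first term is the usual negative log-likelihood loss and the second is the L2 penalty. A tighter prior (smaller $\sigma_w^2$) means a larger $\lambda$, that is, stronger regularization.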

Credible Intervals: What we actually want

Frequentist "Confidence Intervals" are notoriously hard to explain. Bayesian Credible Intervals are exactly what they sound like.

Intuition
The Common Sense Interval

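A 95% Credible Interval means: "Based on my prior and the data, there is a 95% probability that the true parameter falls in this range." This is much more intuitive than the frequentist version. There the parameter is fixed, so any one computed interval either contains it or does not (probability 0 or 1), and the 95% describes how often the procedure captures the parameter over repeated experiments.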
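A credible interval can be read straight off the posterior. The following is a minimal sketch using only the standard library: it approximates an equal-tailed interval for a Beta posterior by accumulating probability over a grid. The helper name `beta_credible_interval`, the grid size, and the Beta(8, 4) posterior (a flat prior plus a hypothetical 7 heads and 3 tails) are assumptions for illustration. An equal-tailed interval is only one convention; a highest-density interval is another.

```python
def beta_credible_interval(alpha, beta, mass=0.95, n_points=10_001):
    """Equal-tailed credible interval for Beta(alpha, beta), approximated on a grid.

    Assumes alpha >= 1 and beta >= 1 so the density is finite at 0 and 1.
    """
    step = 1.0 / (n_points - 1)
    thetas = [i * step for i in range(n_points)]
    # Unnormalized Beta density; the normalizing constant cancels below.
    density = [t ** (alpha - 1) * (1 - t) ** (beta - 1) for t in thetas]
    total = sum(density)
    tail = (1 - mass) / 2
    lower = upper = None
    cumulative = 0.0
    for t, d in zip(thetas, density):
        cumulative += d / total
        # First grid point where the left tail reaches (1 - mass) / 2.
        if lower is None and cumulative >= tail:
            lower = t
        # First grid point where all but the right tail is covered.
        if upper is None and cumulative >= 1 - tail:
            upper = t
            break
    return lower, upper

# Hypothetical posterior: Beta(8, 4), i.e. a flat prior plus 7 heads and 3 tails.
lower, upper = beta_credible_interval(8, 4)
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

Under this posterior, the statement "there is a 95% probability that the bias lies between `lower` and `upper`" is a direct reading of the distribution.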

Updating Beliefs about a Coin

Difficulty: medium

You start with a 'Flat' (Uniform) prior, Beta(1,1), meaning all biases are equally likely. You flip a coin 10 times and get 8 heads. What is your new belief?
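One way to work this through, using the Beta-Binomial update (add heads to $\alpha$, tails to $\beta$) with 8 heads and 2 tails:

```latex
\[
\begin{aligned}
\text{Prior:}\quad & \theta \sim \mathrm{Beta}(1, 1) \\
\text{Posterior:}\quad & \theta \mid x \sim \mathrm{Beta}(1 + 8,\ 1 + 2) = \mathrm{Beta}(9, 3) \\
\text{Posterior mean:}\quad & \frac{\alpha}{\alpha + \beta} = \frac{9}{12} = 0.75 \\
\text{Posterior mode (MAP):}\quad & \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{8}{10} = 0.8
\end{aligned}
\]
```

The mode matches the observed frequency of heads (8 out of 10) because the prior was flat, while the mean sits slightly closer to 0.5.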

The Bayesian Workflow

How beliefs evolve as evidence arrives.

  1. Prior Belief: Beta(α, β)
  2. Observed Evidence: Successes/Failures
  3. Likelihood: Binomial / Normal
  4. Updated Belief: Updated Beta
  5. Decision / Interval: Credible Interval