Bayesian Inference

The Bayesian Revolution

In the Frequentist world, we assume there is one "true," fixed parameter $\theta$, and we try to estimate it from data. In the Bayesian world, we also acknowledge uncertainty about the parameter itself: we treat $\theta$ as a random variable with its own probability distribution.

✦Intuition

The Learning Loop

Bayesian inference is a continuous cycle of learning:

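Prior ($\pi(\theta)$): What you believe about $\theta$ before seeing any new data.
Likelihood ($f(x \mid \theta)$): How probable the observed data $x$ are under each possible value of $\theta$. Read as a function of $\theta$, it measures how strongly the data support each value.
Posterior ($\pi(\theta \mid x)$): What you believe now, having mathematically combined your old beliefs with the new evidence. This posterior then serves as the prior for the next batch of data, which is what makes the cycle continuous.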
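A minimal sketch of the loop, assuming (hypothetically) only three candidate coin biases with equal prior weight: after each flip, the prior is multiplied by the likelihood of that flip, renormalized, and the result becomes the prior for the next flip. The candidate values and the flip sequence are illustrative, not from the text.

```python
def update(prior: dict[float, float], flip: str) -> dict[float, float]:
    """One Bayesian update for a single coin flip ("H" or "T").

    prior maps each candidate bias theta = P(heads) to its probability.
    """
    # Likelihood of this flip under each candidate, times its prior weight
    unnormalized = {
        theta: weight * (theta if flip == "H" else 1.0 - theta)
        for theta, weight in prior.items()
    }
    # Dividing by the total plays the role of the denominator in Bayes' theorem
    total = sum(unnormalized.values())
    return {theta: u / total for theta, u in unnormalized.items()}


# Hypothetical setup: three candidate biases, equally likely a priori
belief = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}
for flip in "HHTHH":
    belief = update(belief, flip)  # the last posterior is the next prior

for theta, p in belief.items():
    print(f"P(theta={theta}) = {p:.3f}")
```

Processing the five flips one at a time gives the same posterior as processing them in a single batch, which is why the loop can keep running as new data arrive.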

The Master Formula

Everything in Bayesian statistics flows from Bayes' Theorem:

$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta')\, \pi(\theta')\, d\theta'}$$

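The denominator does not depend on $\theta$; it only rescales the result so that the posterior integrates to 1. So, more simply: Posterior $\propto$ Likelihood $\times$ Prior.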
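To see the formula at work, including its denominator, here is a minimal numerical sketch that evaluates Bayes' theorem on a grid of $\theta$ values and approximates the integral with a midpoint sum. It assumes a coin-bias parameter on $[0, 1]$, a flat prior, and hypothetical data of 6 heads in 10 flips; the helper `grid_posterior` is illustrative.

```python
import math


def grid_posterior(prior, likelihood, n=10_000):
    """Approximate pi(theta | x) on a midpoint grid over [0, 1]."""
    width = 1.0 / n
    grid = [(i + 0.5) * width for i in range(n)]
    # Numerator of Bayes' theorem: f(x | theta) * pi(theta)
    numerator = [likelihood(t) * prior(t) for t in grid]
    # Denominator: integral of f(x | theta') pi(theta') dtheta' (midpoint rule)
    evidence = sum(numerator) * width
    posterior = [v / evidence for v in numerator]
    return grid, posterior, evidence


HEADS, FLIPS = 6, 10  # hypothetical data


def flat_prior(theta):
    return 1.0


def binomial_likelihood(theta):
    return math.comb(FLIPS, HEADS) * theta**HEADS * (1 - theta) ** (FLIPS - HEADS)


grid, posterior, evidence = grid_posterior(flat_prior, binomial_likelihood)
# Posterior mean: integral of theta * pi(theta | x), again by the midpoint rule
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) / len(grid)
print(f"evidence = {evidence:.5f}, posterior mean = {posterior_mean:.4f}")
```

This particular case can be checked in closed form: with a flat prior the evidence is $1/(n+1) = 1/11$, and the posterior is Beta(7, 5) with mean $7/12 \approx 0.583$. The grid approach works for any prior, but the number of grid points grows quickly with the dimension of $\theta$, which is why the integral is usually the hard part.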

Bayesian Updating

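The Prior (blue) is broad and uncertain. The Likelihood (green) peaks where the data point, near 0.6. The Posterior (red) combines the two: it shifts our belief toward the data while still reflecting the prior. (With a perfectly flat prior, the posterior has exactly the shape of the likelihood, since Posterior $\propto$ Likelihood $\times$ Prior.)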

[Figure legend: Prior (I don't know); Likelihood (data peaks at 0.6); Posterior (new belief)]

Conjugate Priors: Mathematical Shortcuts

Calculating the posterior usually requires solving a difficult integral (the denominator of Bayes' rule). However, for certain prior-likelihood pairs, the posterior belongs to the same family as the prior, so it can be written down in closed form just by updating a few parameters. Such priors are called Conjugate Priors.

Example

The Beta-Binomial Shortcut

If you are flipping a coin (Binomial likelihood) and you use a Beta distribution as your prior, your posterior is guaranteed to be another Beta distribution. You simply add the number of heads to the $\alpha$ parameter and the number of tails to the $\beta$ parameter. No calculus required!
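The shortcut fits in a few lines. This is a minimal sketch; the function names `beta_binomial_update` and `beta_mean` and the flip strings are illustrative.

```python
def beta_binomial_update(alpha: float, beta: float, flips: str) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + coin flips -> Beta posterior."""
    heads = flips.count("H")
    tails = flips.count("T")
    # Heads add to alpha, tails add to beta; no integral needed
    return alpha + heads, beta + tails


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


a, b = beta_binomial_update(1, 1, "HHTHTHHH")  # hypothetical flips: 6 H, 2 T
print(f"posterior Beta({a}, {b}), mean {beta_mean(a, b):.3f}")
```

Because only the totals matter, updating in several batches (for example `"HHT"` and then `"HTHHH"`) gives the same posterior as one update with all the flips.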

If your Likelihood is...	Use this Prior Family...	To get this Posterior
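Binomial (Coin Flips, Clicks)	Beta	Beta (Just add successes to $\alpha$ and failures to $\beta$)
Poisson (Arrival Rates)	Gamma	Gamma (Add the event count to the shape and the observation time to the rate)
Normal (with known variance)	Normal	Normal (Update mean and precision)
Exponential (Wait Times)	Gamma	Gamma (Add the number of waits to the shape and the total wait time to the rate)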
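The other rows follow the same pattern of adding summaries of the data to the prior's parameters. A minimal sketch, assuming the Gamma is parameterized by shape and rate, the Normal prior on the mean is $N(\mu_0, \tau_0^2)$, and the data variance $\sigma^2$ is known; the function names and example numbers are hypothetical.

```python
def gamma_poisson_update(shape: float, rate: float, total_count: int, total_time: float):
    """Gamma(shape, rate) prior on a Poisson rate; observe total_count events in total_time."""
    return shape + total_count, rate + total_time


def gamma_exponential_update(shape: float, rate: float, waits: list[float]):
    """Gamma(shape, rate) prior on an Exponential rate; observe the wait times."""
    return shape + len(waits), rate + sum(waits)


def normal_known_variance_update(mu0: float, tau0_sq: float, xs: list[float], sigma_sq: float):
    """N(mu0, tau0_sq) prior on a Normal mean, with known data variance sigma_sq."""
    # Precisions (inverse variances) add
    precision = 1.0 / tau0_sq + len(xs) / sigma_sq
    # Posterior mean is a precision-weighted average of prior mean and data
    mean = (mu0 / tau0_sq + sum(xs) / sigma_sq) / precision
    return mean, 1.0 / precision  # posterior mean and variance


# Hypothetical examples
print(gamma_poisson_update(2.0, 1.0, total_count=12, total_time=3.0))
print(gamma_exponential_update(1.0, 1.0, [0.5, 1.5]))
print(normal_known_variance_update(0.0, 1.0, [2.0, 2.0], sigma_sq=1.0))
```

In the Normal case, more data (larger $n$) or less noise (smaller $\sigma^2$) increases the data's share of the precision, so the posterior mean moves further from $\mu_0$ toward the sample mean.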

MAP Estimation: The Bayesian Point Estimate

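If you must pick a single number from your posterior distribution, a common choice is the Mode (the peak): $\hat{\theta}_{MAP} = \arg\max_\theta \pi(\theta \mid x)$. This is the Maximum A Posteriori (MAP) estimate. The posterior mean and median are other common summaries.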
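As a sketch, the MAP estimate of a coin's bias can be found by maximizing the unnormalized log posterior over a grid, then compared with the mode of the Beta posterior, $(\alpha - 1)/(\alpha + \beta - 2)$ for $\alpha, \beta > 1$. The Beta(2, 2) prior and the 7 heads / 3 tails are hypothetical.

```python
import math


def log_posterior_unnormalized(p, alpha, beta, heads, tails):
    """log prior + log likelihood for a coin bias p, dropping constants."""
    return (alpha - 1 + heads) * math.log(p) + (beta - 1 + tails) * math.log(1 - p)


def map_by_grid(alpha, beta, heads, tails, n=100_000):
    """Brute-force MAP: the grid point with the highest posterior density."""
    grid = [(i + 0.5) / n for i in range(n)]  # midpoints avoid log(0)
    return max(grid, key=lambda p: log_posterior_unnormalized(p, alpha, beta, heads, tails))


def map_closed_form(alpha, beta, heads, tails):
    """Mode of the Beta(alpha + heads, beta + tails) posterior (needs both > 1)."""
    a_post, b_post = alpha + heads, beta + tails
    return (a_post - 1) / (a_post + b_post - 2)


# Hypothetical example: Beta(2, 2) prior, 7 heads and 3 tails
print(f"grid MAP    {map_by_grid(2, 2, 7, 3):.4f}")
print(f"closed form {map_closed_form(2, 2, 7, 3):.4f}")
print(f"MLE         {7 / 10:.4f}")
```

The MAP estimate (8/12) sits between the MLE (0.7) and the prior's peak (0.5): the prior pulls the point estimate toward its own mode.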

✦Intuition

Regularized Fit

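MAP is essentially MLE but with a "penalty" from the prior. Because the denominator of Bayes' rule does not depend on $\theta$, $\hat{\theta}_{MAP} = \arg\max_\theta \left[ \log f(x \mid \theta) + \log \pi(\theta) \right]$, and the $\log \pi(\theta)$ term acts as the penalty. In Machine Learning, when we minimize a negative log-likelihood loss plus an L2 penalty (often called Weight Decay), we are secretly being Bayesians: the L2 penalty corresponds to a zero-mean Gaussian prior on the weights. We are saying: "I want to fit the data, but my prior belief is that the weights shouldn't be too large."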
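A small check of the regularization connection, assuming data $x_i \sim N(\mu, \sigma^2)$ with known $\sigma^2$ and a hypothetical prior $\mu \sim N(0, \tau^2)$. Multiplying the negative log posterior by $2\sigma^2$ gives $\sum_i (x_i - \mu)^2 + \lambda \mu^2$ with $\lambda = \sigma^2 / \tau^2$, that is, least squares plus an L2 penalty. The sketch compares the closed-form MAP with a brute-force minimization of that penalized loss; the data and variances are illustrative.

```python
def map_normal_mean(xs, sigma_sq, tau_sq):
    """Posterior mode of mu under a N(0, tau_sq) prior and N(mu, sigma_sq) data."""
    lam = sigma_sq / tau_sq  # prior strength acts as the L2 penalty weight
    return sum(xs) / (len(xs) + lam)


def penalized_loss(mu, xs, lam):
    """Squared-error loss plus an L2 (ridge) penalty on mu."""
    return sum((x - mu) ** 2 for x in xs) + lam * mu**2


xs = [1.0, 2.0, 3.0]         # hypothetical observations
sigma_sq, tau_sq = 1.0, 1.0  # hypothetical noise and prior variances
lam = sigma_sq / tau_sq

# Brute-force minimization of the penalized loss over a grid of candidate means
candidates = [i / 1000 for i in range(-5000, 5001)]
ridge_fit = min(candidates, key=lambda mu: penalized_loss(mu, xs, lam))
map_fit = map_normal_mean(xs, sigma_sq, tau_sq)
mle_fit = sum(xs) / len(xs)
print(f"MLE {mle_fit:.3f}, MAP {map_fit:.3f}, ridge {ridge_fit:.3f}")
```

The MAP and ridge answers agree and both shrink the MLE (the sample mean) toward the prior mean of 0; a smaller prior variance $\tau^2$ means a larger $\lambda$ and stronger shrinkage.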

Credible Intervals: What we actually want

Frequentist "Confidence Intervals" are notoriously hard to explain. Bayesian Credible Intervals are exactly what they sound like.

✦Intuition

The Common Sense Interval

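A 95% Credible Interval means: "Based on my prior and the data, there is a 95% probability that the true parameter falls in this range." This is much more intuitive than the frequentist version. Once a confidence interval has been computed, the fixed parameter either lies inside it or it doesn't (probability 1 or 0); the "95%" describes how often the procedure captures the true value over repeated samples, not the chance for this particular interval.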
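A minimal sketch of computing a 95% equal-tailed credible interval, which leaves 2.5% of posterior probability in each tail (a highest-density interval is another common choice). It uses only the standard library, accumulating the Beta CDF with a midpoint sum. The Beta(9, 3) posterior is an illustrative choice; it is what a flat Beta(1, 1) prior becomes after 8 heads and 2 tails.

```python
import math


def beta_pdf(p: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def equal_tailed_interval(a: float, b: float, mass: float = 0.95, n: int = 100_000):
    """Approximate equal-tailed credible interval for Beta(a, b).

    Accumulates the CDF with a midpoint sum and returns the first grid
    points where it crosses the lower and upper tail probabilities.
    """
    tail = (1.0 - mass) / 2.0
    step = 1.0 / n
    cdf = 0.0
    lower = None
    for i in range(n):
        p = (i + 0.5) * step  # midpoints avoid log(0) at the ends
        cdf += beta_pdf(p, a, b) * step
        if lower is None and cdf >= tail:
            lower = p
        if cdf >= 1.0 - tail:
            return lower, p
    return lower, 1.0  # only reached if rounding leaves the CDF just short


lo, hi = equal_tailed_interval(9, 3)
print(f"95% credible interval for theta: ({lo:.3f}, {hi:.3f})")
```

The result reads directly as "given the prior and the data, $\theta$ lies in this range with probability 0.95." With SciPy available, the same quantiles could instead come from the Beta distribution's inverse CDF.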

Updating Beliefs about a Coin


You start with a 'Flat' (Uniform) prior, Beta(1,1), meaning all biases are equally likely. You flip a coin 10 times and get 8 heads. What is your new belief?
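Working the exercise with the Beta-Binomial shortcut: 8 heads and 2 tails update Beta(1, 1) to Beta(9, 3). The sketch below computes that posterior and two point summaries; the function name `coin_posterior` is illustrative.

```python
def coin_posterior(alpha0, beta0, heads, flips):
    """Beta-Binomial update plus posterior mean and MAP for a coin's bias."""
    alpha = alpha0 + heads
    beta = beta0 + (flips - heads)
    mean = alpha / (alpha + beta)            # posterior mean
    mode = (alpha - 1) / (alpha + beta - 2)  # MAP estimate (valid for alpha, beta > 1)
    return alpha, beta, mean, mode


alpha, beta, mean, mode = coin_posterior(1, 1, heads=8, flips=10)
print(f"Beta({alpha}, {beta}): mean {mean:.2f}, MAP {mode:.2f}")
# prints "Beta(9, 3): mean 0.75, MAP 0.80"
```

Under the flat prior the MAP estimate equals the MLE, 8/10, while the posterior mean, 0.75, is pulled slightly toward 0.5.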

[Interactive figure: The Bayesian Workflow, showing how beliefs evolve as evidence arrives]
