Point Estimation

Two Ways to See the World

In the previous chapters, we knew the parameters (like $p$ or $\lambda$) and predicted the data. In estimation, we do the opposite: we have the data, and we need to infer the parameters that generated it.

There are two main schools of thought:

Definition

Frequentist vs. Bayesian

Frequentist: Parameters ($\theta$) are fixed, unknown constants. If you flip a coin, it has one true bias $p$. We look for the single "best" value for $p$ given the observed data, and we judge an estimation method by how it would perform over many repeated experiments.
Bayesian: Parameters ($\theta$) are random variables themselves. We have an initial "prior belief" about $p$, and we update that belief to a "posterior distribution" as we observe more data.

This chapter focuses on the Frequentist goal: finding a single "best guess" (a point estimate, $\hat{\theta}$) for an unknown parameter.

Method of Moments (MoM): The Simple Match

The Method of Moments is the most intuitive estimation technique: you simply match the averages you see in your sample to the theoretical averages of the mathematical model.

Example

MoM Intuition

Suppose the wait for a bus has a true average of $\mu$ minutes (theoretical mean), and you observe 5 buses with an average wait time of 10 minutes (sample mean). Your MoM estimate is $\hat{\mu} = 10$. You are literally just matching the Sample Mean ($\bar{X}$) to the Theoretical Mean ($E[X]$).

The Formula

To estimate $\theta$ , we solve the equation:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i = E[X] = \mu(\theta)$$

Estimating Customer Arrivals (difficulty: medium)

You track the time (in minutes) between customers entering a shop: [2, 5, 1, 3, 4]. Assuming an Exponential distribution ($E[X] = 1/\lambda$), what is your MoM estimate for the arrival rate $\lambda$?
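One way to work the exercise is to set the sample mean equal to $1/\lambda$ and solve for $\lambda$. The sketch below does exactly that; the variable names are illustrative choices, not part of the original exercise.

```python
# Method of Moments for an Exponential model: solve  sample_mean = 1 / lambda.
gaps_minutes = [2, 5, 1, 3, 4]  # inter-arrival times from the exercise

sample_mean = sum(gaps_minutes) / len(gaps_minutes)  # 15 / 5 = 3.0 minutes
lambda_hat = 1 / sample_mean                         # customers per minute

print(f"sample mean = {sample_mean} minutes")
print(f"MoM estimate of lambda = {lambda_hat:.4f} customers per minute")
```

Under these assumptions the estimate is $\hat{\lambda} = 1/3$, or about one customer every three minutes.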

Maximum Likelihood Estimation (MLE): The Best Fit

MLE is the most widely used general-purpose estimation method. It asks: "Out of all possible values for $\theta$, which one makes the specific data I actually saw the most likely to have occurred?"

Intuition

The Likelihood Peak

Imagine you flip a coin 10 times and get 9 heads.

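Could the true bias be $p=0.5$? Yes, but then 9 heads out of 10 is very unlikely (probability about 0.01).
Could it be $p=0.9$? Yes, and then 9 heads is far more plausible (probability about 0.39).
Could it be $p=0.1$? It's almost impossible (probability about $9 \times 10^{-9}$).

MLE chooses $p=0.9$ because, among all candidate values, it "maximizes the likelihood": it gives the observed data the highest probability.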
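The three candidate values can be compared numerically. This is a minimal sketch assuming a Binomial model for 9 heads in 10 flips; `binomial_likelihood` is a hypothetical helper name.

```python
from math import comb

def binomial_likelihood(p, heads=9, flips=10):
    """Probability of exactly `heads` heads in `flips` tosses of a coin with bias p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Evaluate the likelihood at the three candidate biases from the example.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: likelihood = {binomial_likelihood(p):.3g}")
```

Of the three candidates, $p = 0.9$ gives the observed data the highest probability, which is why it wins.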

The Likelihood Function

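We define the likelihood $\mathcal{L}(\theta \mid x_1, \ldots, x_n)$ as the joint probability (or joint density, for continuous data) of the observed data points, viewed as a function of $\theta$. For independent, identically distributed observations it factorizes into a product: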

$$\mathcal{L}(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i \mid \theta)$$
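To make the product concrete, here is a small sketch that assumes an Exponential model, $f(x \mid \lambda) = \lambda e^{-\lambda x}$, and reuses the shop inter-arrival times from the exercise above. The function names are hypothetical, and the candidate rates are arbitrary values picked for comparison.

```python
from math import exp, log, prod

def exp_pdf(x, lam):
    """Exponential density f(x | lam) = lam * exp(-lam * x) for x >= 0."""
    return lam * exp(-lam * x)

def likelihood(lam, data):
    """Product of the densities of all observations (assumes i.i.d. data)."""
    return prod(exp_pdf(x, lam) for x in data)

def log_likelihood(lam, data):
    """Sum of log-densities; same maximizer as the likelihood."""
    return sum(log(exp_pdf(x, lam)) for x in data)

data = [2, 5, 1, 3, 4]
for lam in (0.1, 1 / 3, 1.0):
    print(f"lambda = {lam:.3f}: L = {likelihood(lam, data):.3g}, "
          f"log L = {log_likelihood(lam, data):.3f}")
```

The likelihood values are tiny because they multiply many numbers below one, which is one practical reason to work with the log-likelihood instead.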

Figure: Likelihood Curve for Binomial $p$. The curve shows the likelihood of seeing our data for every possible value of $p$ from 0 to 1. The highest point (the peak) is our MLE estimate. Vertical axis: $\mathcal{L}(p \mid \text{observed heads})$.
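The curve can be reproduced by evaluating the likelihood on a grid of $p$ values and locating the peak. The sketch below assumes the observed data are 9 heads in 10 flips (taken from the coin example) and uses a grid spacing of 0.001, which is an arbitrary choice.

```python
from math import comb

heads, flips = 9, 10  # assumed observation: 9 heads in 10 flips

# Candidate values of p from 0 to 1 in steps of 0.001.
grid = [i / 1000 for i in range(1001)]

# Binomial likelihood of the observation at each candidate p.
curve = [comb(flips, heads) * p**heads * (1 - p) ** (flips - heads) for p in grid]

# The MLE on this grid is the p at which the curve is highest.
peak_index = max(range(len(grid)), key=curve.__getitem__)
p_hat = grid[peak_index]

print(f"grid MLE: p_hat = {p_hat}")
```

A grid search only finds the peak to within the grid spacing; here it lands on the same value as the closed-form answer, heads divided by flips.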

Example

The German Tank Problem

In WWII, the Allies estimated how many tanks the Germans were producing by looking at the serial numbers on captured tanks. If serial numbers run from 1 to $N$, the MLE of $N$ is simply the largest serial number observed. Because that can never overshoot the truth, the estimate is usually refined with a correction for the gap above the sample maximum. The statistical estimates (production of ~250/month) turned out to be far more accurate than conventional intelligence estimates (~1,000/month or more)!
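As a rough illustration, suppose serial numbers run from 1 to $N$ and the captured tanks are a uniform random sample without replacement. The serial numbers below are made up, and `tank_estimates` is a hypothetical helper. The correction $m + m/k - 1$ is the standard textbook estimator for this model, not a claim about the exact procedure wartime analysts followed.

```python
def tank_estimates(serials):
    """Return (MLE, bias-corrected estimate) of the largest serial number N.

    Assumes serials are drawn uniformly without replacement from 1..N.
    """
    k = len(serials)           # number of captured tanks
    m = max(serials)           # largest serial number observed
    mle = m                    # MLE: the smallest N consistent with the data
    corrected = m + m / k - 1  # adds the average gap between observed serials
    return mle, corrected

captured = [19, 40, 42, 60]  # hypothetical serial numbers, for illustration only
mle, corrected = tank_estimates(captured)
print(f"MLE of N: {mle}, bias-corrected estimate: {corrected}")
```

The MLE equals the sample maximum, so it systematically underestimates $N$; the corrected estimate pushes it upward by roughly one average gap.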

MLE Workflow

Input: Observed sample data and a model (e.g., Normal, Poisson)

Output: The best-fit parameter estimate $\hat{\theta}$

1. Write Likelihood: Form the product of the probabilities (or densities) of every data point: $\mathcal{L}(\theta) = \prod_{i=1}^n f(x_i \mid \theta)$.
2. Log-Likelihood: Take the natural log, $\ell(\theta) = \ln \mathcal{L}(\theta)$. The product becomes a sum, which is much easier to differentiate, and because the log is increasing the maximizer does not change.
3. Maximize: Take the derivative of $\ell(\theta)$ with respect to $\theta$, set it to zero, and solve.
4. Verify: Check that the second derivative is negative at the solution to confirm you found a maximum, not a minimum.
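As a worked illustration of the four steps, consider a coin flipped $n$ times with $k$ heads, under a Binomial model (an assumption chosen to match the coin example above).

Step 1, write the likelihood:

$$\mathcal{L}(p) = \binom{n}{k} p^k (1-p)^{n-k}$$

Step 2, take the log:

$$\ell(p) = \ln \binom{n}{k} + k \ln p + (n-k) \ln(1-p)$$

Step 3, differentiate and set to zero:

$$\ell'(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0 \quad \Longrightarrow \quad \hat{p} = \frac{k}{n}$$

Step 4, check the second derivative:

$$\ell''(p) = -\frac{k}{p^2} - \frac{n-k}{(1-p)^2} < 0 \quad \text{for } 0 < p < 1$$

With $k = 9$ and $n = 10$ this gives $\hat{p} = 0.9$, matching the peak of the likelihood in the coin example.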
