Evaluating Estimators

How Good is Your Guess?

We now know how to estimate parameters using the Method of Moments (MoM) or Maximum Likelihood Estimation (MLE), but how do we know which estimator is "best"? Is the Sample Mean better than the Sample Median? To answer this, we evaluate estimators ( $\hat{\theta}$ ) based on two fundamental criteria: Bias and Variance.

✦Intuition

The Archery Analogy

Imagine shooting arrows at a target where the bullseye is the true parameter $\theta$ :

Bias: How far is the average of all your shots from the bullseye? (Systematic Error/Accuracy)
Variance: How tight is the grouping of your shots? (Precision)
MSE: The overall quality of your shooting, accounting for both aim and consistency.

Scenario	Bias	Variance	Interpretation
Bullseye	Zero	Low	Your aim is true and your hands are steady.
Systematic Error	High	Low	Your hands are steady, but your sight is misaligned.
High Noise	Zero	High	Your aim is true on average, but your hands are shaky.
Chaos	High	High	Your sight is off AND your hands are shaky.

Mean Squared Error (MSE): The Total Error

The MSE is the most common metric for comparing estimators. It measures the average squared distance between the estimator and the true parameter.

$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$$

One of the most important results in statistics is that the MSE decomposes into two parts, where $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$:

$$\text{MSE}(\hat{\theta}) = \text{Bias}^2(\hat{\theta}) + \text{Var}(\hat{\theta})$$
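
One way to see where the decomposition comes from is to add and subtract the estimator's expected value inside the square. The shorthand $m = E[\hat{\theta}]$ is introduced here only for illustration, so that $\text{Bias}(\hat{\theta}) = m - \theta$ and $\text{Var}(\hat{\theta}) = E[(\hat{\theta} - m)^2]$:

$$
\begin{aligned}
\text{MSE}(\hat{\theta}) &= E[(\hat{\theta} - m + m - \theta)^2] \\
&= E[(\hat{\theta} - m)^2] + 2(m - \theta)\,E[\hat{\theta} - m] + (m - \theta)^2 \\
&= \text{Var}(\hat{\theta}) + 0 + \text{Bias}^2(\hat{\theta})
\end{aligned}
$$

The cross term vanishes because $E[\hat{\theta} - m] = 0$, and $(m - \theta)^2$ is a constant, so it passes through the expectation unchanged.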

The Bias-Variance Tradeoff

In a perfect world, we want zero bias and zero variance. However, in the real world (and especially in Machine Learning), reducing one often increases the other.
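
To make the tradeoff concrete, here is a minimal sketch using a hypothetical "shrunken" estimator $c \cdot \bar{X}$ of a mean $\mu$. The estimator, the function name `shrunk_mean_errors`, and the values of `mu`, `sigma`, and `n` are all assumptions chosen for illustration. For i.i.d. data the bias is $(c-1)\mu$ and the variance is $c^2\sigma^2/n$, so both can be computed exactly:

```python
# Exact bias^2, variance, and MSE of the hypothetical estimator c * X_bar,
# where X_bar is the mean of n i.i.d. draws with mean mu and variance sigma^2.
# All parameter values below are illustrative assumptions.

def shrunk_mean_errors(c, mu, sigma, n):
    """Return (bias^2, variance, MSE) of the estimator c * X_bar."""
    bias = (c - 1.0) * mu                  # E[c * X_bar] - mu
    variance = c ** 2 * sigma ** 2 / n     # Var(c * X_bar)
    return bias ** 2, variance, bias ** 2 + variance

mu, sigma, n = 1.0, 2.0, 10
factors = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
results = {c: shrunk_mean_errors(c, mu, sigma, n) for c in factors}

for c, (bias_sq, variance, mse) in results.items():
    print(f"c={c:.1f}  bias^2={bias_sq:.3f}  var={variance:.3f}  mse={mse:.3f}")

# The factor with the smallest total error on this grid.
best_c = min(results, key=lambda c: results[c][2])
print("lowest MSE at c =", best_c)
```

Shrinking toward zero adds bias but removes variance, and in this setup the unbiased choice `c = 1.0` does not have the lowest MSE. The best factor depends on the unknown $\mu$, so this is an illustration of the tradeoff rather than a practical recipe.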

Example

The Overfitting Trap

A highly complex model (like a 100th-degree polynomial) can pass through every point in your training data (very low bias), but its fit can change wildly if you add even one new data point (high variance). A simpler model might miss some nuances (higher bias) but is much more stable and reliable (lower variance). Finding the level of complexity that minimizes the total error is a central goal in data science.
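
The following sketch simulates this effect on a small scale. It is not the 100th-degree case from the text: it assumes 10 evenly spaced training points, a true curve $\sin(2\pi x)$, Gaussian noise, and compares a straight line with a degree-9 polynomial that passes through every training point. All function names and settings are hypothetical.

```python
import math
import random
import statistics

def true_f(x):
    """Assumed ground-truth curve for this illustration."""
    return math.sin(2 * math.pi * x)

def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return lambda x: y_bar + slope * (x - x_bar)

def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point (zero training error)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

def bias_sq_and_variance(fit, xs, x0, noise_sd, trials, rng):
    """Refit on fresh noisy data many times; summarize predictions at x0."""
    preds = []
    for _ in range(trials):
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds.append(fit(xs, ys)(x0))
    bias = statistics.fmean(preds) - true_f(x0)
    return bias ** 2, statistics.pvariance(preds)

rng = random.Random(0)
xs = [i / 9 for i in range(10)]   # 10 training inputs on [0, 1]
x0 = 0.05                         # test input between two training points

line_bias_sq, line_var = bias_sq_and_variance(fit_line, xs, x0, 0.3, 2000, rng)
poly_bias_sq, poly_var = bias_sq_and_variance(fit_interpolant, xs, x0, 0.3, 2000, rng)

print(f"line:     bias^2={line_bias_sq:.4f}  variance={line_var:.4f}")
print(f"degree 9: bias^2={poly_bias_sq:.4f}  variance={poly_var:.4f}")
```

Under these assumptions the line shows the larger squared bias, while the interpolating polynomial shows a much larger variance, because it chases the noise in each new training set.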

Why Does the Sample Variance Divide by $n-1$?

If you've ever used a statistics calculator, you might have noticed that the Sample Variance formula ( $S^2$ ) divides by $n-1$ instead of $n$ . This is called Bessel's Correction.

✦Intuition

Correcting for 'Mean Drift'

If we divide by $n$ , our estimator for variance is systematically too small (it is biased). This happens because we don't know the true population mean ( $\mu$ ); we are forced to use the sample mean ( $\bar{X}$ ). Since $\bar{X}$ is calculated from the same data, the data points are naturally "closer" to it than they are to the true $\mu$ . Dividing by $n-1$ slightly inflates the result, making the estimator unbiased.
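
A quick simulation can illustrate the effect. This sketch assumes Normal data with true variance 4 and a small sample size of 5 (both arbitrary choices), and relies on the standard library's `statistics.pvariance` (divides by $n$) and `statistics.variance` (divides by $n-1$):

```python
import random
import statistics

rng = random.Random(42)
true_variance = 4.0          # assumed population variance (sigma = 2)
n, trials = 5, 20000         # small samples make the bias easy to see

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # sum of squares / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

avg_biased = statistics.fmean(divide_by_n)
avg_unbiased = statistics.fmean(divide_by_n_minus_1)

print(f"true variance:          {true_variance:.3f}")
print(f"average, divide by n:   {avg_biased:.3f}")
print(f"average, divide by n-1: {avg_unbiased:.3f}")
```

On average the divide-by-$n$ version should land near $\frac{n-1}{n}\sigma^2$ (here $0.8 \times 4 = 3.2$), while the divide-by-$(n-1)$ version should land near the true value of 4.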

Efficiency: The 'Speed Limit' of Estimators

Some estimators are more "efficient" than others—they extract more information from the same amount of data. The Fisher Information ( $I(\theta)$ ) measures how much information each data point provides about the parameter.

Theorem

The Cramér-Rao Lower Bound

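There is a mathematical "speed limit" on how low the variance of an unbiased estimator can be. For $n$ independent observations, where $I(\theta)$ is the Fisher Information of a single observation, you can never do better than: $\text{Var}(\hat{\theta}) \ge \frac{1}{n \cdot I(\theta)}$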

An unbiased estimator whose variance equals this limit is called Efficient. Many MLEs reach it, at least asymptotically as $n$ grows.
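
As a hedged illustration, consider Normal data with known $\sigma$. In that case the Fisher Information about the mean is $I(\mu) = 1/\sigma^2$ per observation, so the bound is $\sigma^2/n$. The sketch below compares the simulated variance of the sample mean and the sample median against that bound; the sample size, trial count, and variable names are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(7)
mu, sigma = 0.0, 1.0      # assumed true parameters
n, trials = 25, 5000

# Cramér-Rao bound for estimating mu from n Normal observations with known sigma.
fisher_info_per_obs = 1.0 / sigma ** 2
crlb = 1.0 / (n * fisher_info_per_obs)

means, medians = [], []
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"Cramér-Rao bound:      {crlb:.4f}")
print(f"variance of the mean:   {var_mean:.4f}")
print(f"variance of the median: {var_median:.4f}")
```

In this setting the sample mean (which is also the MLE of $\mu$) should sit close to the bound, while the sample median, although also unbiased here, has a noticeably larger variance. That is one sense in which the mean is "better" than the median for Normal data.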

Efficiency Comparison

As we gather more data ( $n$ ), the variance of each estimator shown drops toward zero. The MLE (blue) stays at the theoretical floor, while the others (green) are 'wasteful' and require more data to reach the same precision.

Legend: Cramér-Rao Limit (Floor), MLE (Efficient), MoM (Less Efficient)
