Bias and variance of estimators

An estimator has bias (systematic error) and variance (sample-to-sample wobble). Mean-squared error decomposes into the two.

One-line definition

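For an estimator $\hat\theta$ of a parameter $\theta$, bias is $\mathrm{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta$ and variance is $\mathrm{Var}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]$. Mean-squared error decomposes as $\mathrm{MSE}(\hat\theta) = \mathrm{Bias}(\hat\theta)^2 + \mathrm{Var}(\hat\theta)$.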

Why it matters

This decomposition is the statistical version of the bias–variance tradeoff familiar from ML: complex models tend to have low bias but high variance; simple models tend to have high bias but low variance. The same accounting applies to any estimator: a sample mean, a regularized regression coefficient, an importance-sampling weight.

The decomposition

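For an estimator $\hat\theta$ of a fixed (non-random) parameter $\theta$:

$$
\begin{aligned}
\mathrm{MSE}(\hat\theta) &= \mathbb{E}\big[(\hat\theta - \theta)^2\big] = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\big] \\
&= \underbrace{\mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]}_{\mathrm{Var}(\hat\theta)} + 2\,(\mathbb{E}[\hat\theta] - \theta)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] + \underbrace{(\mathbb{E}[\hat\theta] - \theta)^2}_{\mathrm{Bias}(\hat\theta)^2}
\end{aligned}
$$

The cross-term vanishes because $\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] = 0$. Two-line derivation; central to all of statistics.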
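The decomposition can be checked numerically. The sketch below is an illustration under assumed settings, not part of the original text: Gaussian data, a hypothetical shrunken sample mean (`shrunk_mean`, factor 0.8) as a deliberately biased estimator, and arbitrary values for the true parameter, sample size, and trial count.

```python
import random
import statistics


def simulate_estimates(estimator, theta=2.0, sigma=1.0, n=10, trials=20000, seed=0):
    """Apply `estimator` to `trials` independent samples of size n from N(theta, sigma^2)."""
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


def decompose(estimates, theta):
    """Return (mse, bias, variance) computed over the simulated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return mse, bias, variance


def shrunk_mean(sample, shrink=0.8):
    """Hypothetical biased estimator: the sample mean pulled toward zero."""
    return shrink * statistics.fmean(sample)


THETA = 2.0  # assumed true parameter, chosen only for the demo
estimates = simulate_estimates(shrunk_mean, theta=THETA)
mse, bias, variance = decompose(estimates, THETA)
print(f"MSE = {mse:.4f}, bias^2 + variance = {bias ** 2 + variance:.4f}")
```

Because the variance here is centered at the empirical mean of the estimates, the two printed numbers agree up to floating-point rounding, not just approximately: the identity holds for the empirical distribution as well as the true one.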

Why biased estimators can be useful

Unbiased estimators ($\mathrm{Bias}(\hat\theta) = 0$) are not always optimal. A biased estimator with much lower variance can have lower MSE.

Examples:

| Estimator | Bias | Variance | When better |
|---|---|---|---|
| Sample mean | 0 | $\sigma^2/n$ | universal |
| Sample variance with $1/(n-1)$ | 0 | larger | unbiased baseline |
| Sample variance with $1/n$ (MLE) | small negative | smaller | when minimizing MSE |
| Ridge regression | nonzero | smaller than OLS | when $X^\top X$ is ill-conditioned |
| Stein estimator | shrinkage bias | strictly lower | always for dimensions $d \geq 3$ |
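The two sample-variance rows can be compared by simulation. This is a minimal sketch assuming Gaussian data with true variance 1 and sample size 10; the function names and settings are illustrative choices, and the ranking can differ for other distributions.

```python
import random
import statistics


def simulate_variance_estimators(n=10, sigma=1.0, trials=20000, seed=1):
    """Return paired lists of 1/(n-1) and 1/n variance estimates on Gaussian samples."""
    rng = random.Random(seed)
    unbiased, mle = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        sum_sq = sum((x - centre) ** 2 for x in sample)
        unbiased.append(sum_sq / (n - 1))  # divides by n - 1
        mle.append(sum_sq / n)             # divides by n
    return unbiased, mle


def summarize(estimates, truth):
    """Empirical bias, variance, and MSE (= bias^2 + variance) of the estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - truth
    variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
    return {"bias": bias, "variance": variance, "mse": bias ** 2 + variance}


SIGMA_SQ = 1.0  # assumed true variance for the demo
unbiased, mle = simulate_variance_estimators()
stats_unbiased = summarize(unbiased, SIGMA_SQ)
stats_mle = summarize(mle, SIGMA_SQ)
for name, s in (("1/(n-1)", stats_unbiased), ("1/n", stats_mle)):
    print(f"{name:8s} bias={s['bias']:+.4f} var={s['variance']:.4f} mse={s['mse']:.4f}")
```

Under these assumptions the $1/n$ estimator shows a small negative bias but a lower variance and a lower MSE than the $1/(n-1)$ estimator.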

The James-Stein estimator (1961) famously dominates the sample mean in $d \geq 3$ dimensions despite being biased.
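A small simulation makes the dominance concrete. The sketch assumes the textbook setting: a single observation from a Gaussian with identity covariance, shrinkage toward the origin, and the original (not positive-part) James-Stein form. The dimension 10 and the true mean vector are arbitrary choices for illustration.

```python
import random


def james_stein(x):
    """Original James-Stein shrinkage toward the origin, assuming unit noise variance."""
    d = len(x)
    norm_sq = sum(v * v for v in x)
    factor = 1.0 - (d - 2) / norm_sq
    return [factor * v for v in x]


def total_sq_error(estimate, theta):
    """Squared error summed over all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def compare_risks(theta, trials=5000, seed=2):
    """Monte Carlo risk of the raw observation versus its James-Stein shrinkage."""
    rng = random.Random(seed)
    raw_total, js_total = 0.0, 0.0
    for _ in range(trials):
        x = [rng.gauss(t, 1.0) for t in theta]
        raw_total += total_sq_error(x, theta)
        js_total += total_sq_error(james_stein(x), theta)
    return raw_total / trials, js_total / trials


theta = [1.0] * 10  # assumed true mean vector, d = 10
risk_raw, risk_js = compare_risks(theta)
print(f"raw observation risk: {risk_raw:.3f}, James-Stein risk: {risk_js:.3f}")
```

The comparison is on total squared error summed over coordinates. Dominance is a statement about that total, not about each coordinate individually.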

Connection to ML model selection

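In supervised learning, the same decomposition holds for the expected prediction error of a fitted model $\hat f$ at an input $x$, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$:

$$
\mathbb{E}\big[(y - \hat f(x))^2\big] = \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + \mathrm{Var}\big(\hat f(x)\big) + \sigma^2
$$

Here the expectation is over training sets and the noise in $y$. The $\sigma^2$ term is irreducible noise. Increasing model capacity typically decreases bias but increases variance; regularization shifts the tradeoff toward higher bias and lower variance.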
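The capacity tradeoff can be illustrated with two very small models. The sketch below assumes a quadratic truth `f`, a fixed grid of 20 training inputs on [0, 1], Gaussian noise with standard deviation 0.3, and a single test point at 0.9. All of these, along with the helper names, are illustrative choices. It compares a constant predictor with a least-squares line.

```python
import random
import statistics


def f(x):
    """Assumed true regression function for the demo."""
    return x * x


def fit_constant(xs, ys):
    """Lowest-capacity model: predict the mean of the training targets."""
    c = statistics.fmean(ys)
    return lambda x: c


def fit_line(xs, ys):
    """Simple least-squares line, written out in closed form."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def prediction_decomposition(fit, x0=0.9, n=20, sigma=0.3, trials=5000, seed=3):
    """Estimate bias^2, variance, noise, and expected prediction error at x0."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    preds, sq_errs = [], []
    for _ in range(trials):
        ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]  # fresh training set
        pred = fit(xs, ys)(x0)
        y0 = f(x0) + rng.gauss(0.0, sigma)               # fresh test target
        preds.append(pred)
        sq_errs.append((y0 - pred) ** 2)
    mean_pred = statistics.fmean(preds)
    return {
        "bias_sq": (mean_pred - f(x0)) ** 2,
        "variance": statistics.fmean((p - mean_pred) ** 2 for p in preds),
        "noise": sigma ** 2,
        "epe": statistics.fmean(sq_errs),
    }


const_stats = prediction_decomposition(fit_constant)
line_stats = prediction_decomposition(fit_line)
for name, s in (("constant", const_stats), ("line", line_stats)):
    total = s["bias_sq"] + s["variance"] + s["noise"]
    print(f"{name:8s} bias^2={s['bias_sq']:.4f} var={s['variance']:.4f} "
          f"sum={total:.4f} epe={s['epe']:.4f}")
```

In this setup the line has much lower squared bias and somewhat higher variance than the constant. For both models the simulated prediction error is close to the sum of the three terms, up to Monte Carlo error.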

Common pitfalls

  • Equating “biased” with “bad.” Many useful estimators are biased; lower MSE is what matters.
  • Reporting variance without specifying what’s random. “Variance of the estimator” is over re-sampling the data; “variance of the prediction” is over both data and inputs. Different objects.
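  • Forgetting that the cross-term vanishes only when $\hat\theta$ is centered at its own expectation $\mathbb{E}[\hat\theta]$ and $\theta$ is fixed. Centering at any other point, or treating a random target as if it were fixed, leaves a nonzero cross-term and ruins the decomposition.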
  • Confusing estimator variance with model variance. Estimator variance is a property of the estimation procedure; model variance is a property of the model class.
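The cross-term pitfall is easy to see numerically. The sketch below uses a hypothetical shrunken sample mean on Gaussian data, with all settings assumed for illustration. It evaluates the cross-term of the squared-error expansion at two centering points: the empirical mean of the estimates, and an arbitrary other value.

```python
import random
import statistics


def cross_term(estimates, theta, center):
    """Cross-term 2 * (center - theta) * mean(estimate - center) of the expansion."""
    return 2.0 * (center - theta) * statistics.fmean(e - center for e in estimates)


rng = random.Random(4)
THETA_TRUE = 2.0  # assumed true parameter
# Hypothetical biased estimator: 0.8 times the mean of 10 Gaussian draws.
shrunk_estimates = [
    0.8 * statistics.fmean(rng.gauss(THETA_TRUE, 1.0) for _ in range(10))
    for _ in range(5000)
]

centered = cross_term(shrunk_estimates, THETA_TRUE, statistics.fmean(shrunk_estimates))
off_center = cross_term(shrunk_estimates, THETA_TRUE, 1.8)
print(f"cross-term centered at the mean: {centered:.2e}")
print(f"cross-term centered at 1.8:      {off_center:.4f}")
```

Centered at the mean of the estimates, the cross-term is zero up to rounding. Centered anywhere else, it is not, so "variance" and "bias" measured around that point no longer add up to the MSE.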