One-line definition
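For an estimator $\hat\theta$ of a parameter $\theta$, bias is $\mathrm{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta$ and variance is $\mathrm{Var}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]$. Mean-squared error decomposes as $\mathrm{MSE}(\hat\theta) = \mathrm{Bias}(\hat\theta)^2 + \mathrm{Var}(\hat\theta)$.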
Why it matters
This decomposition is the statistical version of the bias–variance tradeoff familiar from ML: complex models tend to have low bias but high variance; simple models tend to have high bias but low variance. The same accounting applies to any estimator: a sample mean, a regularized regression coefficient, an importance-sampling weight.
The decomposition
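For an estimator $\hat\theta$ of a fixed (non-random) parameter $\theta$:
$$\mathrm{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big] = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big] + 2\,(\mathbb{E}[\hat\theta] - \theta)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] + (\mathbb{E}[\hat\theta] - \theta)^2 = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2$$
The cross-term vanishes because $\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] = 0$. Two-line derivation; central to all of statistics.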
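
The identity can be checked numerically. The sketch below is an illustration only: it assumes Gaussian data and uses a hypothetical shrunk sample mean (`shrunk_mean`, with an arbitrary factor `c=0.8`) as the estimator; all function names and parameter values are invented for the example. It estimates bias, variance, and MSE over repeated datasets and compares the two sides of the decomposition.

```python
import random
import statistics

def simulate_estimates(estimator, theta, sigma, n, trials, seed=0):
    """Apply `estimator` to `trials` independent datasets of size n from N(theta, sigma^2)."""
    rng = random.Random(seed)
    return [estimator([rng.gauss(theta, sigma) for _ in range(n)])
            for _ in range(trials)]

def bias_variance_mse(estimates, theta):
    """Empirical bias, variance (1/N divisor), and MSE of the estimates about theta."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    return bias, variance, mse

def shrunk_mean(xs, c=0.8):
    """A deliberately biased estimator: the sample mean scaled by c (assumed value)."""
    return c * statistics.fmean(xs)

estimates = simulate_estimates(shrunk_mean, theta=2.0, sigma=1.0, n=10, trials=5000)
bias, variance, mse = bias_variance_mse(estimates, theta=2.0)
print(f"bias^2 + variance = {bias ** 2 + variance:.6f}")
print(f"mse               = {mse:.6f}")
```

With the 1/N divisor for the empirical variance, the two printed numbers agree up to floating-point rounding, because the decomposition is an algebraic identity for empirical moments as well as for expectations.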
Why biased estimators can be useful
Unbiased estimators ($\mathrm{Bias}(\hat\theta) = 0$) are not always optimal. A biased estimator with much lower variance can have lower MSE.
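
A small worked case makes this concrete. Assume $n$ i.i.d. observations with mean $\theta$ and variance $\sigma^2 > 0$, and shrink the sample mean $\bar X$ by a constant factor $c$ (the symbols $c$ and $c^\ast$ are introduced here for illustration only):

$$\mathrm{Bias}(c\bar X) = (c - 1)\,\theta, \qquad \mathrm{Var}(c\bar X) = \frac{c^2\sigma^2}{n}, \qquad \mathrm{MSE}(c\bar X) = (c - 1)^2\theta^2 + \frac{c^2\sigma^2}{n}$$

Minimizing over $c$ gives

$$c^\ast = \frac{\theta^2}{\theta^2 + \sigma^2/n} < 1, \qquad \mathrm{MSE}(c^\ast\bar X) = \frac{\theta^2\,\sigma^2/n}{\theta^2 + \sigma^2/n} < \frac{\sigma^2}{n} = \mathrm{MSE}(\bar X)$$

Because $c^\ast$ depends on the unknown $\theta$, this is an oracle result rather than a usable estimator, but it shows that some amount of shrinkage beats the unbiased sample mean in MSE.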
Examples:
| Estimator | Bias | Variance | When better |
|---|---|---|---|
| Sample mean $\bar X$ | 0 | $\sigma^2/n$ | universal |
| Sample variance with $1/(n-1)$ | 0 | larger | unbiased baseline |
| Sample variance with $1/n$ (MLE) | small negative ($-\sigma^2/n$) | smaller | when minimizing MSE |
| Ridge regression | nonzero | smaller than OLS | when $X^\top X$ is ill-conditioned |
| Stein estimator | shrinkage bias | strictly lower | always for $d \ge 3$ dimensions |
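
The sample-variance rows can be checked in closed form. The sketch below assumes i.i.d. Gaussian data, for which the sum of squared deviations $SS = \sum_i (x_i - \bar x)^2$ has mean $(n-1)\sigma^2$ and variance $2(n-1)\sigma^4$; the function name and the choice $n = 10$ are illustrative, not from the original text.

```python
from fractions import Fraction

def variance_estimator_moments(n, divisor, sigma2=Fraction(1)):
    """Exact bias, variance, and MSE of SS / divisor under i.i.d. Gaussian data.

    SS is the sum of squared deviations from the sample mean:
    E[SS] = (n - 1) * sigma2 and Var[SS] = 2 * (n - 1) * sigma2**2.
    """
    mean_ss = (n - 1) * sigma2
    var_ss = 2 * (n - 1) * sigma2 ** 2
    bias = mean_ss / divisor - sigma2
    variance = var_ss / divisor ** 2
    return bias, variance, bias ** 2 + variance

n = 10
for divisor in (n - 1, n, n + 1):
    bias, variance, mse = variance_estimator_moments(n, Fraction(divisor))
    print(f"divisor={divisor:2d}: bias={float(bias):+.4f} "
          f"variance={float(variance):.4f} mse={float(mse):.4f}")
```

For $n = 10$ and $\sigma^2 = 1$ the MSE is about 0.222 with divisor $n-1$, 0.190 with divisor $n$, and 0.182 with divisor $n+1$: under the Gaussian assumption, each step toward more shrinkage trades a little bias for a larger drop in variance.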
The James-Stein estimator (1961) famously dominates the sample mean in $d \ge 3$ dimensions despite being biased.
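
A Monte Carlo sketch of this dominance follows. It makes simplifying assumptions: a single observation $X \sim N(\theta, I_d)$ per trial (so the sample mean is $X$ itself), unit noise variance, shrinkage toward the origin, and a hypothetical true mean vector with $d = 10$. The function names are invented for the example.

```python
import random

def james_stein(x):
    """Shrink x toward the origin: (1 - (d - 2) / ||x||^2) * x, assuming unit noise variance."""
    d = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = 1.0 - (d - 2) / norm_sq
    return [shrink * v for v in x]

def total_squared_error(estimate, theta):
    """Squared Euclidean distance between an estimate and the true vector."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))

def compare_risks(theta, trials=20000, seed=0):
    """Monte Carlo risk of the raw observation and of its James-Stein shrinkage."""
    rng = random.Random(seed)
    raw_total = 0.0
    js_total = 0.0
    for _ in range(trials):
        x = [rng.gauss(t, 1.0) for t in theta]
        raw_total += total_squared_error(x, theta)
        js_total += total_squared_error(james_stein(x), theta)
    return raw_total / trials, js_total / trials

theta = [1.0] * 10  # hypothetical true mean vector, d = 10
raw_risk, js_risk = compare_risks(theta)
print(f"raw observation risk: {raw_risk:.3f}")
print(f"James-Stein risk:     {js_risk:.3f}")
```

The raw observation's risk should come out close to $d = 10$, and the James-Stein risk noticeably below it. The gap shrinks as $\lVert\theta\rVert$ grows, since the shrinkage factor then approaches 1.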
Connection to ML model selection
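In supervised learning, the same decomposition holds for the prediction error of a model $\hat f$ at an input $x$, assuming $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and taking expectations over training sets and noise:
$$\mathbb{E}\big[(y - \hat f(x))^2\big] = \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + \mathrm{Var}\big(\hat f(x)\big) + \sigma^2$$
The $\sigma^2$ term is irreducible noise. Increasing model capacity typically decreases bias but increases variance; regularization shifts the tradeoff toward higher bias and lower variance.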
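
The capacity tradeoff can be illustrated with k-nearest-neighbour regression, where small $k$ plays the role of a flexible model and large $k$ a rigid one. This is a minimal sketch under assumed settings: a sine target function, uniform inputs on $[0, 1]$, Gaussian noise, and a single test point. None of these choices come from the original text.

```python
import math
import random
import statistics

def true_function(x):
    """Assumed ground-truth regression function for the illustration."""
    return math.sin(2 * math.pi * x)

def knn_predict(train, x0, k):
    """Average the targets of the k training points nearest to x0."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x0))[:k]
    return statistics.fmean([y for _, y in nearest])

def bias_variance_at(x0, k, f, sigma, n=50, trials=1000, seed=0):
    """Squared bias and variance of the k-NN prediction at x0 over resampled training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        train = [(x, f(x) + rng.gauss(0.0, sigma)) for x in xs]
        preds.append(knn_predict(train, x0, k))
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - f(x0)) ** 2
    variance = statistics.pvariance(preds, mu=mean_pred)
    return bias_sq, variance

sigma = 0.3
for k in (1, 5, 50):
    bias_sq, variance = bias_variance_at(0.25, k, true_function, sigma)
    total = bias_sq + variance + sigma ** 2
    print(f"k={k:2d}: bias^2={bias_sq:.4f} variance={variance:.4f} expected error={total:.4f}")
```

With $k = 1$ the prediction tracks a single noisy neighbour, so variance is near $\sigma^2$ and bias is small. With $k = 50$ (all training points) the prediction is the global average, so variance is small but the bias at the peak of the sine is large.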
Common pitfalls
- Equating “biased” with “bad.” Many useful estimators are biased; lower MSE is what matters.
- Reporting variance without specifying what’s random. “Variance of the estimator” is over re-sampling the data; “variance of the prediction” is over both data and inputs. Different objects.
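- Forgetting that the cross-term vanishes only when deviations are taken about the expectation of $\hat\theta$ and the target $\theta$ is fixed. A random target or offset that is correlated with $\hat\theta$ leaves a nonzero cross-term and ruins the decomposition.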
- Confusing estimator variance with model variance. Estimator variance is a property of the estimation procedure; model variance is a property of the model class.
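
The cross-term pitfall in the list above can be seen in a short expansion. Here $c$ is an arbitrary constant centering point, a symbol introduced only for this illustration:

$$\mathbb{E}\big[(\hat\theta - \theta)^2\big] = \mathbb{E}\big[(\hat\theta - c)^2\big] + (c - \theta)^2 + 2\,(c - \theta)\,\mathbb{E}\big[\hat\theta - c\big]$$

The last term is zero for every $\theta$ only when $c = \mathbb{E}[\hat\theta]$, in which case the first term is $\mathrm{Var}(\hat\theta)$ and the second is $\mathrm{Bias}(\hat\theta)^2$. Centering anywhere else leaves a cross-term that has to be carried along explicitly.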