One-line definition
A distribution is in the exponential family if its density / mass function can be written as:

$$p(x \mid \eta) = h(x)\,\exp\big(\eta^\top T(x) - A(\eta)\big)$$

with natural parameter $\eta$, sufficient statistic $T(x)$, base measure $h(x)$, and log-partition $A(\eta)$ (which normalizes).
Why it matters
Most distributions you use day-to-day are exponential family: Gaussian, Bernoulli, categorical, Poisson, Beta, Gamma, Dirichlet, geometric, exponential. Recognizing them as such gives you free results:
- MLE is closed-form when the natural parameter is unconstrained: just match sample moments to model moments.
- Conjugate priors exist and are themselves exponential family.
- Sufficient statistics contain all the data’s information about $\eta$. You can summarize a dataset by $\sum_i T(x_i)$ and forget the rest.
- Generalized linear models (GLMs) are linear regression generalized to exponential-family responses.
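To make the first bullet concrete, here is a minimal sketch (made-up toy data) of moment-matching MLE for two members whose sufficient statistic is just $T(x) = x$, so the MLE sets the model mean equal to the sample mean:

```python
# Moment-matching MLE sketch: for Bernoulli and Poisson, T(x) = x,
# so matching E[T(x)] to the sample average gives the MLE directly.
bern_data = [1, 0, 1, 1, 0, 1, 1, 0]
p_hat = sum(bern_data) / len(bern_data)       # Bernoulli MLE: E[x] = p

pois_data = [3, 0, 2, 5, 1, 4]
lam_hat = sum(pois_data) / len(pois_data)     # Poisson MLE: E[x] = lambda
```

No optimizer is needed; this is the "closed-form when unconstrained" case.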
The canonical form
Given the form above:
- $T(x)$ are the sufficient statistics (Bernoulli: $T(x) = x$; Gaussian: $T(x) = (x, x^2)$).
- $\eta$ are the natural parameters (Bernoulli: $\eta = \log\frac{p}{1-p}$, the logit; Gaussian: $\eta = (\mu/\sigma^2,\ -1/(2\sigma^2))$).
- $A(\eta)$ is the log-partition function; its gradient gives the mean, its Hessian gives the covariance of $T(x)$:

$$\nabla A(\eta) = \mathbb{E}[T(x)], \qquad \nabla^2 A(\eta) = \mathrm{Cov}[T(x)]$$
This is why MLE via moment-matching works: the gradient of the log-likelihood is $\sum_i T(x_i) - n\,\mathbb{E}_\eta[T(x)]$, i.e. “data sufficient stat minus model expected sufficient stat,” and setting it to zero matches the moments.
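The gradient identity above can be checked numerically. A sketch for the Bernoulli case, where $A(\eta) = \log(1 + e^\eta)$ and $A'(\eta) = \sigma(\eta)$ (the sigmoid):

```python
import math

# Numeric check: for Bernoulli in natural form,
# log p(x | eta) = eta*x - log(1 + exp(eta)),
# so d/d_eta log-likelihood = sum_i x_i - n * sigmoid(eta).
def log_lik(eta, xs):
    A = math.log(1.0 + math.exp(eta))          # log-partition A(eta)
    return sum(eta * x - A for x in xs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [1, 0, 1, 1, 0]
eta = 0.3
analytic = sum(xs) - len(xs) * sigmoid(eta)    # data stat - expected stat
h = 1e-6
numeric = (log_lik(eta + h, xs) - log_lik(eta - h, xs)) / (2 * h)
assert abs(analytic - numeric) < 1e-6
```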
Common members
| Distribution | Sufficient stat $T(x)$ | Natural parameter $\eta$ |
|---|---|---|
| Bernoulli($p$) | $x$ | $\log\frac{p}{1-p}$ (logit) |
| Categorical($p$) | one-hot $e_x$ | $\log p_k$ (log-probabilities) |
| Gaussian($\mu, \sigma^2$) | $(x, x^2)$ | $(\mu/\sigma^2,\ -1/(2\sigma^2))$ |
| Poisson($\lambda$) | $x$ | $\log \lambda$ |
| Beta($\alpha, \beta$) | $(\log x,\ \log(1-x))$ | $(\alpha - 1,\ \beta - 1)$ |
| Gamma($\alpha, \beta$) | $(\log x,\ x)$ | $(\alpha - 1,\ -\beta)$ |
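A quick sanity check of one table row: writing the Poisson pmf in exponential-family form, with $h(x) = 1/x!$, $T(x) = x$, $\eta = \log\lambda$, and $A(\eta) = e^\eta = \lambda$, recovers the textbook formula $\lambda^x e^{-\lambda}/x!$:

```python
import math

# Poisson pmf in exponential-family form h(x) * exp(eta*T(x) - A(eta))
# should equal the textbook formula lam^x * exp(-lam) / x!.
lam, x = 2.5, 3
eta = math.log(lam)
exp_family = (1.0 / math.factorial(x)) * math.exp(eta * x - math.exp(eta))
textbook = lam**x * math.exp(-lam) / math.factorial(x)
assert abs(exp_family - textbook) < 1e-12
```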
Generalized linear models
A GLM combines a linear predictor $\eta = w^\top x$ with an exponential-family response distribution. The link function $g$ connects the mean to the linear predictor, $g(\mathbb{E}[y]) = w^\top x$; with the canonical link, the linear predictor is exactly the natural parameter:
| Response | GLM | Link |
|---|---|---|
| Continuous (Gaussian) | linear regression | identity |
| Binary (Bernoulli) | logistic regression | logit |
| Count (Poisson) | Poisson regression | log |
| Categorical | multinomial logistic | softmax |
| Time-to-event (Exponential, Weibull) | survival models | log |
Logistic regression is a GLM with Bernoulli response and logit link. This unifies the entire family of “regression-style” classifiers.
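A minimal sketch of that unification (toy 1-D data, plain gradient ascent rather than the IRLS solvers real libraries use): Bernoulli response, logit link, so $\eta_i = w x_i + b$, and the gradient is again "data stat minus expected stat."

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy separable data: negative x -> class 0, positive x -> class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradient of the Bernoulli log-likelihood with logit link:
    # sum_i (y_i - sigmoid(eta_i)) * x_i, i.e. observed minus expected.
    grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * grad_w / len(xs)
    b += lr * grad_b / len(xs)

# The fitted model should separate the two classes.
assert sigmoid(w * -2.0 + b) < 0.1 < 0.9 < sigmoid(w * 2.0 + b)
```

The same loop with an identity link and Gaussian response is ordinary least squares; only the link and response family change.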
Properties to remember
- Convexity: $A(\eta)$ is convex in $\eta$, so the negative log-likelihood is convex; for a minimal representation it is strictly convex and the MLE is unique.
- Sufficient statistics: by the Pitman–Koopman–Darmois theorem, exponential families are essentially the only distributions with a finite-dimensional sufficient statistic whose dimension does not grow with the sample size.
- Conjugacy: exponential families have conjugate priors, also in the exponential family.
- Maximum entropy: the exponential family with sufficient statistics matching given moments is the maximum-entropy distribution under those constraints.
Common pitfalls
- Forgetting that the Cauchy distribution is not exponential family: it admits no finite-dimensional sufficient statistic.
- Confusing “exponential” (the distribution) with “exponential family” (the class). The Exp($\lambda$) distribution is one member.
- Treating natural parameters as the same as canonical parameters. A Gaussian’s natural parameters are $(\mu/\sigma^2,\ -1/(2\sigma^2))$, not $(\mu, \sigma^2)$. Some software libraries default to one parameterization or the other; check.
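To make the last pitfall concrete, a round-trip conversion sketch between a Gaussian's canonical parameters $(\mu, \sigma^2)$ and its natural parameters $(\eta_1, \eta_2) = (\mu/\sigma^2,\ -1/(2\sigma^2))$:

```python
# Canonical (mu, var) <-> natural (eta1, eta2) for a Gaussian.
def to_natural(mu, var):
    return mu / var, -1.0 / (2.0 * var)

def to_canonical(eta1, eta2):
    var = -1.0 / (2.0 * eta2)   # invert eta2 = -1/(2 var)
    return eta1 * var, var      # invert eta1 = mu / var

mu, var = 1.5, 4.0
eta1, eta2 = to_natural(mu, var)
mu2, var2 = to_canonical(eta1, eta2)
assert abs(mu - mu2) < 1e-12 and abs(var - var2) < 1e-12
```

Note $\eta_2$ must be negative for the density to normalize, which is a cheap validity check when debugging.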