
Exponential family

A unified family of distributions (Gaussian, Bernoulli, Poisson, Beta, Gamma, etc.) with shared properties: sufficient statistics, conjugate priors, simple MLE.

Reviewed · 3 min read

One-line definition

A distribution is in the exponential family if its density / mass function can be written as:

p(x | η) = h(x) exp( ηᵀ T(x) − A(η) )

with natural parameter η, sufficient statistic T(x), base measure h(x), and log-partition A(η) (which normalizes).
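
As a quick worked example (a standard derivation, not specific to this article), the Bernoulli pmf fits this template:

```latex
p(x \mid p) = p^x (1-p)^{1-x}
            = \exp\!\Big( x \log\tfrac{p}{1-p} + \log(1-p) \Big)
```

so T(x) = x, η = log(p/(1−p)), h(x) = 1, and A(η) = −log(1−p) = log(1 + e^η).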

Why it matters

Most distributions you use day-to-day are exponential family: Gaussian, Bernoulli, categorical, Poisson, Beta, Gamma, Dirichlet, geometric, exponential. Recognizing them as such gives you free results:

  • MLE is closed-form when the natural parameter is unconstrained: just match sample moments to model moments.
  • Conjugate priors exist and are themselves exponential family.
  • Sufficient statistics contain all the data’s information about the parameter η. You can summarize a dataset by the summed statistic ∑ᵢ T(xᵢ) and forget the rest.
  • Generalized linear models (GLMs) are linear regression generalized to exponential-family responses.
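
A minimal sketch of the sufficient-statistic point, using synthetic Gaussian data: the MLE computed from the two summaries (∑x, ∑x²) is identical to the MLE computed from the raw dataset.

```python
import random

# Summarizing i.i.d. Gaussian data by its sufficient statistics
# (sum of x, sum of x^2) loses nothing for estimating (mu, sigma^2).
random.seed(0)
data = [random.gauss(2.0, 3.0) for _ in range(10_000)]
n = len(data)

# Sufficient statistics, summed over the dataset:
s1 = sum(data)
s2 = sum(x * x for x in data)

# MLE recovered from the two summaries alone:
mu_hat = s1 / n
var_hat = s2 / n - mu_hat ** 2

# Same answer as computing from the raw data:
mu_full = sum(data) / n
var_full = sum((x - mu_full) ** 2 for x in data) / n

assert abs(mu_hat - mu_full) < 1e-8
assert abs(var_hat - var_full) < 1e-8
```

The two variance formulas are algebraically identical; the point is that once you have (n, ∑x, ∑x²) you can throw the dataset away.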

The canonical form

Given the form above:

  • T(x) are the sufficient statistics (Bernoulli: T(x) = x; Gaussian: T(x) = (x, x²)).
  • η are the natural parameters (Bernoulli: η = log(p/(1−p)), the logit; Gaussian: η = (μ/σ², −1/(2σ²))).
  • A(η) is the log-partition function; its gradient gives the mean and its Hessian the covariance of T(x):

∇A(η) = E[T(x)],  ∇²A(η) = Cov[T(x)]
This is why MLE via moment-matching works: the gradient of the log-likelihood is “data sufficient stat minus model expected sufficient stat.”
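
A numerical sketch of that gradient identity for the Poisson member (where η = log λ and A(η) = e^η; the sampler is a hand-rolled Knuth algorithm since the stdlib lacks one). The log-likelihood gradient is ∑ᵢ T(xᵢ) − n·A′(η), and it vanishes exactly where the model mean matches the sample mean.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's algorithm: multiply uniforms until the product drops below e^-lam.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(1)
data = [poisson_sample(4.0, rng) for _ in range(5000)]
n = len(data)
T_sum = sum(data)  # summed sufficient statistic, T(x) = x

def grad(eta):
    # d/d eta of the log-likelihood:
    # "data sufficient stat minus model expected sufficient stat",
    # with A(eta) = e^eta, so A'(eta) = e^eta = E[x].
    return T_sum - n * math.exp(eta)

# Moment matching: set the model mean e^eta equal to the sample mean.
eta_hat = math.log(T_sum / n)
assert abs(grad(eta_hat)) < 1e-6
```

Setting the gradient to zero is literally moment matching: e^η̂ = x̄.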

Common members

| Distribution | Sufficient stat T(x) | Natural parameter η |
| --- | --- | --- |
| Bernoulli(p) | x | log(p/(1−p)) (the logit) |
| Categorical(p₁, …, pₖ) | one-hot(x) | (log p₁, …, log pₖ) (log-probabilities) |
| Gaussian(μ, σ²) | (x, x²) | (μ/σ², −1/(2σ²)) |
| Poisson(λ) | x | log λ |
| Beta(α, β) | (log x, log(1−x)) | (α−1, β−1) |
| Gamma(α, β), rate β | (log x, x) | (α−1, −β) |
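
A quick check of the Bernoulli row together with the gradient property from the previous section: differentiating the log-partition A(η) = log(1 + e^η) numerically recovers the mean p = sigmoid(η).

```python
import math

# For the Bernoulli member, A(eta) = log(1 + e^eta), and its derivative
# A'(eta) is the sigmoid, i.e. the mean parameter p.
def A(eta):
    return math.log(1.0 + math.exp(eta))

p = 0.3
eta = math.log(p / (1 - p))               # natural parameter (logit of p)
h = 1e-6
dA = (A(eta + h) - A(eta - h)) / (2 * h)  # central-difference gradient of A
assert abs(dA - p) < 1e-5                  # A'(eta) recovers the mean p
```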

Generalized linear models

A GLM combines a linear predictor with an exponential-family response distribution. The link function connects the response’s mean to the linear predictor; with the canonical link, the linear predictor equals the natural parameter:

| Response | GLM | Link |
| --- | --- | --- |
| Continuous (Gaussian) | linear regression | identity |
| Binary (Bernoulli) | logistic regression | logit |
| Count (Poisson) | Poisson regression | log |
| Categorical | multinomial logistic | softmax |
| Time-to-event (Exponential, Weibull) | survival models | log |

Logistic regression is a GLM with Bernoulli response and logit link. This unifies the entire family of “regression-style” classifiers.
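
A minimal sketch of that GLM view: a one-feature logistic regression fit by gradient ascent on synthetic data (made up for illustration). With the canonical logit link, the log-likelihood gradient again has the "observed minus expected sufficient statistic" form.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic Bernoulli-GLM data: P(y=1 | x) = sigmoid(w_true * x).
rng = random.Random(0)
w_true = 1.5
xs = [rng.uniform(-3, 3) for _ in range(2000)]
ys = [1 if rng.random() < sigmoid(w_true * x) else 0 for x in xs]

# Gradient ascent. With the canonical link, the gradient is
# sum_i x_i * (y_i - sigmoid(w * x_i)): observed minus expected.
w = 0.0
lr = 0.001
for _ in range(500):
    grad = sum(x * (y - sigmoid(w * x)) for x, y in zip(xs, ys))
    w += lr * grad

# w should land near w_true (up to sampling noise)
```

The same "observed minus expected" gradient structure appears in every canonical-link GLM, which is why they all fit with essentially the same optimizer.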

Properties to remember

  • Convexity: A(η) is convex in η, so the negative log-likelihood is convex and the MLE (when it exists) is unique.
  • Sufficient statistics: by the Pitman–Koopman–Darmois theorem, exponential families are essentially the only distributions with finite-dimensional sufficient statistics whose dimension is independent of the sample size n.
  • Conjugacy: exponential families have conjugate priors, also in the exponential family.
  • Maximum entropy: the exponential family with sufficient statistics matching given moments is the maximum-entropy distribution under those constraints.
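
The conjugacy property is especially concrete for the Beta–Bernoulli pair: the posterior update just adds the summed sufficient statistics to the prior’s pseudo-counts. A tiny sketch:

```python
# Beta is the conjugate prior for the Bernoulli. The posterior is again
# a Beta, obtained by adding the sufficient-statistic counts.
alpha, beta = 2.0, 2.0           # Beta(2, 2) prior (pseudo-counts)
data = [1, 0, 1, 1, 0, 1, 1]     # Bernoulli observations
heads = sum(data)                # summed sufficient statistic
tails = len(data) - heads

alpha_post = alpha + heads       # posterior: Beta(alpha + heads, beta + tails)
beta_post = beta + tails
print(alpha_post, beta_post)     # 7.0 4.0
```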

Common pitfalls

  • Forgetting that the Cauchy distribution is not exponential family: it admits no finite-dimensional sufficient statistic, so its density cannot be written in the canonical form.
  • Confusing “exponential” (the distribution) with “exponential family” (the class). The Exp(λ) distribution is just one member.
  • Treating natural parameters as the same as canonical parameters. A Gaussian’s natural parameters are (μ/σ², −1/(2σ²)), not (μ, σ²). Some software libraries default to one or the other; check.
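
To make the last pitfall concrete, here is a small round-trip between the two Gaussian parametrizations (function names are mine, for illustration):

```python
# Converting a Gaussian's canonical parameters (mu, sigma^2) to its
# natural parameters (mu / sigma^2, -1 / (2 sigma^2)), and back.
def to_natural(mu, var):
    return (mu / var, -1.0 / (2.0 * var))

def to_canonical(eta1, eta2):
    var = -1.0 / (2.0 * eta2)
    return (eta1 * var, var)

eta = to_natural(2.0, 4.0)        # (0.5, -0.125)
assert to_canonical(*eta) == (2.0, 4.0)
```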