One-line definition
A distribution is in the exponential family if its density / mass function can be written as:

$$p(x \mid \eta) = h(x)\,\exp\big(\eta^\top T(x) - A(\eta)\big)$$

with natural parameter $\eta$, sufficient statistic $T(x)$, base measure $h(x)$, and log-partition $A(\eta)$ (which normalizes).
Why it matters
Most distributions you use day-to-day are exponential family: Gaussian, Bernoulli, categorical, Poisson, Beta, Gamma, Dirichlet, geometric, exponential. Recognizing them as such gives you free results:
- MLE is closed-form when the natural parameter is unconstrained: just match sample moments to model moments.
- Conjugate priors exist and are themselves exponential family.
- Sufficient statistics contain all the data’s information about $\eta$. You can summarize a dataset by $\sum_i T(x_i)$ and forget the rest.
- Generalized linear models (GLMs) are linear regression generalized to exponential-family responses.
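To make the first bullet concrete, here is a minimal sketch (made-up toy data) of moment-matching MLE for two members whose sufficient statistic is just $T(x) = x$, so the MLE sets the model mean equal to the sample mean:

```python
# Moment-matching MLE sketch: for Bernoulli and Poisson, T(x) = x,
# so matching E[T(x)] to the sample average gives the MLE directly.
bern_data = [1, 0, 1, 1, 0, 1, 1, 0]
p_hat = sum(bern_data) / len(bern_data)       # Bernoulli MLE: E[x] = p

pois_data = [3, 0, 2, 5, 1, 4]
lam_hat = sum(pois_data) / len(pois_data)     # Poisson MLE: E[x] = lambda
```

No optimizer is needed; this is the "closed-form when unconstrained" case.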
The canonical form
Given the form above:
- $T(x)$ are the sufficient statistics (Bernoulli: $T(x) = x$; Gaussian: $T(x) = (x, x^2)$).
- $\eta$ are the natural parameters (Bernoulli: $\eta = \log\frac{p}{1-p}$, the logit; Gaussian: $\eta = (\mu/\sigma^2,\ -1/(2\sigma^2))$).
- $A(\eta)$ is the log-partition function; its gradient gives the mean, its Hessian gives the covariance of $T(x)$:

$$\nabla A(\eta) = \mathbb{E}[T(x)], \qquad \nabla^2 A(\eta) = \mathrm{Cov}[T(x)]$$
This is why MLE via moment-matching works: the gradient of the log-likelihood is $\sum_i T(x_i) - n\,\mathbb{E}_\eta[T(x)]$, i.e. “data sufficient stat minus model expected sufficient stat,” and setting it to zero matches the moments.
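The gradient identity above can be checked numerically. A sketch for the Bernoulli case, where $A(\eta) = \log(1 + e^\eta)$ and $A'(\eta) = \sigma(\eta)$ (the sigmoid):

```python
import math

# Numeric check: for Bernoulli in natural form,
# log p(x | eta) = eta*x - log(1 + exp(eta)),
# so d/d_eta log-likelihood = sum_i x_i - n * sigmoid(eta).
def log_lik(eta, xs):
    A = math.log(1.0 + math.exp(eta))          # log-partition A(eta)
    return sum(eta * x - A for x in xs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [1, 0, 1, 1, 0]
eta = 0.3
analytic = sum(xs) - len(xs) * sigmoid(eta)    # data stat - expected stat
h = 1e-6
numeric = (log_lik(eta + h, xs) - log_lik(eta - h, xs)) / (2 * h)
assert abs(analytic - numeric) < 1e-6
```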
Common members
| Distribution | Sufficient stat $T(x)$ | Natural parameter $\eta$ |
|---|---|---|
| Bernoulli($p$) | $x$ | $\log\frac{p}{1-p}$ (logit) |
| Categorical($p$) | one-hot $e_x$ | $\log p_k$ (log-probabilities) |
| Gaussian($\mu, \sigma^2$) | $(x, x^2)$ | $(\mu/\sigma^2,\ -1/(2\sigma^2))$ |
| Poisson($\lambda$) | $x$ | $\log \lambda$ |
| Beta($\alpha, \beta$) | $(\log x,\ \log(1-x))$ | $(\alpha - 1,\ \beta - 1)$ |
| Gamma($\alpha, \beta$) | $(\log x,\ x)$ | $(\alpha - 1,\ -\beta)$ |
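A quick sanity check of one table row: writing the Poisson pmf in exponential-family form, with $h(x) = 1/x!$, $T(x) = x$, $\eta = \log\lambda$, and $A(\eta) = e^\eta = \lambda$, recovers the textbook formula $\lambda^x e^{-\lambda}/x!$:

```python
import math

# Poisson pmf in exponential-family form h(x) * exp(eta*T(x) - A(eta))
# should equal the textbook formula lam^x * exp(-lam) / x!.
lam, x = 2.5, 3
eta = math.log(lam)
exp_family = (1.0 / math.factorial(x)) * math.exp(eta * x - math.exp(eta))
textbook = lam**x * math.exp(-lam) / math.factorial(x)
assert abs(exp_family - textbook) < 1e-12
```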
Generalized linear models
A GLM combines a linear predictor $\eta = w^\top x$ with an exponential-family response distribution. The link function $g$ connects the mean to the linear predictor, $g(\mathbb{E}[y]) = w^\top x$; with the canonical link, the linear predictor is exactly the natural parameter:
| Response | GLM | Link |
|---|---|---|
| Continuous (Gaussian) | linear regression | identity |
| Binary (Bernoulli) | logistic regression | logit |
| Count (Poisson) | Poisson regression | log |
| Categorical | multinomial logistic | softmax |
| Time-to-event (Exponential, Weibull) | survival models | log |
Logistic regression is a GLM with Bernoulli response and logit link. This unifies the entire family of “regression-style” classifiers.
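A minimal sketch of that unification (toy 1-D data, plain gradient ascent rather than the IRLS solvers real libraries use): Bernoulli response, logit link, so $\eta_i = w x_i + b$, and the gradient is again "data stat minus expected stat."

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy separable data: negative x -> class 0, positive x -> class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradient of the Bernoulli log-likelihood with logit link:
    # sum_i (y_i - sigmoid(eta_i)) * x_i, i.e. observed minus expected.
    grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * grad_w / len(xs)
    b += lr * grad_b / len(xs)

# The fitted model should separate the two classes.
assert sigmoid(w * -2.0 + b) < 0.1 < 0.9 < sigmoid(w * 2.0 + b)
```

The same loop with an identity link and Gaussian response is ordinary least squares; only the link and response family change.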
Properties to remember
- Convexity: $A(\eta)$ is convex in $\eta$, so the negative log-likelihood is convex; for a minimal representation it is strictly convex and the MLE is unique.
- Sufficient statistics: by the Pitman–Koopman–Darmois theorem, exponential families are essentially the only distributions with a finite-dimensional sufficient statistic whose dimension does not grow with the sample size.
- Conjugacy: exponential families have conjugate priors, also in the exponential family.
- Maximum entropy: the exponential family with sufficient statistics matching given moments is the maximum-entropy distribution under those constraints.
Common pitfalls
- Forgetting that the Cauchy distribution is not exponential family: it admits no finite-dimensional sufficient statistic.
- Confusing “exponential” (the distribution) with “exponential family” (the class). The Exp($\lambda$) distribution is one member.
- Treating natural parameters as the same as canonical parameters. A Gaussian’s natural parameters are $(\mu/\sigma^2,\ -1/(2\sigma^2))$, not $(\mu, \sigma^2)$. Some software libraries default to one parameterization or the other; check.
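To make the last pitfall concrete, a round-trip conversion sketch between a Gaussian's canonical parameters $(\mu, \sigma^2)$ and its natural parameters $(\eta_1, \eta_2) = (\mu/\sigma^2,\ -1/(2\sigma^2))$:

```python
# Canonical (mu, var) <-> natural (eta1, eta2) for a Gaussian.
def to_natural(mu, var):
    return mu / var, -1.0 / (2.0 * var)

def to_canonical(eta1, eta2):
    var = -1.0 / (2.0 * eta2)   # invert eta2 = -1/(2 var)
    return eta1 * var, var      # invert eta1 = mu / var

mu, var = 1.5, 4.0
eta1, eta2 = to_natural(mu, var)
mu2, var2 = to_canonical(eta1, eta2)
assert abs(mu - mu2) < 1e-12 and abs(var - var2) < 1e-12
```

Note $\eta_2$ must be negative for the density to normalize, which is a cheap validity check when debugging.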