
Normalizing flows

Generative models built from invertible transformations that compute exact likelihoods and sample efficiently, at the cost of architectural restrictions.


One-line definition

A normalizing flow transforms a simple base distribution (typically a standard Gaussian) into a target distribution through a sequence of invertible, differentiable mappings $f = f_K \circ \cdots \circ f_1$. The change-of-variables formula gives the exact log-likelihood:

$$\log p_X(x) = \log p_Z(z) - \log\left|\det \frac{\partial f}{\partial z}\right|, \qquad z = f^{-1}(x)$$

Why it matters

Flows are the only family of deep generative models that simultaneously offer:

  • Exact likelihoods (unlike VAEs and diffusion, which give bounds).
  • Efficient sampling (a single forward pass, unlike diffusion’s iterative denoising).
  • Tractable posterior (the inverse function gives the exact $z = f^{-1}(x)$ for any $x$).

The architectural restriction (each layer must be invertible with a tractable Jacobian) limits expressiveness. Flows have been displaced by diffusion for high-fidelity image generation but remain useful for likelihood-critical applications: density estimation, anomaly detection, simulation-based inference, molecular generation.

The change-of-variables formula

For an invertible $f: \mathbb{R}^D \to \mathbb{R}^D$ with $x = f(z)$:

$$\log p_X(x) = \log p_Z(z) - \log\left|\det \frac{\partial f}{\partial z}\right|$$

For a composition $f = f_K \circ \cdots \circ f_1$, the log-determinant decomposes additively:

$$\log\left|\det \frac{\partial f}{\partial z}\right| = \sum_{k=1}^{K} \log\left|\det \frac{\partial f_k}{\partial z_{k-1}}\right|, \qquad z_k = f_k(z_{k-1}),\ z_0 = z$$

The engineering challenge: design each $f_k$ to be (a) invertible, (b) expressive, and (c) have a cheap-to-compute log-determinant.
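
To make the formula concrete, here is a minimal numerical check, a sketch assuming PyTorch; the 1-D affine flow $f(z) = az + b$ and all names are illustrative, not from any library:

```python
import math
import torch
from torch.distributions import Normal

a, b = 2.0, 1.0                 # illustrative flow parameters: f(z) = a*z + b
base = Normal(0.0, 1.0)         # base distribution p_Z = N(0, 1)

def log_prob_flow(x):
    """log p_X(x) = log p_Z(f^{-1}(x)) - log|det df/dz|."""
    z = (x - b) / a             # inverse: z = f^{-1}(x)
    return base.log_prob(z) - math.log(abs(a))  # log-det of a 1-D affine map is log|a|

# The pushforward of N(0, 1) through f is N(b, a^2), so the two must agree.
x = torch.linspace(-3.0, 5.0, 9)
assert torch.allclose(log_prob_flow(x), Normal(b, a).log_prob(x))
```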

Common flow families

| Family | Idea | Tradeoff |
| --- | --- | --- |
| Affine coupling (NICE, RealNVP, Glow) | Split $x$ in half; one half passes through, the other is affinely transformed by a function of the first | Triangular Jacobian → determinant is the product of its diagonal; needs many layers for expressiveness |
| Autoregressive (MAF, IAF) | Each output dimension is an affine function of the preceding ones; Jacobian is triangular | MAF: fast density, slow sampling. IAF: fast sampling, slow density |
| Continuous-time / neural ODE (FFJORD) | Define $\frac{dz}{dt} = f(z, t)$ and integrate; Jacobian trace via the Hutchinson estimator | Very expressive; expensive integration |
| Invertible 1×1 convolutions (Glow) | Learned generalization of a permutation for image flows | Used inside Glow to mix channels between coupling layers |

RealNVP / coupling layers (the workhorse)

Split $x = (x_{1:d}, x_{d+1:D})$. Then:

$$y_{1:d} = x_{1:d}, \qquad y_{d+1:D} = x_{d+1:D} \odot \exp\big(s(x_{1:d})\big) + t(x_{1:d})$$

with neural nets $s, t$. The Jacobian is lower triangular with $\exp(s(x_{1:d}))$ on the diagonal of the lower-right block. Determinant: $\prod_i \exp\big(s(x_{1:d})_i\big)$, so the log-determinant is just $\sum_i s(x_{1:d})_i$.

Stack many coupling layers, alternating which half passes through, with shuffles or 1×1 convs between them.
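
A minimal sketch of one such layer, assuming PyTorch; the class name, hidden width, and tanh-bounded scale are illustrative choices, not prescribed by the RealNVP paper:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        # One net emits both scale (s) and shift (t) from the pass-through half.
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                  # bound the scale for stability (a common trick)
        y2 = x2 * torch.exp(s) + t         # y2 = x2 ⊙ exp(s(x1)) + t(x1)
        log_det = s.sum(dim=-1)            # log|det J| = Σ_i s_i
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self.net(y1).chunk(2, dim=-1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)      # closed-form inverse, no solver needed
        return torch.cat([y1, x2], dim=-1)

# Invertibility check: f^{-1}(f(x)) should recover x up to float error.
layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
y, log_det = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-6)
```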

When to use flows in 2026

| Setting | Flows vs. alternatives |
| --- | --- |
| High-fidelity image generation | Use diffusion; flows are not competitive |
| Density estimation, OOD detection | Flows give exact likelihood (see the training sketch below) |
| Simulation-based inference (likelihood-free) | Flows excel (NPE, NRE) |
| Molecular conformations / coordinates | Flows are used (E-NF, equivariant flows) |
| Probabilistic forecasting | Flows + RNN backbones (RealNVP-style) |
| Variational inference posterior approximation | Flows as a flexible $q_\phi(z \mid x)$ |
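
As a sketch of the density-estimation workflow, here is a toy maximum-likelihood fit, assuming PyTorch; the elementwise affine flow, hyperparameters, and synthetic data are illustrative only:

```python
import torch

dim = 2
s = torch.zeros(dim, requires_grad=True)     # log-scale parameters
t = torch.zeros(dim, requires_grad=True)     # shift parameters
base = torch.distributions.Normal(torch.zeros(dim), torch.ones(dim))
opt = torch.optim.Adam([s, t], lr=0.05)

data = torch.randn(1024, dim) * 3.0 + 5.0    # toy target: N(5, 3^2) per dimension

for step in range(500):
    z = (data - t) * torch.exp(-s)           # inverse flow z = f^{-1}(x)
    # Exact log-likelihood: log p_Z(z) + log|det df^{-1}/dx| = log p_Z(z) - Σ_i s_i
    log_prob = base.log_prob(z).sum(-1) - s.sum()
    loss = -log_prob.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.exp(s), t)                       # ≈ (3, 3) and (5, 5) after convergence
```

The same loss (negative exact log-likelihood) trains any flow; only the inverse and log-determinant computations change with the architecture.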

Common pitfalls

  • Computing log-determinants without exploiting structure. A general log-determinant costs $O(D^3)$. Always use a flow with a structured Jacobian (triangular, low-rank).
  • Confusing $\log\left|\det J\right|$ with the log-likelihood directly. The full formula has both the base-density term and the determinant term.
  • Treating flows as fast-to-train. They are usually slower per epoch than VAE / diffusion at matched parameter count due to expensive Jacobian computations.
  • Using flows on discrete data. Flows assume continuous, differentiable spaces. For discrete data: dequantize (add uniform noise, as in the sketch below) or use discrete normalizing flows (more complex).
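
A minimal sketch of uniform dequantization, assuming PyTorch and 8-bit pixel data; the function name and toy shapes are illustrative:

```python
import torch

def dequantize(x_int):
    """Map 8-bit integers in {0, ..., 255} to continuous values in [0, 1)."""
    x = x_int.float()
    return (x + torch.rand_like(x)) / 256.0   # (x + u)/256 with u ~ Uniform[0, 1)

# Toy usage: a batch of 8x8 RGB "images".
x_int = torch.randint(0, 256, (2, 3, 8, 8))
x = dequantize(x_int)
assert (x >= 0).all() and (x < 1).all()
# By Jensen's inequality, E_u[log p(dequantized x)] - D*log(256) lower-bounds the
# discrete log-likelihood log P(x_int), where D is the number of pixel values.
```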