One-line definition
A probabilistic graphical model (PGM) is a representation of a joint probability distribution as a graph whose nodes are random variables and whose edges encode dependencies. The graph structure determines a factorization of the joint and a set of conditional independence relations.
Why it matters
PGMs were the dominant framework for probabilistic ML from the 1990s through the early 2010s. Many modern probabilistic methods (VAEs, latent-variable diffusion, and, loosely, message passing in transformers) descend from PGM ideas. Knowing PGMs gives you the right conceptual vocabulary for any latent-variable model: independence, factorization, marginalization, conditioning.
Two main families
Bayesian networks (directed acyclic graphs)
Each node has a conditional distribution given its parents. The joint factorizes as
$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}(x_i))$$
Examples: naive Bayes (one parent class node, leaf observation nodes), HMM, Bayesian linear regression, hierarchical Bayesian models.
Encoded independence: each node is conditionally independent of its non-descendants given its parents (local Markov property).
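As a concrete instance, here is a minimal sketch in Python of the directed factorization for a toy naive Bayes network (all tables, names, and numbers are illustrative assumptions, not from any library):

```python
# Toy naive Bayes net: class C with two binary features F1, F2.
# Directed factorization: p(c, f1, f2) = p(c) * p(f1 | c) * p(f2 | c)

p_c = {0: 0.6, 1: 0.4}                # p(C)
p_f1 = {0: {0: 0.8, 1: 0.2},          # p(F1 | C): p_f1[c][f1]
        1: {0: 0.3, 1: 0.7}}
p_f2 = {0: {0: 0.9, 1: 0.1},          # p(F2 | C): p_f2[c][f2]
        1: {0: 0.4, 1: 0.6}}

def joint(c, f1, f2):
    """Joint probability as the product of per-node conditionals."""
    return p_c[c] * p_f1[c][f1] * p_f2[c][f2]

# Sanity check: the factorized joint sums to 1 over all 8 assignments.
total = sum(joint(c, f1, f2) for c in (0, 1) for f1 in (0, 1) for f2 in (0, 1))
assert abs(total - 1.0) < 1e-12
```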
Markov random fields (undirected graphs)
The joint factorizes over cliques $C$:
$$p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$$
with potential functions $\psi_C \ge 0$ and partition function $Z = \sum_x \prod_C \psi_C(x_C)$.
Examples: image MRFs (pairwise potentials between neighboring pixels), CRFs (conditional random fields, discriminative MRFs), Boltzmann machines.
Encoded independence: $X_A \perp X_B \mid X_S$ if $S$ separates $A$ from $B$ in the graph (global Markov property).
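And a matching sketch of the undirected factorization, assuming a toy 3-node binary chain MRF with agreement potentials (numbers are illustrative); the partition function is computed by brute-force enumeration, which is only feasible at this tiny scale:

```python
import itertools

# Toy chain MRF x1 - x2 - x3 with binary variables and pairwise potentials.

def psi(a, b):
    """Pairwise potential favoring agreement between neighbors."""
    return 2.0 if a == b else 1.0

def unnormalized(x1, x2, x3):
    # Product of clique potentials over the two edges of the chain.
    return psi(x1, x2) * psi(x2, x3)

# Partition function Z: sum of the unnormalized measure over all 8 states.
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z

print(Z, p(0, 0, 0))  # Z = 18.0, p(0,0,0) = 4/18
```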
d-separation (Bayesian networks)
A path between two nodes is blocked by a set $Z$ if either:
- A non-collider on the path is in $Z$, or
- The path contains a collider (a node with two incoming arrows on the path) such that neither the collider nor any of its descendants is in $Z$.
Two nodes are d-separated by $Z$ if every path between them is blocked. d-separation $\Rightarrow$ conditional independence given $Z$ (in the model).
This formalism explains the famous explaining-away phenomenon: conditioning on a common effect makes its (marginally independent) causes dependent.
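A minimal numeric check of explaining away, assuming a toy collider network Burglary → Alarm ← Earthquake (the conditionals loosely follow Pearl's classic alarm example; the priors here are illustrative):

```python
import itertools

p_b = {0: 0.99, 1: 0.01}   # p(Burglary)
p_e = {0: 0.98, 1: 0.02}   # p(Earthquake)
# p(A = 1 | B, E), keyed by (b, e).
p_a1 = {(0, 0): 0.001, (1, 0): 0.94, (0, 1): 0.29, (1, 1): 0.95}

def joint(b, e, a):
    pa = p_a1[(b, e)] if a == 1 else 1.0 - p_a1[(b, e)]
    return p_b[b] * p_e[e] * pa

def cond_b1(evidence):
    """p(B = 1 | evidence) by brute-force enumeration."""
    states = [dict(zip("bea", s)) for s in itertools.product((0, 1), repeat=3)]
    match = [s for s in states if all(s[k] == v for k, v in evidence.items())]
    num = sum(joint(s["b"], s["e"], s["a"]) for s in match if s["b"] == 1)
    den = sum(joint(s["b"], s["e"], s["a"]) for s in match)
    return num / den

# Marginally, B and E are independent; conditioning on the alarm couples them:
print(cond_b1({"a": 1}))          # p(B=1 | A=1) ~ 0.58
print(cond_b1({"a": 1, "e": 1}))  # ~ 0.03: the earthquake explains the alarm away
```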
Inference tasks
For a graphical model, the standard tasks are (each is worked by brute force in the sketch after this list):
- Marginal: compute $p(x_i)$, or more generally $p(x_A)$ for a subset of variables $A$, by summing out the rest.
- Conditional: compute $p(x_A \mid x_B = \bar{x}_B)$ given observed values $\bar{x}_B$.
- MAP: find $\arg\max_x p(x \mid \text{evidence})$, the single most probable joint assignment.
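A brute-force worked example of all three tasks over a toy two-variable joint given as an explicit table (numbers are illustrative); real inference exploits the factorization instead of enumerating states:

```python
# Explicit joint table p(x, y) over two binary variables.
p = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}

# Marginal: p(x) = sum_y p(x, y)
p_x = {x: sum(p[(x, y)] for y in (0, 1)) for x in (0, 1)}

# Conditional: p(x | y = 1) = p(x, 1) / p(y = 1)
p_y1 = sum(p[(x, 1)] for x in (0, 1))
p_x_given_y1 = {x: p[(x, 1)] / p_y1 for x in (0, 1)}

# MAP: the single most probable joint assignment.
x_map = max(p, key=p.get)

print(p_x, p_x_given_y1, x_map)  # {0: 0.4, 1: 0.6}, {0: 0.2, 1: 0.8}, (1, 1)
```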
Exact methods:
- Variable elimination: marginalize out variables one by one, exploiting the factorization (see the sketch after this list).
- Belief propagation / sum-product: message passing on tree-structured graphs (or cluster graphs / junction trees for general graphs).
- Junction tree algorithm: exact inference in any graph by clustering into a tree of cliques.
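As a sketch of variable elimination, here is the chain-structured case, where eliminating variables left to right reduces the cost from $O(k^n)$ to $O(nk^2)$ (the CPTs are illustrative; on a chain this coincides with the forward pass of HMM inference):

```python
# Chain-structured Bayesian network p(x1) * prod_t p(x_t | x_{t-1}),
# binary states, illustrative parameters.
n = 5
prior = [0.6, 0.4]                   # p(x1)
trans = [[0.7, 0.3], [0.2, 0.8]]     # p(x_t = j | x_{t-1} = i) = trans[i][j]

# Eliminate x1, then x2, ...: each step folds one factor into a message,
# so the cost is O(n * k^2) instead of the O(k^n) of brute-force enumeration.
message = prior
for _ in range(n - 1):
    message = [sum(message[i] * trans[i][j] for i in (0, 1)) for j in (0, 1)]

print(message)  # marginal p(x5); entries sum to 1
```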
For graphs with high tree-width, exact inference is exponential in the tree-width. Approximate methods:
- MCMC (Gibbs sampling, Metropolis-Hastings); a Gibbs sketch follows this list.
- Variational inference (mean-field, structured, neural).
- Expectation propagation.
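And a minimal Gibbs sampling sketch, reusing the 3-node chain MRF from above (no burn-in or convergence diagnostics, so treat it as illustrative only):

```python
import random

def psi(a, b):
    """Agreement potential from the chain MRF example above."""
    return 2.0 if a == b else 1.0

def resample(i, x):
    """Sample x[i] from its conditional given its neighbors in the chain."""
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(x)]
    w = [1.0, 1.0]
    for v in (0, 1):
        for j in neighbors:
            w[v] *= psi(v, x[j])
    x[i] = 0 if random.random() < w[0] / (w[0] + w[1]) else 1

random.seed(0)
x = [0, 0, 0]
count_x2 = 0
for _ in range(20000):
    for i in range(3):   # one full Gibbs sweep
        resample(i, x)
    count_x2 += x[1]

print(count_x2 / 20000)  # estimate of p(x2 = 1); exact value is 0.5 by symmetry
```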
Special cases that became their own fields
| Graphical model | Modern name |
|---|---|
| Latent variable Bayesian network | VAE (with neural conditional distributions) |
| Linear-Gaussian state space | Kalman filter |
| Discrete latent chain | HMM |
| Conditional MRF | CRF |
| Boltzmann machine | RBM, deep belief net (historical) |
| Topic model (Bayesian doc-topic) | LDA |
| Naive Bayes | Naive Bayes (still used) |
Relevance in 2026
PGMs as a framework are less central than they were in 2010, with neural networks having displaced them for most practical inference. But graphical-model thinking persists in:
- Diffusion models (Markov chain over noise levels).
- VAEs (latent → observation Bayesian network).
- Probabilistic programming (Pyro, Stan, NumPyro).
- Causal inference (DAGs are the language).
- Structured prediction with CRFs in some NLP pipelines.
Common pitfalls
- Confusing d-separation with causation. PGMs model statistical dependencies; causal claims require additional assumptions (interventions, do-calculus).
- Treating the joint distribution as fully specified by the graph alone. The graph only specifies structure; the conditional distributions are separate.
- Forgetting that exact inference is intractable for general MRFs. Tree-width matters.
- Reading missing edges as independence. They imply conditional independence given the rest, not marginal independence.
Related
- Markov chains. Simplest sequential PGM.
- Bayes’ rule and the posterior. Foundation for PGM inference.