Probabilistic graphical models

Express joint distributions as graphs whose structure encodes conditional independence. Bayesian networks (directed) and Markov random fields (undirected).

One-line definition

A probabilistic graphical model (PGM) is a representation of a joint probability distribution as a graph whose nodes are random variables and whose edges encode dependencies. The graph structure determines a factorization of the joint and a set of conditional independence relations.

Why it matters

PGMs were the dominant framework for probabilistic ML from the 1990s through the early 2010s. Many modern probabilistic methods (VAEs, latent-variable diffusion, and, loosely, message passing in transformers) descend from PGM ideas. Knowing PGMs gives you the right conceptual vocabulary for any latent-variable model: independence, factorization, marginalization, conditioning.

Two main families

Bayesian networks (directed acyclic graphs)

Each node has a conditional distribution given its parents. The joint factorizes as

$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}(x_i)\big),$$

where $\mathrm{pa}(x_i)$ denotes the parents of $x_i$ in the DAG.

Examples: naive Bayes (a single class node as parent of leaf observation nodes), HMMs, Bayesian linear regression, hierarchical Bayesian models.

Encoded independence: each node is conditionally independent of its non-descendants given its parents (local Markov property).
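To make the factorization concrete, here is a minimal sketch using a hypothetical rain/sprinkler/wet-grass network; all CPT numbers are made up for illustration:

```python
# Minimal sketch: evaluating a Bayesian-network joint via its factorization.
# Hypothetical DAG: Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.
# All CPT numbers below are illustrative, not from the text.

P_RAIN = 0.2                                        # p(R = True)
P_SPRINKLER = {True: 0.01, False: 0.40}             # p(S = True | R)
P_WET = {(True, True): 0.99, (True, False): 0.80,   # p(W = True | S, R)
         (False, True): 0.90, (False, False): 0.01}

def bern(p, value):
    """P(X = value) for a binary variable with P(X = True) = p."""
    return p if value else 1.0 - p

def joint(r, s, w):
    """p(r, s, w) = p(r) * p(s | r) * p(w | s, r): the DAG factorization."""
    return (bern(P_RAIN, r)
            * bern(P_SPRINKLER[r], s)
            * bern(P_WET[(s, r)], w))

# Sanity check: the factorized joint sums to 1 over all 8 assignments.
states = (True, False)
total = sum(joint(r, s, w) for r in states for s in states for w in states)
assert abs(total - 1.0) < 1e-12
```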

Markov random fields (undirected graphs)

The joint factorizes over cliques $C$ of the graph:

$$p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C),$$

with potential functions $\psi_C \geq 0$ and partition function $Z = \sum_x \prod_{C} \psi_C(x_C)$.

Examples: image MRFs (pairwise potentials between neighboring pixels), CRFs (conditional random fields, discriminative MRFs), Boltzmann machines.

Encoded independence: $X \perp Y \mid S$ whenever $S$ separates $X$ from $Y$ in the graph (the global Markov property).
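A minimal sketch of this factorization on a made-up three-node chain MRF with binary states and agreement-favoring potentials; with only $2^3$ states, the partition function can be computed by brute force:

```python
import itertools

# Toy pairwise MRF on a chain x0 - x1 - x2 with binary states. The graph and
# potentials are illustrative assumptions, not from the text.
EDGES = [(0, 1), (1, 2)]

def psi(xi, xj):
    """Pairwise potential that favors equal neighboring states."""
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    """prod_C psi_C(x_C): product over edge (clique) potentials."""
    p = 1.0
    for i, j in EDGES:
        p *= psi(x[i], x[j])
    return p

# Partition function Z by brute force (fine for 2^3 = 8 states).
Z = sum(unnormalized(x) for x in itertools.product((0, 1), repeat=3))

def prob(x):
    """Normalized p(x) = unnormalized(x) / Z."""
    return unnormalized(x) / Z

assert abs(sum(prob(x) for x in itertools.product((0, 1), repeat=3)) - 1.0) < 1e-12
```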

d-separation (Bayesian networks)

A path between two nodes is blocked by a set $Z$ if either:

  • A non-collider on the path is in $Z$, or
  • A collider (a node with two incoming arrows on the path) is not in $Z$, and none of its descendants are in $Z$.

Two nodes are d-separated by $Z$ if every path between them is blocked. d-separation implies conditional independence given $Z$ (in the model).

This formalism explains the famous explaining-away phenomenon: conditioning on a common effect makes its previously independent causes dependent.
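A numeric check of explaining away on a collider $A \to E \leftarrow B$ (all probabilities made up for illustration): observing the effect raises belief in one cause, and additionally observing the other cause lowers it again:

```python
import itertools

# Collider A -> E <- B with illustrative (made-up) probabilities.
P_A, P_B = 0.3, 0.3
P_E = {(1, 1): 0.99, (1, 0): 0.70, (0, 1): 0.70, (0, 0): 0.05}  # p(E=1 | A, B)

def joint(a, b, e):
    pe = P_E[(a, b)]
    return ((P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)
            * (pe if e else 1 - pe))

def cond(query, given):
    """P(query | given) by brute-force enumeration over (a, b, e)."""
    num = den = 0.0
    for a, b, e in itertools.product((0, 1), repeat=3):
        x = {"a": a, "b": b, "e": e}
        if all(x[k] == v for k, v in given.items()):
            den += joint(a, b, e)
            if all(x[k] == v for k, v in query.items()):
                num += joint(a, b, e)
    return num / den

print(cond({"a": 1}, {"e": 1}))          # ~0.58: observing E raises belief in A
print(cond({"a": 1}, {"e": 1, "b": 1}))  # ~0.38: B "explains away" E
```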

Inference tasks

For a graphical model, the standard tasks are:

  • Marginal: compute $p(x_i)$, or $p(x_A)$ for a subset $A$ of variables.
  • Conditional: compute $p(x_A \mid x_B = e)$ given observations $e$.
  • MAP: find $\arg\max_{x} p(x \mid \text{evidence})$.

Exact methods:

  • Variable elimination: marginalize out variables one by one, exploiting the factorization (a sketch follows this list).
  • Belief propagation / sum-product: message passing on tree-structured graphs (or cluster graphs / junction trees for general graphs).
  • Junction tree algorithm: exact inference in any graph by clustering into a tree of cliques.
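Here is variable elimination sketched on the simplest possible case, a binary chain $x_1 \to x_2 \to x_3$ with made-up CPTs; each elimination step is a vector-matrix product, so the cost is linear in chain length rather than exponential:

```python
import numpy as np

# Chain Bayesian network x1 -> x2 -> x3, binary states, illustrative numbers.
p1 = np.array([0.6, 0.4])             # p(x1)
T = np.array([[0.9, 0.1],             # T[i, j] = p(x_{t+1} = j | x_t = i)
              [0.2, 0.8]])

# Variable elimination for p(x3): sum out x1, then x2.
msg = p1 @ T                          # p(x2), after eliminating x1
p3 = msg @ T                          # p(x3), after eliminating x2
assert abs(p3.sum() - 1.0) < 1e-12    # still a valid distribution
```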

For graphs with high tree-width, exact inference is exponential. Approximate methods:

  • MCMC (Gibbs, Metropolis-Hastings); see the Gibbs sketch after this list.
  • Variational inference (mean-field, structured, neural).
  • Expectation propagation.
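And a minimal Gibbs-sampling sketch on the same kind of toy chain MRF used earlier (agreement potentials, illustrative numbers only):

```python
import random

# Gibbs sampling on a 3-node chain MRF with binary states (toy potentials).
EDGES = [(0, 1), (1, 2)]

def psi(xi, xj):
    return 2.0 if xi == xj else 1.0   # favors agreeing neighbors

def gibbs_sweep(x):
    """Resample each node from its conditional given its current neighbors."""
    for i in range(len(x)):
        weights = []
        for v in (0, 1):
            w = 1.0
            for a, b in EDGES:
                if a == i:
                    w *= psi(v, x[b])
                elif b == i:
                    w *= psi(x[a], v)
            weights.append(w)
        x[i] = random.choices((0, 1), weights=weights)[0]
    return x

x = [0, 0, 0]
samples = []
for t in range(6000):
    x = gibbs_sweep(x)
    if t >= 1000:                     # discard burn-in
        samples.append(tuple(x))
# Agreement potentials should make the all-equal states most frequent.
print(max(set(samples), key=samples.count))
```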

Special cases that became their own fields

| Graphical model | Modern name |
| --- | --- |
| Latent-variable Bayesian network | VAE (with neural conditional distributions) |
| Linear-Gaussian state space | Kalman filter |
| Discrete latent chain | HMM |
| Conditional MRF | CRF |
| Boltzmann machine | RBM, deep belief net (historical) |
| Topic model (Bayesian doc-topic) | LDA |
| Naive Bayes | Naive Bayes (still used) |

Relevance in 2026

PGMs as a framework are less central than they were in 2010; neural networks have replaced them for most practical inference. But graphical-model thinking persists in:

  • Diffusion models (Markov chain over noise levels).
  • VAEs (latent → observation Bayesian network).
  • Probabilistic programming (Pyro, Stan, NumPyro).
  • Causal inference (DAGs are the language).
  • Structured prediction with CRFs in some NLP pipelines.

Common pitfalls

  • Reading d-separation as causation. PGMs model statistical dependencies; causal claims require additional assumptions (interventions, do-calculus).
  • Treating the joint distribution as fully specified by the graph alone. The graph only specifies structure; the conditional distributions are separate.
  • Forgetting that exact inference is intractable for general MRFs. Tree-width matters.
  • Reading missing edges as independence. They imply conditional independence given the rest, not marginal independence.