One-line definition
A Hidden Markov Model is a latent-variable sequence model with: (a) a discrete latent state $z_t \in \{1, \dots, K\}$ evolving as a first-order Markov chain with transition matrix $A$, and (b) per-state emission distributions $p(x_t \mid z_t)$ producing observations $x_t$.
Why it matters
HMMs were the dominant sequence model from the 1970s through the early 2010s for speech recognition, part-of-speech tagging, gene finding, and many time-series problems. They have largely been displaced by neural sequence models (RNNs, transformers) for tasks with abundant data, but remain useful for:
- Small-data sequence labeling.
- Settings with strong domain structure (gene finding still uses HMMs).
- Online filtering with limited compute.
- As a teaching example of latent-variable inference.
The three classical HMM problems (likelihood, decoding, learning) and their solutions (forward, Viterbi, Baum-Welch) are core probabilistic ML.
The model
- Discrete latent states $z_t \in \{1, \dots, K\}$.
- Initial distribution $\pi_k = p(z_1 = k)$.
- Transition matrix $A_{jk} = p(z_t = k \mid z_{t-1} = j)$.
- Emission distributions $p(x_t \mid z_t = k)$, typically Gaussian or categorical.

The joint:

$$p(x_{1:T}, z_{1:T}) = \pi_{z_1} \prod_{t=2}^{T} A_{z_{t-1} z_t} \prod_{t=1}^{T} p(x_t \mid z_t).$$
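As a concrete illustration, here is a minimal sketch that evaluates this joint for a tiny categorical-emission HMM; the arrays `pi`, `A`, `B` and the function `joint_log_prob` are hypothetical names for this example, not part of any library.

```python
import numpy as np

# Hypothetical 2-state HMM with categorical emissions over 3 symbols.
pi = np.array([0.6, 0.4])                      # initial distribution pi_k
A = np.array([[0.7, 0.3],                      # transition matrix A[j, k]
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],                 # emission probs B[k, x] = p(x | z = k)
              [0.1, 0.3, 0.6]])

def joint_log_prob(z, x):
    """log p(x_{1:T}, z_{1:T}) under the factorization above."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return lp

print(joint_log_prob(z=[0, 0, 1], x=[0, 1, 2]))
```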
The three classical problems
1. Likelihood: forward algorithm
Compute $p(x_{1:T})$ by marginalizing over $z_{1:T}$. The naive sum is $O(K^T)$. The forward algorithm uses dynamic programming:

$$\alpha_t(k) = p(x_t \mid z_t = k) \sum_{j=1}^{K} \alpha_{t-1}(j)\, A_{jk}, \qquad \alpha_1(k) = \pi_k\, p(x_1 \mid z_1 = k).$$

Complexity: $O(TK^2)$. Final likelihood: $p(x_{1:T}) = \sum_{k} \alpha_T(k)$.
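A minimal log-space sketch of the forward pass, assuming categorical emissions; `forward_log`, `log_B`, and the argument layout are illustrative choices, not a fixed API. With the arrays from the earlier sketch it could be called as `forward_log(np.log(pi), np.log(A), np.log(B), [0, 1, 2])`.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(log_pi, log_A, log_B, x):
    """Log-space forward pass for a categorical-emission HMM.

    log_pi: (K,) log initial probs; log_A: (K, K) log transitions;
    log_B: (K, V) log emission probs; x: sequence of observation indices.
    Returns log p(x_{1:T}) and the (T, K) matrix of log alpha values.
    """
    T, K = len(x), log_pi.shape[0]
    log_alpha = np.empty((T, K))
    log_alpha[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # alpha_t(k) = p(x_t | z_t = k) * sum_j alpha_{t-1}(j) * A[j, k]
        log_alpha[t] = log_B[:, x[t]] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    return logsumexp(log_alpha[-1]), log_alpha
```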
2. Decoding: Viterbi algorithm
Find the most likely state sequence $z^*_{1:T} = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$. Same DP structure, but replace the sum with a max:

$$\delta_t(k) = p(x_t \mid z_t = k) \max_{j} \delta_{t-1}(j)\, A_{jk}.$$

Backtrack from $\arg\max_k \delta_T(k)$ using the stored argmax pointers. Complexity: $O(TK^2)$.
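A corresponding Viterbi sketch under the same assumptions (categorical emissions, log-space); the function name and array layout are again illustrative.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, x):
    """Most likely state path under a categorical-emission HMM (log-space)."""
    T, K = len(x), log_pi.shape[0]
    log_delta = np.empty((T, K))
    backptr = np.empty((T, K), dtype=int)
    log_delta[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # delta_t(k) = p(x_t | z_t = k) * max_j delta_{t-1}(j) * A[j, k]
        scores = log_delta[t - 1][:, None] + log_A      # indexed [j, k]
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = log_B[:, x[t]] + scores.max(axis=0)
    # Backtrack from the best final state.
    path = [int(log_delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```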
3. Learning: Baum-Welch (EM)
Estimate the parameters $(\pi, A)$ and the emission parameters from observations alone. E-step: compute posteriors over the latents using forward-backward. M-step: weighted MLE for the transitions and emissions. This is EM applied to HMMs; it converges to a local optimum of the log-likelihood.
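A sketch of the M-step updates for a categorical-emission HMM, assuming the posteriors `gamma` and `xi` produced by the forward-backward recursion described in the next section; the function name and array shapes are illustrative.

```python
import numpy as np

def m_step(gamma, xi, x, num_symbols):
    """One Baum-Welch M-step for a categorical-emission HMM.

    gamma: (T, K) posteriors p(z_t = k | x_{1:T}); xi: (T-1, K, K) pairwise
    posteriors p(z_t = j, z_{t+1} = k | x_{1:T}); x: observation indices.
    """
    pi_new = gamma[0] / gamma[0].sum()
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)             # normalize each row over k
    B_new = np.zeros((gamma.shape[1], num_symbols))
    for t, obs in enumerate(x):
        B_new[:, obs] += gamma[t]                         # expected emission counts
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new
```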
Forward-backward
The forward variable $\alpha_t(k) = p(x_{1:t}, z_t = k)$ and backward variable $\beta_t(k) = p(x_{t+1:T} \mid z_t = k)$ together give:
- Posterior over a single state: $\gamma_t(k) = p(z_t = k \mid x_{1:T}) \propto \alpha_t(k)\, \beta_t(k)$.
- Posterior over a consecutive pair: $\xi_t(j, k) = p(z_t = j, z_{t+1} = k \mid x_{1:T}) \propto \alpha_t(j)\, A_{jk}\, p(x_{t+1} \mid z_{t+1} = k)\, \beta_{t+1}(k)$, needed for the EM transition update.
Forward-backward is the HMM analog of message passing on a chain. Exact in $O(TK^2)$.
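A log-space sketch of the full forward-backward pass producing the posteriors used in the M-step above, under the same categorical-emission assumptions; names and shapes are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_pi, log_A, log_B, x):
    """E-step posteriors gamma and xi via log-space forward-backward."""
    T, K = len(x), log_pi.shape[0]
    log_alpha = np.empty((T, K))
    log_beta = np.zeros((T, K))                           # beta_T(k) = 1
    log_alpha[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        log_alpha[t] = log_B[:, x[t]] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):
        # beta_t(j) = sum_k A[j, k] * p(x_{t+1} | k) * beta_{t+1}(k)
        log_beta[t] = logsumexp(log_A + log_B[:, x[t + 1]] + log_beta[t + 1], axis=1)
    log_evidence = logsumexp(log_alpha[-1])
    gamma = np.exp(log_alpha + log_beta - log_evidence)   # p(z_t = k | x_{1:T})
    xi = np.exp(log_alpha[:-1, :, None] + log_A[None]     # p(z_t = j, z_{t+1} = k | x)
                + log_B[:, x[1:]].T[:, None, :] + log_beta[1:, None, :] - log_evidence)
    return gamma, xi
```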
Connection to other models
| Model | Relation to HMM |
|---|---|
| Mixture of Gaussians | HMM with no temporal dependence ($A_{jk} = \pi_k$ for all $j$) |
| Linear-Gaussian state space (Kalman filter) | Continuous-state HMM |
| CRF | Discriminative HMM (model $p(z_{1:T} \mid x_{1:T})$ directly) |
| Linear-chain RNN | Neural generalization with continuous latents |
| Transformer | Replaces Markov assumption with attention over full sequence |
When to use HMMs in 2026
| Setting | HMM vs. alternatives |
|---|---|
| Phoneme alignment in TTS / ASR forced alignment | HMM still standard |
| Bioinformatics (gene finding, profile HMMs) | HMMs dominant |
| Small-data sequence labeling | HMM or CRF baseline |
| Modern NLP (NER, POS) | Transformers win |
| Speech recognition (end-to-end) | RNN-T or transformer encoder + CTC |
Common pitfalls
- Numerical underflow. $\alpha_t(k)$ shrinks geometrically; use log-space or scaling.
- EM local optima. Multiple random restarts; initialize emission means with k-means.
- Treating HMMs as state-of-the-art for general sequence tasks. They are not competitive on tasks with abundant data.
- Confusing first-order Markov with the model’s expressiveness. The latent chain is first-order Markov; the marginal distribution over observations is not Markov and can exhibit long-range structure mediated by the latents (which is why HMMs work at all).
Related
- Markov chains. The latent dynamics.
- Expectation-Maximization. Baum-Welch is EM for HMMs.