One-line definition
A Hidden Markov Model is a latent-variable sequence model with: (a) a discrete latent state $z_t \in \{1, \dots, K\}$ evolving as a first-order Markov chain with transition matrix $A$, and (b) per-state emission distributions $p(x_t \mid z_t)$ producing observations $x_t$.
Why it matters
HMMs were the dominant sequence model from the 1970s through the early 2010s for speech recognition, part-of-speech tagging, gene finding, and many time-series problems. They have largely been displaced by neural sequence models (RNNs, transformers) for tasks with abundant data, but remain useful for:
- Small-data sequence labeling.
- Settings with strong domain structure (gene finding still uses HMMs).
- Online filtering with limited compute.
- As a teaching example of latent-variable inference.
The three classical HMM problems (likelihood, decoding, learning) and their solutions (forward, Viterbi, Baum-Welch) are core probabilistic ML.
The model
- Discrete latent states $z_t \in \{1, \dots, K\}$.
- Initial distribution $\pi_k = p(z_1 = k)$.
- Transition matrix $A_{jk} = p(z_t = k \mid z_{t-1} = j)$.
- Emission distributions $p(x_t \mid z_t = k)$, typically Gaussian or categorical.

The joint:

$$p(x_{1:T}, z_{1:T}) = \pi_{z_1} \prod_{t=2}^{T} A_{z_{t-1} z_t} \prod_{t=1}^{T} p(x_t \mid z_t).$$
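As a concrete illustration, here is a minimal sketch that evaluates this joint for a tiny categorical-emission HMM; the arrays `pi`, `A`, `B` and the function `joint_log_prob` are hypothetical names for this example, not part of any library.

```python
import numpy as np

# Hypothetical 2-state HMM with categorical emissions over 3 symbols.
pi = np.array([0.6, 0.4])                      # initial distribution pi_k
A = np.array([[0.7, 0.3],                      # transition matrix A[j, k]
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],                 # emission probs B[k, x] = p(x | z = k)
              [0.1, 0.3, 0.6]])

def joint_log_prob(z, x):
    """log p(x_{1:T}, z_{1:T}) under the factorization above."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return lp

print(joint_log_prob(z=[0, 0, 1], x=[0, 1, 2]))
```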
The three classical problems
1. Likelihood: forward algorithm
Compute $p(x_{1:T})$ by marginalizing over $z_{1:T}$. The naive sum is $O(K^T)$. The forward algorithm uses dynamic programming:

$$\alpha_t(k) = p(x_t \mid z_t = k) \sum_{j=1}^{K} \alpha_{t-1}(j)\, A_{jk}, \qquad \alpha_1(k) = \pi_k\, p(x_1 \mid z_1 = k).$$

Complexity: $O(TK^2)$. Final likelihood: $p(x_{1:T}) = \sum_{k} \alpha_T(k)$.
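A minimal log-space sketch of the forward pass, assuming categorical emissions; `forward_log`, `log_B`, and the argument layout are illustrative choices, not a fixed API. With the arrays from the earlier sketch it could be called as `forward_log(np.log(pi), np.log(A), np.log(B), [0, 1, 2])`.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(log_pi, log_A, log_B, x):
    """Log-space forward pass for a categorical-emission HMM.

    log_pi: (K,) log initial probs; log_A: (K, K) log transitions;
    log_B: (K, V) log emission probs; x: sequence of observation indices.
    Returns log p(x_{1:T}) and the (T, K) matrix of log alpha values.
    """
    T, K = len(x), log_pi.shape[0]
    log_alpha = np.empty((T, K))
    log_alpha[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # alpha_t(k) = p(x_t | z_t = k) * sum_j alpha_{t-1}(j) * A[j, k]
        log_alpha[t] = log_B[:, x[t]] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    return logsumexp(log_alpha[-1]), log_alpha
```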
2. Decoding: Viterbi algorithm
Find the most likely state sequence $z^*_{1:T} = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$. Same DP structure, but replace the sum with a max:

$$\delta_t(k) = p(x_t \mid z_t = k) \max_{j} \delta_{t-1}(j)\, A_{jk}.$$

Backtrack from $\arg\max_k \delta_T(k)$ using the stored argmax pointers. Complexity: $O(TK^2)$.
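A corresponding Viterbi sketch under the same assumptions (categorical emissions, log-space); the function name and array layout are again illustrative.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, x):
    """Most likely state path under a categorical-emission HMM (log-space)."""
    T, K = len(x), log_pi.shape[0]
    log_delta = np.empty((T, K))
    backptr = np.empty((T, K), dtype=int)
    log_delta[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # delta_t(k) = p(x_t | z_t = k) * max_j delta_{t-1}(j) * A[j, k]
        scores = log_delta[t - 1][:, None] + log_A      # indexed [j, k]
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = log_B[:, x[t]] + scores.max(axis=0)
    # Backtrack from the best final state.
    path = [int(log_delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```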
3. Learning: Baum-Welch (EM)
Estimate the parameters $(\pi, A)$ and the emission parameters from observations alone. E-step: compute posteriors over the latents using forward-backward. M-step: weighted MLE for the transitions and emissions. This is EM applied to HMMs; it converges to a local optimum of the log-likelihood.
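A sketch of the M-step updates for a categorical-emission HMM, assuming the posteriors `gamma` and `xi` produced by the forward-backward recursion described in the next section; the function name and array shapes are illustrative.

```python
import numpy as np

def m_step(gamma, xi, x, num_symbols):
    """One Baum-Welch M-step for a categorical-emission HMM.

    gamma: (T, K) posteriors p(z_t = k | x_{1:T}); xi: (T-1, K, K) pairwise
    posteriors p(z_t = j, z_{t+1} = k | x_{1:T}); x: observation indices.
    """
    pi_new = gamma[0] / gamma[0].sum()
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)             # normalize each row over k
    B_new = np.zeros((gamma.shape[1], num_symbols))
    for t, obs in enumerate(x):
        B_new[:, obs] += gamma[t]                         # expected emission counts
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new
```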
Forward-backward
The forward variable $\alpha_t(k) = p(x_{1:t}, z_t = k)$ and backward variable $\beta_t(k) = p(x_{t+1:T} \mid z_t = k)$ together give:
- Posterior over a single state: $\gamma_t(k) = p(z_t = k \mid x_{1:T}) \propto \alpha_t(k)\, \beta_t(k)$.
- Posterior over a consecutive pair: $\xi_t(j, k) = p(z_t = j, z_{t+1} = k \mid x_{1:T}) \propto \alpha_t(j)\, A_{jk}\, p(x_{t+1} \mid z_{t+1} = k)\, \beta_{t+1}(k)$, needed for the EM transition update.
Forward-backward is the HMM analog of message passing on a chain. Exact in $O(TK^2)$.
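A log-space sketch of the full forward-backward pass producing the posteriors used in the M-step above, under the same categorical-emission assumptions; names and shapes are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_pi, log_A, log_B, x):
    """E-step posteriors gamma and xi via log-space forward-backward."""
    T, K = len(x), log_pi.shape[0]
    log_alpha = np.empty((T, K))
    log_beta = np.zeros((T, K))                           # beta_T(k) = 1
    log_alpha[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        log_alpha[t] = log_B[:, x[t]] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):
        # beta_t(j) = sum_k A[j, k] * p(x_{t+1} | k) * beta_{t+1}(k)
        log_beta[t] = logsumexp(log_A + log_B[:, x[t + 1]] + log_beta[t + 1], axis=1)
    log_evidence = logsumexp(log_alpha[-1])
    gamma = np.exp(log_alpha + log_beta - log_evidence)   # p(z_t = k | x_{1:T})
    xi = np.exp(log_alpha[:-1, :, None] + log_A[None]     # p(z_t = j, z_{t+1} = k | x)
                + log_B[:, x[1:]].T[:, None, :] + log_beta[1:, None, :] - log_evidence)
    return gamma, xi
```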
Connection to other models
| Model | Relation to HMM |
|---|---|
| Mixture of Gaussians | HMM with no temporal dependence ($A_{jk} = \pi_k$ for all $j$) |
| Linear-Gaussian state space (Kalman filter) | Continuous-state HMM |
| CRF | Discriminative HMM (model $p(z_{1:T} \mid x_{1:T})$ directly) |
| Linear-chain RNN | Neural generalization with continuous latents |
| Transformer | Replaces Markov assumption with attention over full sequence |
When to use HMMs in 2026
| Setting | HMM vs. alternatives |
|---|---|
| Phoneme alignment in TTS / ASR forced alignment | HMM still standard |
| Bioinformatics (gene finding, profile HMMs) | HMMs dominant |
| Small-data sequence labeling | HMM or CRF baseline |
| Modern NLP (NER, POS) | Transformers win |
| Speech recognition (end-to-end) | RNN-T or transformer encoder + CTC |
Common pitfalls
- Numerical underflow. $\alpha_t(k)$ shrinks geometrically; use log-space or scaling.
- EM local optima. Multiple random restarts; initialize emission means with k-means.
- Treating HMMs as state-of-the-art for general sequence tasks. They are not competitive on tasks with abundant data.
- Confusing first-order Markov with the model’s expressiveness. The latent chain is first-order Markov; the marginal distribution over observations is not Markov and can exhibit long-range structure mediated by the latents (which is why HMMs work at all).
Related
- Markov chains. The latent dynamics.
- Expectation-Maximization. Baum-Welch is EM for HMMs.