Hidden Markov models

A latent Markov chain emits observations through a per-state distribution. Forward-backward, Viterbi, Baum-Welch. The classical sequence model toolkit.

One-line definition

A Hidden Markov Model is a latent-variable sequence model with: (a) a discrete latent state $z_t \in \{1, \dots, K\}$ evolving as a first-order Markov chain with transition matrix $A$, and (b) per-state emission distributions $p(x_t \mid z_t)$ producing observations $x_t$.

Why it matters

HMMs were the dominant sequence model from the 1970s through the early 2010s for speech recognition, part-of-speech tagging, gene finding, and many time-series problems. They have largely been displaced by neural sequence models (RNNs, transformers) for tasks with abundant data, but remain useful for:

  • Small-data sequence labeling.
  • Settings with strong domain structure (gene finding still uses HMMs).
  • Online filtering with limited compute.
  • As a teaching example of latent-variable inference.

The three classical HMM problems (likelihood, decoding, learning) and their solutions (the forward algorithm, Viterbi, Baum-Welch) are core probabilistic ML.

The model

  • Discrete latent states $z_t \in \{1, \dots, K\}$.
  • Initial distribution $\pi_k = p(z_1 = k)$.
  • Transition matrix $A_{ij} = p(z_{t+1} = j \mid z_t = i)$.
  • Emission distributions $p(x_t \mid z_t = k)$, typically Gaussian or categorical.

The joint factorizes as

$$p(x_{1:T}, z_{1:T}) = \pi_{z_1}\, p(x_1 \mid z_1) \prod_{t=2}^{T} A_{z_{t-1} z_t}\, p(x_t \mid z_t).$$

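As a concrete illustration of this generative process, here is a minimal sampling sketch, assuming categorical emissions; the function and variable names are illustrative, not from the text:

```python
import numpy as np

def sample_hmm(pi, A, B, T, seed=0):
    """Sample a latent path z and observations x from a categorical-emission HMM.

    pi : (K,)   initial distribution, pi[k] = p(z_1 = k)
    A  : (K, K) transition matrix, A[i, j] = p(z_{t+1} = j | z_t = i)
    B  : (K, M) emission probabilities, B[k, m] = p(x_t = m | z_t = k)
    """
    rng = np.random.default_rng(seed)
    K, M = B.shape
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)
    x[0] = rng.choice(M, p=B[z[0]])
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])   # latent chain: first-order Markov
        x[t] = rng.choice(M, p=B[z[t]])       # emission depends only on the current state
    return z, x
```
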
The three classical problems

1. Likelihood: forward algorithm

Compute $p(x_{1:T})$ by marginalizing over the latent path $z_{1:T}$. The naive sum costs $O(K^T)$. The forward algorithm uses dynamic programming:

$$\alpha_t(k) = p(x_{1:t}, z_t = k) = p(x_t \mid z_t = k) \sum_{j=1}^{K} \alpha_{t-1}(j)\, A_{jk}, \qquad \alpha_1(k) = \pi_k\, p(x_1 \mid z_1 = k).$$

Complexity: $O(T K^2)$. Final likelihood: $p(x_{1:T}) = \sum_{k} \alpha_T(k)$.
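
A hedged sketch of the forward recursion in log-space (this also addresses the underflow pitfall listed later); categorical emissions and SciPy's `logsumexp` are assumptions, not part of the original text:

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(log_pi, log_A, log_B, x):
    """Forward algorithm in log-space. Returns log alpha, shape (T, K).

    log_pi : (K,)   log initial distribution
    log_A  : (K, K) log transition matrix
    log_B  : (K, M) log emission probabilities (categorical, for concreteness)
    x      : (T,)   observed symbol indices
    """
    T, K = len(x), len(log_pi)
    log_alpha = np.empty((T, K))
    log_alpha[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # alpha_t(k) = p(x_t | z_t = k) * sum_j alpha_{t-1}(j) A[j, k]
        log_alpha[t] = log_B[:, x[t]] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    return log_alpha  # log p(x_{1:T}) = logsumexp(log_alpha[-1])
```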

2. Decoding: Viterbi algorithm

Find the most likely state sequence $z_{1:T}^{*} = \arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$. Same DP structure, but replace the sum with a max:

$$\delta_t(k) = p(x_t \mid z_t = k) \max_{j} \delta_{t-1}(j)\, A_{jk}.$$

Backtrack from $\arg\max_k \delta_T(k)$ through the stored argmax pointers. Complexity: $O(T K^2)$.
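
A matching Viterbi sketch under the same assumptions as the forward sketch above (log-space, categorical emissions):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, x):
    """Most likely state path via max-product dynamic programming."""
    T, K = len(x), len(log_pi)
    log_delta = np.empty((T, K))
    backptr = np.zeros((T, K), dtype=int)
    log_delta[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        # delta_t(k) = p(x_t | z_t = k) * max_j delta_{t-1}(j) A[j, k]
        scores = log_delta[t - 1][:, None] + log_A   # (K, K), entry [j, k]
        backptr[t] = np.argmax(scores, axis=0)
        log_delta[t] = log_B[:, x[t]] + np.max(scores, axis=0)
    # backtrack from the best final state
    z = np.empty(T, dtype=int)
    z[-1] = int(np.argmax(log_delta[-1]))
    for t in range(T - 2, -1, -1):
        z[t] = backptr[t + 1, z[t + 1]]
    return z
```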

3. Learning: Baum-Welch (EM)

Estimate the parameters $\pi$, $A$, and the emission distributions from observations alone. E-step: compute posteriors over the latents using forward-backward. M-step: weighted maximum likelihood for transitions and emissions. This is EM applied to HMMs; it converges to a local optimum of the log-likelihood.
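
If you want Baum-Welch fitting without a hand-rolled implementation, the third-party `hmmlearn` package wraps it. A minimal usage sketch, assuming Gaussian emissions and a recent version of the library (data here is synthetic, for illustration only):

```python
import numpy as np
from hmmlearn import hmm  # third-party package; API assumed as in recent releases

# X: one observation sequence, shape (T, d). fit() runs Baum-Welch (EM).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50, random_state=0)
model.fit(X)                   # EM: forward-backward E-step, weighted-MLE M-step
print(model.score(X))          # log-likelihood of X under the fitted model
print(model.predict(X)[:10])   # Viterbi decoding of the first few states
```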

Forward-backward

The forward variable $\alpha_t(k) = p(x_{1:t}, z_t = k)$ and backward variable $\beta_t(k) = p(x_{t+1:T} \mid z_t = k)$ together give:

  • Posterior over a single state: $\gamma_t(k) = p(z_t = k \mid x_{1:T}) \propto \alpha_t(k)\, \beta_t(k)$.
  • Posterior over a consecutive pair: $\xi_t(i, j) \propto \alpha_t(i)\, A_{ij}\, p(x_{t+1} \mid z_{t+1} = j)\, \beta_{t+1}(j)$, needed for the EM transition update.

Forward-backward is the HMM analog of message passing on a chain. Exact, in $O(T K^2)$.
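
Continuing the earlier log-space sketches (the `log_forward` helper and categorical emissions carry over as assumptions), a minimal backward recursion and the single-state posterior $\gamma$:

```python
import numpy as np
from scipy.special import logsumexp

def log_backward(log_A, log_B, x):
    """Backward recursion in log-space. Returns log beta, shape (T, K)."""
    T, K = len(x), log_A.shape[0]
    log_beta = np.zeros((T, K))   # beta_T(k) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(j) = sum_k A[j, k] p(x_{t+1} | z_{t+1} = k) beta_{t+1}(k)
        log_beta[t] = logsumexp(log_A + log_B[:, x[t + 1]] + log_beta[t + 1], axis=1)
    return log_beta

def state_posteriors(log_alpha, log_beta):
    """gamma_t(k) = p(z_t = k | x_{1:T}), normalized per time step."""
    log_gamma = log_alpha + log_beta
    return np.exp(log_gamma - logsumexp(log_gamma, axis=1, keepdims=True))
```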

Connection to other models

| Model | Relation to HMM |
| --- | --- |
| Mixture of Gaussians | HMM with i.i.d. latent states (every row of $A$ equal to $\pi$) |
| Linear-Gaussian state space (Kalman filter) | Continuous-state HMM |
| Linear-chain CRF | Discriminative HMM (models $p(z_{1:T} \mid x_{1:T})$ directly) |
| RNN | Neural generalization with continuous latents |
| Transformer | Replaces the Markov assumption with attention over the full sequence |

When to use HMMs in 2026

| Setting | HMM vs. alternatives |
| --- | --- |
| Phoneme alignment in TTS / ASR forced alignment | HMM still standard |
| Bioinformatics (gene finding, profile HMMs) | HMMs dominant |
| Small-data sequence labeling | HMM or CRF baseline |
| Modern NLP (NER, POS) | Transformers win |
| Speech recognition (end-to-end) | RNN-T or transformer encoder + CTC |

Common pitfalls

  • Numerical underflow. $\alpha_t(k)$ shrinks geometrically in $t$; use log-space or per-step scaling (see the sketch after this list).
  • EM local optima. Multiple random restarts; initialize emission means with k-means.
  • Treating HMMs as state-of-the-art for general sequence tasks. For any task with abundant data, they are not.
  • Confusing the first-order Markov assumption with the model’s expressiveness. The latent chain is first-order Markov; the observations can have long-range dependencies mediated by the latents (which is why HMMs work at all).
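
For the underflow pitfall above, a minimal sketch of the scaling alternative to log-space: normalize $\alpha_t$ at each step and accumulate the log normalizers (categorical emissions assumed, as in the earlier sketches):

```python
import numpy as np

def scaled_forward(pi, A, B, x):
    """Forward recursion with per-step rescaling instead of log-space.

    Normalizes alpha_t to sum to 1 at each step and accumulates the logs of
    the normalizers, so log p(x_{1:T}) is recovered without underflow.
    """
    T, K = len(x), len(pi)
    alpha = np.empty((T, K))
    a = pi * B[:, x[0]]
    c = a.sum()                       # c_1 = p(x_1)
    alpha[0] = a / c
    log_lik = np.log(c)
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * B[:, x[t]]
        c = a.sum()                   # c_t = p(x_t | x_{1:t-1})
        alpha[t] = a / c
        log_lik += np.log(c)
    return alpha, log_lik             # log_lik = log p(x_{1:T})
```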