One-line definition
Factor analysis (FA) is a latent linear-Gaussian model: each observation is a linear map of a few low-dimensional latent factors plus Gaussian noise. Probabilistic PCA (PPCA) is the special case with isotropic noise, and classical PCA falls out as its zero-noise / maximum-likelihood limit.
Why it matters
This is the model that turns PCA from “an eigen-decomposition trick” into “a probabilistic generative model,” which is the framing senior interviewers want. It connects dimensionality reduction to the EM algorithm, to VAEs (a nonlinear PPCA), and to the generative-vs-discriminative discussion. It’s also a clean example of how a prior + likelihood recovers a classical algorithm as a limiting case.
The generative model
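Latent factor $z \in \mathbb{R}^k$ with $z \sim \mathcal{N}(0, I_k)$, observation $x \in \mathbb{R}^d$:
$$x = W z + \mu + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Psi)$$
$W \in \mathbb{R}^{d \times k}$ is the factor loading matrix (the directions), and $\Psi$ is the noise covariance. Marginalizing $z$ gives a Gaussian with low-rank-plus-structured covariance:
$$x \sim \mathcal{N}(\mu,\; W W^\top + \Psi)$$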
The whole model is the claim: the correlations between observed variables are explained by a few shared latent factors; whatever is left is independent per-feature noise.
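The generative story can be checked numerically. The sketch below is a minimal NumPy illustration, not a reference implementation: the symbol names (`W` for loadings, `mu` for the mean, `psi` for the diagonal of the noise covariance) and all sizes are assumptions chosen for the example. It samples from the model and compares the empirical covariance with the low-rank-plus-diagonal form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 200_000  # observed dim, latent dim, sample count (illustrative values)

W = rng.normal(size=(d, k))            # factor loadings (assumed name)
mu = rng.normal(size=d)                # mean offset
psi = rng.uniform(0.1, 1.0, size=d)    # per-feature noise variances (diagonal of Psi)

z = rng.normal(size=(n, k))                    # z ~ N(0, I_k)
eps = rng.normal(size=(n, d)) * np.sqrt(psi)   # eps ~ N(0, diag(psi))
x = z @ W.T + mu + eps                         # x = W z + mu + eps

model_cov = W @ W.T + np.diag(psi)             # implied marginal covariance
sample_cov = np.cov(x, rowvar=False)
rel_err = np.abs(sample_cov - model_cov).max() / np.abs(model_cov).max()
print(f"max relative covariance error: {rel_err:.4f}")
```

The off-diagonal entries of `model_cov` come entirely from `W @ W.T`, which is the "shared factors explain the correlations" claim in matrix form.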
FA vs PPCA vs PCA: it’s all about $\Psi$
| Model | Noise covariance | Consequence |
|---|---|---|
| Factor analysis | diagonal, $\Psi = \mathrm{diag}(\psi_1, \dots, \psi_d)$ | per-feature noise; scale-invariant; models unique variances |
| Probabilistic PCA | isotropic, $\Psi = \sigma^2 I$ | one shared noise level; MLE has closed form via eigendecomposition |
| Classical PCA | $\sigma^2 \to 0$ limit of PPCA | deterministic projection onto top-$k$ eigenvectors |
The single most important distinction for interviews: FA has a diagonal noise covariance (different noise per feature); PPCA forces it isotropic (same noise everywhere). That’s why FA is invariant to rescaling individual features while PCA/PPCA is sensitive to feature scaling (hence “standardize before PCA”).
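The scaling claim follows from one line of algebra. Writing $W$ for the loadings and $\Psi$ for the noise covariance (symbol names assumed here), rescale each feature by a positive constant, $x' = Dx$ with $D = \mathrm{diag}(c_1, \dots, c_d)$:

$$
\operatorname{Cov}(x') = D\,(WW^\top + \Psi)\,D = (DW)(DW)^\top + D\Psi D
$$

If $\Psi$ is diagonal, $D\Psi D = \mathrm{diag}(c_j^2 \psi_j)$ is still diagonal, so $(DW, D\Psi D)$ is again a valid FA model and the fit simply rescales with the data. If $\Psi = \sigma^2 I$, then $D\Psi D = \sigma^2 D^2$ is isotropic only when all $c_j$ are equal, so the PPCA family is not closed under per-feature rescaling.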
Fitting it
- PPCA has a closed-form MLE: $W$ is recovered from the top-$k$ eigenvectors $U_k$ of the sample covariance, scaled by $(\Lambda_k - \sigma^2 I)^{1/2}$ where $\Lambda_k$ holds the top-$k$ eigenvalues, with $\sigma^2$ = average of the $d-k$ discarded eigenvalues. So PPCA ≈ PCA plus a noise estimate.
- FA has no closed form (the diagonal $\Psi$ couples things); it’s fit with EM: the E-step infers the posterior over factors $p(z \mid x)$, the M-step updates $W$ and $\Psi$. This is a textbook EM application.
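Both fitting routes can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the function names `ppca_mle` and `fa_em` are hypothetical, the PPCA rotation ambiguity is resolved by taking the identity rotation, and the EM updates are the standard ones for a linear-Gaussian factor model with a fixed iteration count instead of a convergence check.

```python
import numpy as np

def ppca_mle(X, k):
    """Closed-form PPCA maximum-likelihood fit (arbitrary rotation fixed to identity)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)          # ML sample covariance (divide by n)
    evals, evecs = np.linalg.eigh(S)                # eigh returns ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]      # reorder to descending
    sigma2 = evals[k:].mean()                       # average of the discarded eigenvalues
    W = evecs[:, :k] * np.sqrt(evals[:k] - sigma2)  # U_k (Lambda_k - sigma2 I)^{1/2}
    return mu, W, sigma2

def fa_em(X, k, n_iter=500, seed=0):
    """EM for factor analysis with diagonal noise; psi holds the diagonal of Psi."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / n
    W = rng.normal(size=(d, k))                     # random initial loadings
    psi = np.diag(S).copy()                         # start from the marginal variances
    for _ in range(n_iter):
        # E-step: p(z | x) = N(beta (x - mu), G)
        Wp = W / psi[:, None]                       # Psi^{-1} W
        G = np.linalg.inv(np.eye(k) + W.T @ Wp)     # posterior covariance of z
        beta = G @ Wp.T                             # posterior-mean map, shape (k, d)
        Ezz = G + beta @ S @ beta.T                 # average of E[z z^T | x]
        # M-step: W = (S beta^T) Ezz^{-1}, Psi = diag(S - W beta S)
        W = np.linalg.solve(Ezz, beta @ S).T
        psi = np.maximum(np.diag(S - W @ beta @ S), 1e-6)  # floor for numerical safety
    return mu, W, psi

# Synthetic data with strongly unequal per-feature noise (illustrative values).
rng = np.random.default_rng(1)
n, d, k = 20_000, 6, 2
W_true = rng.normal(size=(d, k))
psi_true = np.array([0.1, 0.3, 0.5, 1.0, 1.5, 2.0])
X = rng.normal(size=(n, k)) @ W_true.T + rng.normal(size=(n, d)) * np.sqrt(psi_true)

S = np.cov(X, rowvar=False, bias=True)
_, W_p, sigma2 = ppca_mle(X, k)
_, W_f, psi = fa_em(X, k)
ppca_err = np.abs(W_p @ W_p.T + sigma2 * np.eye(d) - S).max()
fa_err = np.abs(W_f @ W_f.T + np.diag(psi) - S).max()
print(f"PPCA covariance misfit: {ppca_err:.3f}")
print(f"FA   covariance misfit: {fa_err:.3f}")
```

Because the synthetic noise is far from isotropic, the single shared `sigma2` cannot match the per-feature variances, which is where the FA fit is expected to do better.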
Why the probabilistic version is worth it
Recasting PCA as a model buys you things plain PCA can’t do:
- A proper likelihood → principled model comparison and a way to choose $k$.
- Natural handling of missing data (marginalize unobserved dimensions in EM).
- A generative model you can sample from.
- Mixtures of PPCA/FA for non-linear, multi-modal structure.
- The conceptual bridge to the VAE, which is “PPCA with a neural-network decoder and amortized inference.”
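As one illustration of the first point, the likelihood gives a direct way to compare latent dimensions. The sketch below is a hedged example on synthetic data: the helper names `fit_ppca` and `avg_loglik`, the train/test split, and all sizes are assumptions for illustration. It fits PPCA in closed form for each candidate latent dimension and scores held-out data under the implied Gaussian.

```python
import numpy as np

def fit_ppca(X, k):
    """Closed-form PPCA MLE; returns the mean and implied covariance W W^T + sigma2 I."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
    evals, evecs = evals[::-1], evecs[:, ::-1]       # descending order
    sigma2 = evals[k:].mean()                        # average discarded eigenvalue
    W = evecs[:, :k] * np.sqrt(evals[:k] - sigma2)
    return mu, W @ W.T + sigma2 * np.eye(X.shape[1])

def avg_loglik(X, mu, C):
    """Average Gaussian log-density of the rows of X under N(mu, C)."""
    d = X.shape[1]
    Xc = X - mu
    _, logdet = np.linalg.slogdet(C)
    maha = np.sum(Xc * np.linalg.solve(C, Xc.T).T, axis=1)  # squared Mahalanobis distances
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

# Synthetic data with a known latent dimension (illustrative values).
rng = np.random.default_rng(0)
d, k_true, n = 10, 3, 4000
W_true = 2.0 * rng.normal(size=(d, k_true))
X = rng.normal(size=(n, k_true)) @ W_true.T + np.sqrt(0.5) * rng.normal(size=(n, d))
X_train, X_test = X[: n // 2], X[n // 2 :]

candidates = list(range(1, d))   # k < d so that some eigenvalues are discarded
heldout = [avg_loglik(X_test, *fit_ppca(X_train, k)) for k in candidates]
best_k = candidates[int(np.argmax(heldout))]
print("held-out avg log-likelihood per k:", np.round(heldout, 3))
print("selected k:", best_k)
```

Held-out likelihood is only one option; information criteria or Bayesian model comparison use the same likelihood in different ways.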
What an interviewer expects you to say
- Write the latent linear-Gaussian generative model and the marginal covariance $W W^\top + \Psi$.
- State the key difference: FA = diagonal noise, PPCA = isotropic noise, PCA = zero-noise limit of PPCA.
- Explain the practical consequence: FA is scale-invariant; PCA/PPCA require feature standardization.
- Know that PPCA has a closed-form (eigendecomposition) MLE while FA needs EM.
- Bonus: connect to VAEs (nonlinear PPCA) and note the probabilistic framing enables missing data, model selection, and sampling.
Common confusions
- “FA and PCA are the same.” FA models per-feature (diagonal) noise and explains covariance; PCA maximizes retained variance and assumes isotropic/zero noise. They give different loadings unless noise is uniform.
- “PPCA is fancier PCA with no payoff.” The payoff is the likelihood: model selection, missing data, sampling, mixtures.
- “The factors are unique.” $W$ is only identifiable up to rotation (you can rotate $z$ and absorb it into $W$), hence “factor rotation” (varimax) for interpretability.
- “FA needs scaling like PCA.” FA is invariant to per-feature rescaling because its diagonal noise absorbs scale; PCA is not.
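The rotation ambiguity in the list above is easy to verify numerically. A minimal sketch, assuming `W` denotes the loading matrix and using a random orthogonal matrix `Q` built from a QR decomposition (both names chosen for the example): the rotated loadings differ entry by entry but imply exactly the same covariance, so the likelihood cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3
W = rng.normal(size=(d, k))                    # some loading matrix
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))   # random orthogonal k x k matrix
W_rot = W @ Q                                  # rotated loadings

# (W Q)(W Q)^T = W Q Q^T W^T = W W^T, so the marginal covariance is unchanged.
same_cov = np.allclose(W @ W.T, W_rot @ W_rot.T)
different_loadings = not np.allclose(W, W_rot)
print("same implied covariance:", same_cov)
print("loadings differ:", different_loadings)
```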
Related: SVD and PCA, Expectation-maximization, Gaussian mixture models, Variational autoencoders.