One-line definition
Every real matrix $A \in \mathbb{R}^{m \times n}$ admits the factorization $A = U \Sigma V^\top$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries (the singular values). PCA is SVD applied to a mean-centered data matrix.
Why it matters
SVD is the universal matrix factorization. It exists for every matrix, even rectangular and rank-deficient ones. Reading off properties from the SVD answers “what does this matrix do?”: the singular values give scaling factors, $V$ gives input directions, $U$ gives output directions.
PCA is the canonical use of SVD: project data onto the directions of largest variance to get a low-dimensional representation that preserves as much information as possible.
The decomposition
$A = U \Sigma V^\top$ with $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times n}$, $V \in \mathbb{R}^{n \times n}$:
- $V$'s columns are an orthonormal basis of the row space of $A$ (input directions).
- $U$'s columns are an orthonormal basis of the column space (output directions).
- $\Sigma$'s diagonal entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ are the singular values (how much each input direction is stretched into its output direction).
Geometrically: any linear map is “rotate the input, stretch axis-by-axis, rotate the output.” That’s it.
The rank of $A$ is the number of non-zero singular values. The condition number is $\kappa(A) = \sigma_1 / \sigma_r$, where $r$ is the rank.
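A quick numpy sanity check of these facts (a sketch; note that `np.linalg.svd` returns the singular values as a 1-D array rather than a diagonal matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Full SVD: U is 5x5, s holds the 3 singular values, Vt is 3x3.
# A is reconstructed from the first 3 columns of U.
U, s, Vt = np.linalg.svd(A)
assert np.allclose(A, U[:, :3] @ np.diag(s) @ Vt)

# Rank = number of non-zero singular values (up to a tolerance)
rank = int(np.sum(s > 1e-10))

# Condition number = sigma_1 / sigma_r; matches np.linalg.cond for full rank
cond = s[0] / s[rank - 1]
assert np.isclose(cond, np.linalg.cond(A))
```

A random Gaussian matrix is full rank with probability 1, so `rank` comes out as 3 here.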
Truncated SVD and low-rank approximation
The best rank-$k$ approximation of $A$ in Frobenius (or spectral) norm is
$$A_k = U_k \Sigma_k V_k^\top,$$
where $U_k, V_k$ keep the first $k$ columns and $\Sigma_k$ keeps the first $k$ singular values (Eckart–Young theorem). Used in: dimensionality reduction, image compression, embedding regularization, LoRA fine-tuning.
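A minimal sketch of the truncation, checking the Eckart–Young error formula: the Frobenius error of the best rank-$k$ approximation is $\sqrt{\sum_{i>k} \sigma_i^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
# Best rank-k approximation: keep the k largest singular values/vectors.
# (U[:, :k] * s[:k]) scales each kept column of U by its singular value.
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Eckart-Young: Frobenius error = sqrt of the sum of discarded sigma_i^2
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(A_k) == k
```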
PCA as SVD
Given a data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features):
- Mean-center: $X_c = X - \mathbf{1}\bar{x}^\top$ (subtract each feature's mean).
- Compute the SVD: $X_c = U \Sigma V^\top$.
- The columns of $V$ are the principal components (directions of maximum variance in feature space).
- The variance along the $i$-th component is $\sigma_i^2 / (n-1)$.
- Project to $k$ dimensions: $X_k = X_c V_k$.
Equivalent formulation: PCA = eigendecomposition of the sample covariance $C = \frac{1}{n-1} X_c^\top X_c$. SVD is numerically more stable because it avoids forming $X_c^\top X_c$, which squares the condition number.
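The steps above, and the equivalence with the covariance eigendecomposition, can be sketched as follows (illustrative numpy only; the random correlated data is made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
# Fake data with correlated features: 200 samples, 5 features
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
n = X.shape[0]

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (n - 1)  # variance along each principal component

# Equivalent: eigendecomposition of the sample covariance
C = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]  # eigvalsh is ascending; flip to descending
assert np.allclose(var_svd, eigvals)

# Project onto the top-2 principal components (rows of Vt = columns of V)
X2 = Xc @ Vt[:2].T  # shape (200, 2)
```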
Common pitfalls
- Forgetting to center. PCA on uncentered data finds the direction toward the mean as PC1, which is rarely what you want.
- Forgetting to scale. If features have different units, large-magnitude features dominate; standardize (divide by std) before PCA when units differ.
- Confusing PCA with whitening. PCA gives uncorrelated components but not unit variance. Whitening = PCA + scale to unit variance.
- Using PCA on categorical / sparse data without thought. PCA assumes Euclidean structure; for sparse / categorical data, look at NMF, LDA, or contrastive embeddings.
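To make the PCA-vs-whitening pitfall concrete, here is a minimal sketch: PCA scores are uncorrelated but keep their per-component variances; dividing each score column by its standard deviation whitens them to the identity covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
# Fake data with very different per-feature scales
X = rng.standard_normal((500, 3)) @ np.diag([5.0, 1.0, 0.2])
n = X.shape[0]

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA scores: uncorrelated, but variances differ (= sigma_i^2 / (n-1))
scores = Xc @ Vt.T

# Whitening = PCA + rescale each component to unit variance
white = scores / (s / np.sqrt(n - 1))
cov_white = white.T @ white / (n - 1)
assert np.allclose(cov_white, np.eye(3), atol=1e-8)
```

Note the division by $\sigma_i$: whitening blows up directions with near-zero singular values, which is why it is often combined with truncation.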
Related
- Matrices as linear maps. The geometry.
- Eigenvalues and the spectral theorem. For symmetric matrices.