SVD and PCA

The singular value decomposition factorizes any matrix into rotation × stretching × rotation. PCA is SVD applied to mean-centered data.


One-line definition

Every real matrix $A$ admits the factorization $A = U \Sigma V^\top$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries (the singular values). PCA is SVD applied to a mean-centered data matrix.

Why it matters

SVD is the universal matrix factorization. It exists for every matrix, even rectangular and rank-deficient ones. Reading off properties from the SVD answers “what does this matrix do?”: the singular values give scaling factors, $V$ gives input directions, $U$ gives output directions.

PCA is the canonical use of SVD: project data onto the directions of largest variance to get a low-dimensional representation that preserves as much information as possible.

The decomposition

$A = U \Sigma V^\top$:

  • $V$’s columns are an orthonormal basis of the row space of $A$ (input directions).
  • $U$’s columns are an orthonormal basis of the column space (output directions).
  • $\Sigma$’s diagonal entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ are the singular values (how much each input direction is stretched into its output direction).

Geometrically: any linear map is “rotate the input, stretch axis-by-axis, rotate the output.” That’s it.

The rank of $A$ is the number of non-zero singular values. The condition number is $\kappa(A) = \sigma_1 / \sigma_r$, where $r$ is the rank.
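
A minimal NumPy sketch of these facts; the matrix and the rank tolerance are illustrative, not from the text:

```python
import numpy as np

# np.linalg.svd returns U, the singular values s, and V^T (not V).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))          # any rectangular matrix works

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction: A == U @ diag(s) @ V^T, up to floating-point error.
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank = number of singular values above a numerical tolerance.
rank = np.sum(s > 1e-10)

# Condition number = sigma_1 / sigma_r.
cond = s[0] / s[rank - 1]
print(rank, cond, np.linalg.cond(A))  # np.linalg.cond agrees for full-rank A
```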

Truncated SVD and low-rank approximation

The best rank-$k$ approximation of $A$ in Frobenius (or spectral) norm is

$A_k = U_k \Sigma_k V_k^\top,$

where $U_k$ and $V_k$ keep the first $k$ columns and $\Sigma_k$ keeps the first $k$ singular values (Eckart–Young theorem). Used in: dimensionality reduction, image compression, embedding regularization, low-rank LoRA fine-tuning.
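
A short sketch of the Eckart–Young statement in NumPy; the matrix size and choice of $k$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 50))
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k approximation

# The Frobenius error of the best rank-k approximation equals the
# root-sum-square of the discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```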

PCA as SVD

Given a data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features):

  1. Mean-center: $X_c = X - \bar{x}$ (subtract the column means).
  2. Compute the SVD: $X_c = U \Sigma V^\top$.
  3. The columns of $V$ are the principal components (directions of maximum variance in feature space).
  4. The variance along the $i$-th component is $\sigma_i^2 / (n - 1)$.
  5. Project to $k$ dimensions: $Z = X_c V_k$.

Equivalent formulation: PCA = eigendecomposition of the sample covariance $C = \frac{1}{n-1} X_c^\top X_c$. Taking the SVD of $X_c$ directly is numerically more stable than forming and diagonalizing $C$.
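
A sketch of both routes on toy data; the dataset and variable names are illustrative, and the steps mirror the list above:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
n = X.shape[0]
k = 2

Xc = X - X.mean(axis=0)                            # 1. mean-center
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # 2. SVD of X_c
components = Vt                                    # 3. rows of V^T = principal components
explained_var = s**2 / (n - 1)                     # 4. variance along each component
Z = Xc @ Vt[:k].T                                  # 5. project to k dimensions

# Equivalent route: eigendecomposition of the sample covariance.
C = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]              # sorted descending
assert np.allclose(eigvals, explained_var)
```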

Common pitfalls

  • Forgetting to center. PCA on uncentered data finds the direction toward the mean as PC1, which is rarely what you want.
  • Forgetting to scale. If features have different units, large-magnitude features dominate; standardize (divide by std) before PCA when units differ.
  • Confusing PCA with whitening. PCA gives uncorrelated components but not unit variance. Whitening = PCA + scale to unit variance (see the sketch after this list).
  • Using PCA on categorical / sparse data without thought. PCA assumes Euclidean structure; for sparse / categorical data, look at NMF, LDA, or contrastive embeddings.
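
A minimal sketch of the PCA-vs-whitening distinction on toy data; the dataset and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
n = X.shape[0]

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA scores: uncorrelated, but each axis keeps its variance s_i^2 / (n - 1).
Z = Xc @ Vt.T
# Whitening: additionally rescale each axis to unit variance.
W = Z * (np.sqrt(n - 1) / s)

# The whitened data has (numerically) identity covariance.
assert np.allclose(W.T @ W / (n - 1), np.eye(4), atol=1e-8)
```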