SVD and PCA

The singular value decomposition factorizes any matrix into rotation × stretching × rotation. PCA is SVD applied to mean-centered data.


One-line definition

Every real matrix $A$ admits the factorization $A = U \Sigma V^\top$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries (the singular values). PCA is SVD applied to a mean-centered data matrix.

Why it matters

SVD is the universal matrix factorization. It exists for every matrix, even rectangular and rank-deficient ones. Reading off properties from the SVD answers “what does this matrix do?”: the singular values give scaling factors, $V$ gives input directions, $U$ gives output directions.

PCA is the canonical use of SVD: project data onto the directions of largest variance to get a low-dimensional representation that preserves as much information as possible.

The decomposition

$A = U \Sigma V^\top$:

  • $V$’s columns are an orthonormal basis of the row space of $A$ (input directions).
  • $U$’s columns are an orthonormal basis of the column space (output directions).
  • $\Sigma$’s diagonal entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ are the singular values (how much each input direction is stretched into its output direction).

Geometrically: any linear map is “rotate the input, stretch axis-by-axis, rotate the output.” That’s it.

The rank of $A$ is the number of non-zero singular values. The condition number is $\kappa(A) = \sigma_1 / \sigma_r$, where $r$ is the rank.
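
A minimal NumPy sketch of these facts; the matrix and the rank tolerance are illustrative, not from the text:

```python
import numpy as np

# np.linalg.svd returns U, the singular values s, and V^T (not V).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))          # any rectangular matrix works

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction: A == U @ diag(s) @ V^T, up to floating-point error.
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank = number of singular values above a numerical tolerance.
rank = np.sum(s > 1e-10)

# Condition number = sigma_1 / sigma_r.
cond = s[0] / s[rank - 1]
print(rank, cond, np.linalg.cond(A))  # np.linalg.cond agrees for full-rank A
```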

Truncated SVD and low-rank approximation

The best rank-$k$ approximation of $A$ in Frobenius (or spectral) norm is

$A_k = U_k \Sigma_k V_k^\top,$

where $U_k$ and $V_k$ keep the first $k$ columns and $\Sigma_k$ keeps the first $k$ singular values (Eckart–Young theorem). Used in: dimensionality reduction, image compression, embedding regularization, low-rank LoRA fine-tuning.
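
A short sketch of the Eckart–Young statement in NumPy; the matrix size and choice of $k$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 50))
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # rank-k approximation

# The Frobenius error of the best rank-k approximation equals the
# root-sum-square of the discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```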

PCA as SVD

Given a data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features):

  1. Mean-center: $X_c = X - \bar{x}$ (subtract the column means).
  2. Compute the SVD: $X_c = U \Sigma V^\top$.
  3. The columns of $V$ are the principal components (directions of maximum variance in feature space).
  4. The variance along the $i$-th component is $\sigma_i^2 / (n - 1)$.
  5. Project to $k$ dimensions: $Z = X_c V_k$.

Equivalent formulation: PCA = eigendecomposition of the sample covariance $C = \frac{1}{n-1} X_c^\top X_c$. Taking the SVD of $X_c$ directly is numerically more stable than forming and diagonalizing $C$.
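
A sketch of both routes on toy data; the dataset and variable names are illustrative, and the steps mirror the list above:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
n = X.shape[0]
k = 2

Xc = X - X.mean(axis=0)                            # 1. mean-center
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # 2. SVD of X_c
components = Vt                                    # 3. rows of V^T = principal components
explained_var = s**2 / (n - 1)                     # 4. variance along each component
Z = Xc @ Vt[:k].T                                  # 5. project to k dimensions

# Equivalent route: eigendecomposition of the sample covariance.
C = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]              # sorted descending
assert np.allclose(eigvals, explained_var)
```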

Common pitfalls

  • Forgetting to center. PCA on uncentered data finds the direction toward the mean as PC1, which is rarely what you want.
  • Forgetting to scale. If features have different units, large-magnitude features dominate; standardize (divide by std) before PCA when units differ.
  • Confusing PCA with whitening. PCA gives uncorrelated components but not unit variance. Whitening = PCA + scale to unit variance (see the sketch after this list).
  • Using PCA on categorical / sparse data without thought. PCA assumes Euclidean structure; for sparse / categorical data, look at NMF, LDA, or contrastive embeddings.
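
A minimal sketch of the PCA-vs-whitening distinction on toy data; the dataset and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
n = X.shape[0]

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA scores: uncorrelated, but each axis keeps its variance s_i^2 / (n - 1).
Z = Xc @ Vt.T
# Whitening: additionally rescale each axis to unit variance.
W = Z * (np.sqrt(n - 1) / s)

# The whitened data has (numerically) identity covariance.
assert np.allclose(W.T @ W / (n - 1), np.eye(4), atol=1e-8)
```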