Eigenvalues and the spectral theorem

Eigenvectors are directions a matrix only stretches. The spectral theorem says symmetric matrices have a full orthogonal eigenbasis with real eigenvalues.


One-line definition

An eigenvector of $A$ is a non-zero vector $v$ such that $Av = \lambda v$ for some scalar $\lambda$ (the eigenvalue). $A$ acts on $v$ purely by scaling. The spectral theorem states that every real symmetric matrix is orthogonally diagonalizable: $A = Q \Lambda Q^\top$ with $Q$ orthogonal and $\Lambda$ diagonal and real.
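
A minimal NumPy sketch of both claims; the matrix here is an arbitrary symmetric example chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, so the spectral theorem applies
lams, Q = np.linalg.eigh(A)         # eigenvalues (ascending); columns of Q are eigenvectors

# Each eigenpair satisfies A v = lambda v: A acts on v purely by scaling.
v, lam = Q[:, 0], lams[0]
assert np.allclose(A @ v, lam * v)

# Orthogonal diagonalization: A = Q diag(lambda) Q^T with Q^T Q = I.
assert np.allclose(Q @ np.diag(lams) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(2))
```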

Why it matters

Eigendecompositions explain stability of dynamical systems, convergence of optimization, structure of covariance matrices, and properties of attention / graph operators. The spectral theorem is the mathematical reason PCA works on covariance matrices, why Laplacian eigenmaps make sense for graphs, and why second-order optimizers reason about Hessian eigenvalues.

Eigenvalues, eigenvectors, characteristic polynomial

For a square matrix $A$:

  • $Av = \lambda v$ is equivalent to $(A - \lambda I)v = 0$, so eigenvalues are roots of $\det(A - \lambda I) = 0$ (the characteristic polynomial).
  • An $n \times n$ matrix has $n$ eigenvalues (counted with multiplicity), possibly complex, possibly repeated.
  • Trace: $\operatorname{tr}(A) = \sum_i \lambda_i$. Determinant: $\det(A) = \prod_i \lambda_i$ (checked in the sketch below).

For symmetric matrices, all eigenvalues are real and there exists an orthonormal eigenbasis.
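
A quick NumPy check of these facts, on an arbitrary random matrix (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # general square matrix: eigenvalues may be complex

lams = np.linalg.eigvals(A)
assert np.allclose(np.trace(A), lams.sum())         # tr(A) = sum of eigenvalues
assert np.allclose(np.linalg.det(A), lams.prod())   # det(A) = product of eigenvalues

S = A + A.T                         # symmetrize
assert np.allclose(np.linalg.eigvals(S).imag, 0.0)  # symmetric => real spectrum
```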

The spectral theorem (symmetric case)

If $A = A^\top \in \mathbb{R}^{n \times n}$, then

$$A = Q \Lambda Q^\top$$

where $Q$ is orthogonal ($Q^\top Q = I$) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ with every $\lambda_i$ real.

Geometric meaning: in the eigenbasis $q_1, \dots, q_n$ (the columns of $Q$), the action of $A$ is independent scaling along each axis. Symmetric matrices have no rotational component. They are pure stretches in some orthogonal frame.
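
Equivalently, $A = \sum_i \lambda_i q_i q_i^\top$: a weighted sum of rank-one projections onto the eigenvectors. A short NumPy sketch with an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])     # symmetric
lams, Q = np.linalg.eigh(A)

# Rebuild A as a sum of rank-one stretches lambda_i * q_i q_i^T.
A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(lams, Q.T))
assert np.allclose(A_rebuilt, A)
```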

Connection to SVD

For symmetric positive semi-definite $A$, SVD and eigendecomposition coincide ($U = V = Q$, $\sigma_i = \lambda_i$). For general matrices they differ. SVD is the more general tool; eigendecomposition is the specialized one for symmetric / square matrices.
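
A sketch of the comparison in NumPy; the PSD matrix is random and the indefinite one hand-picked, both assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A_psd = B.T @ B                                   # symmetric PSD

sing = np.linalg.svd(A_psd, compute_uv=False)     # singular values, descending
lams = np.linalg.eigh(A_psd)[0][::-1]             # eigenvalues, descending
assert np.allclose(sing, lams)                    # PSD: sigma_i == lambda_i

A_ind = np.diag([3.0, -2.0])                      # symmetric but indefinite
sing = np.linalg.svd(A_ind, compute_uv=False)
assert np.allclose(sing, [3.0, 2.0])              # sigma_i = |lambda_i|, signs lost
```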

Where eigenvalues show up in ML

| Object | What its eigenvalues tell you |
| --- | --- |
| Covariance matrix | Variances along principal axes (PCA; sketch below) |
| Hessian of loss | Local curvature; condition number $\kappa = \lambda_{\max} / \lambda_{\min}$ |
| Graph Laplacian | Connectivity, spectral clustering, GNN smoothness |
| Markov transition matrix | Mixing rate (second-largest eigenvalue) |
| Attention matrix | Effective rank; low-rank structure |
| Recurrent weight matrix | Whether RNN gradients explode / vanish |
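
As one illustration of the first row, bare-bones PCA via covariance eigendecomposition; the data is synthetic, and real code would reach for a library implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0],
                                              [1.0, 0.5]])  # correlated 2-D data
X = X - X.mean(axis=0)

C = X.T @ X / (len(X) - 1)       # sample covariance (symmetric PSD)
var, axes = np.linalg.eigh(C)    # eigenvectors = principal axes

X_rot = X @ axes[:, ::-1]        # project onto the axes, top component first
assert np.allclose(X_rot.var(axis=0, ddof=1), var[::-1])  # eigenvalues = variances
```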

Common pitfalls

  • Treating asymmetric matrices like symmetric ones. Asymmetric matrices may have complex eigenvalues and may not be diagonalizable at all (Jordan form).
  • Computing a full eigendecomposition for huge matrices. Use Lanczos / Arnoldi or randomized SVD at large scale; a full eigendecomposition is $O(n^3)$ (see the sketch below).
  • Confusing eigenvalues with singular values. They are equal only for symmetric PSD matrices; in general the singular values are $\sigma_i(A) = \sqrt{\lambda_i(A^\top A)}$.
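
On the second pitfall: a sketch of the Lanczos route via SciPy's `eigsh`, using a large sparse tridiagonal Laplacian as a convenient test matrix (an assumption; any symmetric sparse operator works):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1])  # sparse 1-D Laplacian

# Lanczos finds a few extreme eigenvalues without the O(n^3) full decomposition.
top3 = eigsh(A, k=3, which="LA", return_eigenvectors=False)
print(top3)   # approaches 4.0, the spectral edge of this operator
```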