One-line definition
An eigenvector of $A$ is a non-zero vector $v$ such that $Av = \lambda v$ for some scalar $\lambda$ (the eigenvalue). $A$ acts on $v$ purely by scaling. The spectral theorem states that every real symmetric matrix is orthogonally diagonalizable: $A = Q \Lambda Q^\top$ with $Q$ orthogonal and $\Lambda$ diagonal and real.
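A minimal NumPy sketch of the definition (the $2 \times 2$ matrix here is just an illustrative example):

```python
import numpy as np

# A small symmetric matrix whose eigenvalues are easy to check by hand.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)  # eigh: for symmetric matrices

for lam, v in zip(eigvals, eigvecs.T):
    # A acts on each eigenvector purely by scaling: Av = lambda * v.
    assert np.allclose(A @ v, lam * v)

print(eigvals)  # eigenvalues of [[2,1],[1,2]] are 1 and 3
```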
Why it matters
Eigendecompositions explain stability of dynamical systems, convergence of optimization, structure of covariance matrices, and properties of attention / graph operators. The spectral theorem is the mathematical reason PCA works on covariance matrices, why Laplacian eigenmaps make sense for graphs, and why second-order optimizers reason about Hessian eigenvalues.
Eigenvalues, eigenvectors, characteristic polynomial
For a square matrix $A \in \mathbb{R}^{n \times n}$:
- $Av = \lambda v$ is equivalent to $(A - \lambda I)v = 0$, so eigenvalues are roots of $\det(A - \lambda I) = 0$ (the characteristic polynomial).
- An $n \times n$ matrix has $n$ eigenvalues (counted with multiplicity), possibly complex, possibly repeated.
- Trace: $\operatorname{tr}(A) = \sum_i \lambda_i$. Determinant: $\det(A) = \prod_i \lambda_i$.
For symmetric matrices, all eigenvalues are real and there exists an orthonormal eigenbasis.
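The trace and determinant identities above can be checked numerically (a sketch on a random, generally asymmetric matrix; eigenvalues may be complex, but conjugate pairs make the sum and product real):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigvals = np.linalg.eigvals(A)  # may be complex for asymmetric A

# tr(A) = sum of eigenvalues; det(A) = product of eigenvalues.
# Complex eigenvalues of a real matrix come in conjugate pairs,
# so the imaginary parts cancel to numerical precision.
assert np.isclose(np.trace(A), eigvals.sum().real)
assert np.isclose(np.linalg.det(A), np.prod(eigvals).real)
```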
The spectral theorem (symmetric case)
If $A = A^\top \in \mathbb{R}^{n \times n}$:

$$A = Q \Lambda Q^\top$$

where $Q$ is orthogonal ($Q^\top Q = I$) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ with each $\lambda_i$ real.
Geometric meaning: in the eigenbasis $\{q_1, \dots, q_n\}$ (the columns of $Q$), the action of $A$ is independent scaling along each axis. Symmetric matrices have no rotational component. They are pure stretches in some orthogonal frame.
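A sketch of the theorem in code, using `np.linalg.eigh` (which returns real eigenvalues in ascending order and orthonormal eigenvectors for a symmetric input):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # symmetrize to get a random symmetric matrix

lam, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(5))         # Q is orthogonal: Q^T Q = I
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Lambda Q^T
assert np.all(np.isreal(lam))                  # eigenvalues are real
```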
Connection to SVD
For symmetric positive semi-definite $A$: SVD and eigendecomposition coincide ($U = V = Q$, $\sigma_i = \lambda_i$). For general matrices they differ. SVD is the more general tool; eigendecomposition is the specialized one for symmetric / square matrices.
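A quick numerical check of the coincidence claim (a sketch; `B @ B.T` is just one way to construct a random PSD matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T  # symmetric PSD by construction

lam = np.linalg.eigvalsh(A)                  # eigenvalues, ascending, all >= 0
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# For symmetric PSD matrices, singular values equal eigenvalues.
assert np.allclose(np.sort(sigma), np.sort(lam))
```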
Where eigenvalues show up in ML
| Object | What its eigenvalues tell you |
|---|---|
| Covariance matrix | Variances along principal axes (PCA) |
| Hessian of loss | Local curvature; condition number $\lambda_{\max} / \lambda_{\min}$ |
| Graph Laplacian | Connectivity, spectral clustering, GNN smoothness |
| Markov transition matrix | Mixing rate (second-largest eigenvalue) |
| Attention matrix | Effective rank; low-rank structure |
| Recurrent weight matrix | Whether RNN gradients explode/vanish |
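The covariance row of the table can be made concrete (a sketch with synthetic anisotropic data; the scale factors 3.0 and 0.5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
# Data with large variance along x (std 3.0) and small along y (std 0.5).
X = rng.standard_normal((1000, 2)) * np.array([3.0, 0.5])

C = np.cov(X, rowvar=False)       # 2x2 sample covariance matrix
var, axes = np.linalg.eigh(C)     # eigenvalues = variances along principal axes

# Largest eigenvalue ~ 9 (= 3^2), smallest ~ 0.25 (= 0.5^2), up to sampling noise.
print(var)
```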
Common pitfalls
- Treating asymmetric matrices like symmetric ones. Asymmetric matrices may have complex eigenvalues and may not be diagonalizable at all (Jordan form).
- Computing eigendecomposition for huge matrices. Use Lanczos / Arnoldi or randomized SVD at large scale; full eigendecomposition is $O(n^3)$.
- Confusing eigenvalues with singular values. Equal only for symmetric PSD matrices; otherwise singular values are $\sigma_i(A) = \sqrt{\lambda_i(A^\top A)}$.
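Both pitfalls in one short sketch: a rotation matrix (asymmetric) has complex eigenvalues, and a shear matrix shows $\sigma_i^2 = \lambda_i(A^\top A)$ while $\sigma_i \neq |\lambda_i(A)|$:

```python
import numpy as np

# Pitfall 1: a 90-degree rotation is asymmetric; its eigenvalues are +-i.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
eigvals = np.linalg.eigvals(R)
assert np.iscomplexobj(eigvals)  # complex eigenvalues, not real

# Pitfall 2: for an asymmetric shear, singular values != eigenvalues.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])                   # both eigenvalues equal 1
sigma = np.linalg.svd(A, compute_uv=False)
lam_AtA = np.linalg.eigvalsh(A.T @ A)

assert np.allclose(np.sort(sigma**2), np.sort(lam_AtA))  # sigma_i^2 = lambda_i(A^T A)
assert not np.allclose(np.sort(sigma), [1.0, 1.0])       # but sigma_i != |lambda_i(A)|
```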
Related
- SVD and PCA. Generalization to all matrices.
- Positive definite matrices. The cone of PSD matrices.