Eigenvalues and the spectral theorem

Eigenvectors are directions a matrix only stretches. The spectral theorem says symmetric matrices have a full orthogonal eigenbasis with real eigenvalues.


One-line definition

An eigenvector of $A$ is a non-zero vector $v$ such that $Av = \lambda v$ for some scalar $\lambda$ (the eigenvalue). $A$ acts on $v$ purely by scaling. The spectral theorem states that every real symmetric matrix is orthogonally diagonalizable: $A = Q \Lambda Q^\top$ with $Q$ orthogonal and $\Lambda$ diagonal and real.
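
A minimal NumPy sketch of both claims; the matrix here is an arbitrary symmetric example chosen for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, so the spectral theorem applies
lams, Q = np.linalg.eigh(A)         # eigenvalues (ascending); columns of Q are eigenvectors

# Each eigenpair satisfies A v = lambda v: A acts on v purely by scaling.
v, lam = Q[:, 0], lams[0]
assert np.allclose(A @ v, lam * v)

# Orthogonal diagonalization: A = Q diag(lambda) Q^T with Q^T Q = I.
assert np.allclose(Q @ np.diag(lams) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(2))
```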

Why it matters

Eigendecompositions explain stability of dynamical systems, convergence of optimization, structure of covariance matrices, and properties of attention / graph operators. The spectral theorem is the mathematical reason PCA works on covariance matrices, why Laplacian eigenmaps make sense for graphs, and why second-order optimizers reason about Hessian eigenvalues.

Eigenvalues, eigenvectors, characteristic polynomial

For a square matrix $A$:

  • $Av = \lambda v$ is equivalent to $(A - \lambda I)v = 0$, so eigenvalues are roots of $\det(A - \lambda I) = 0$ (the characteristic polynomial).
  • An $n \times n$ matrix has $n$ eigenvalues (counted with multiplicity), possibly complex, possibly repeated.
  • Trace: $\operatorname{tr}(A) = \sum_i \lambda_i$. Determinant: $\det(A) = \prod_i \lambda_i$ (checked in the sketch below).

For symmetric matrices, all eigenvalues are real and there exists an orthonormal eigenbasis.
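
A quick NumPy check of these facts, on an arbitrary random matrix (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # general square matrix: eigenvalues may be complex

lams = np.linalg.eigvals(A)
assert np.allclose(np.trace(A), lams.sum())         # tr(A) = sum of eigenvalues
assert np.allclose(np.linalg.det(A), lams.prod())   # det(A) = product of eigenvalues

S = A + A.T                         # symmetrize
assert np.allclose(np.linalg.eigvals(S).imag, 0.0)  # symmetric => real spectrum
```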

The spectral theorem (symmetric case)

If $A = A^\top \in \mathbb{R}^{n \times n}$, then

$$A = Q \Lambda Q^\top$$

where $Q$ is orthogonal ($Q^\top Q = I$) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ with every $\lambda_i$ real.

Geometric meaning: in the eigenbasis $q_1, \dots, q_n$ (the columns of $Q$), the action of $A$ is independent scaling along each axis. Symmetric matrices have no rotational component. They are pure stretches in some orthogonal frame.
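
Equivalently, $A = \sum_i \lambda_i q_i q_i^\top$: a weighted sum of rank-one projections onto the eigenvectors. A short NumPy sketch with an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])     # symmetric
lams, Q = np.linalg.eigh(A)

# Rebuild A as a sum of rank-one stretches lambda_i * q_i q_i^T.
A_rebuilt = sum(lam * np.outer(q, q) for lam, q in zip(lams, Q.T))
assert np.allclose(A_rebuilt, A)
```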

Connection to SVD

For symmetric positive semi-definite $A$, SVD and eigendecomposition coincide ($U = V = Q$, $\sigma_i = \lambda_i$). For general matrices they differ. SVD is the more general tool; eigendecomposition is the specialized one for symmetric / square matrices.
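
A sketch of the comparison in NumPy; the PSD matrix is random and the indefinite one hand-picked, both assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A_psd = B.T @ B                                   # symmetric PSD

sing = np.linalg.svd(A_psd, compute_uv=False)     # singular values, descending
lams = np.linalg.eigh(A_psd)[0][::-1]             # eigenvalues, descending
assert np.allclose(sing, lams)                    # PSD: sigma_i == lambda_i

A_ind = np.diag([3.0, -2.0])                      # symmetric but indefinite
sing = np.linalg.svd(A_ind, compute_uv=False)
assert np.allclose(sing, [3.0, 2.0])              # sigma_i = |lambda_i|, signs lost
```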

Where eigenvalues show up in ML

| Object | What its eigenvalues tell you |
| --- | --- |
| Covariance matrix | Variances along principal axes (PCA; sketch below) |
| Hessian of loss | Local curvature; condition number $\kappa = \lambda_{\max} / \lambda_{\min}$ |
| Graph Laplacian | Connectivity, spectral clustering, GNN smoothness |
| Markov transition matrix | Mixing rate (second-largest eigenvalue) |
| Attention matrix | Effective rank; low-rank structure |
| Recurrent weight matrix | Whether RNN gradients explode / vanish |
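
As one illustration of the first row, bare-bones PCA via covariance eigendecomposition; the data is synthetic, and real code would reach for a library implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0],
                                              [1.0, 0.5]])  # correlated 2-D data
X = X - X.mean(axis=0)

C = X.T @ X / (len(X) - 1)       # sample covariance (symmetric PSD)
var, axes = np.linalg.eigh(C)    # eigenvectors = principal axes

X_rot = X @ axes[:, ::-1]        # project onto the axes, top component first
assert np.allclose(X_rot.var(axis=0, ddof=1), var[::-1])  # eigenvalues = variances
```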

Common pitfalls

  • Treating asymmetric matrices like symmetric ones. Asymmetric matrices may have complex eigenvalues and may not be diagonalizable at all (Jordan form).
  • Computing a full eigendecomposition for huge matrices. Use Lanczos / Arnoldi or randomized SVD at large scale; a full eigendecomposition is $O(n^3)$ (see the sketch below).
  • Confusing eigenvalues with singular values. They are equal only for symmetric PSD matrices; in general the singular values are $\sigma_i(A) = \sqrt{\lambda_i(A^\top A)}$.
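
On the second pitfall: a sketch of the Lanczos route via SciPy's `eigsh`, using a large sparse tridiagonal Laplacian as a convenient test matrix (an assumption; any symmetric sparse operator works):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1])  # sparse 1-D Laplacian

# Lanczos finds a few extreme eigenvalues without the O(n^3) full decomposition.
top3 = eigsh(A, k=3, which="LA", return_eigenvectors=False)
print(top3)   # approaches 4.0, the spectral edge of this operator
```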