Definition and Problem Statement
Overdetermined Systems
Problem: System Ax = b with A ∈ ℝ^{m×n}, m > n, typically has no exact solution. Goal: Find x minimizing the norm of the residual r = b - Ax.
Least Squares Criterion
Minimization: min_x ‖r‖_2^2 = min_x (b - Ax)ᵀ(b - Ax). Norm: Euclidean (ℓ_2) norm standard.
Mathematical Formulation
Objective: Solve min_x ∥b - Ax∥². Equivalently: find the x for which Ax is the best approximation to b in the range of A.
Given: A ∈ ℝ^{m×n}, b ∈ ℝ^m
Find: x ∈ ℝ^n to minimize f(x) = ∥b - Ax∥₂²
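A minimal sketch of this formulation in NumPy, using made-up data for A and b; `np.linalg.lstsq` computes the minimizer of ∥b - Ax∥₂² directly:

```python
import numpy as np

# Made-up overdetermined system: m = 5 equations, n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([0.9, 2.1, 2.9, 4.2, 4.8])

# lstsq returns the minimizer of f(x) = ||b - Ax||_2^2 (computed via SVD).
x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("x =", x)
print("residual norm =", np.linalg.norm(b - A @ x))
```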
Geometric Interpretation
Projection onto Column Space
Residual r is orthogonal to Range(A): r ⟂ Col(A). At the least squares solution, Ax is the orthogonal projection of b onto Col(A).
Orthogonality Condition
Condition: Aᵀr = 0, equivalently Aᵀ(b - Ax) = 0; this orthogonality characterizes the point of Col(A) at minimal distance from b.
Visualization in ℝ³
Example: Plane defined by Col(A) in ℝ³, b outside plane, projection is closest vector in plane.
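A small numerical sketch of this picture, with made-up A and b: the residual of the computed projection is orthogonal to both columns of A.

```python
import numpy as np

# Col(A) spans a plane in R^3; b lies off that plane (made-up data).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Projection of b onto Col(A): p = A (A^T A)^{-1} A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x                      # closest vector to b within the plane
r = b - p                      # residual

# Orthogonality: A^T r vanishes up to floating-point roundoff.
print("A^T r =", A.T @ r)
```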
Normal Equations
Derivation
Set gradient of f(x) to zero: ∇f = -2Aᵀ(b - Ax) = 0. Leads to AᵀAx = Aᵀb.
Solution Criteria
Matrix AᵀA is symmetric positive semidefinite. If A has full column rank (rank(A) = n), AᵀA is positive definite, hence invertible, and the unique solution is x = (AᵀA)⁻¹Aᵀb.
Computational Considerations
Forming AᵀA squares the condition number (κ(AᵀA) = κ(A)²), and direct inversion may be unstable or expensive for large systems; factorization-based alternatives are preferred.
Normal Equations: Aᵀ A x = Aᵀ b. If rank(A) = n → x = (Aᵀ A)⁻¹ Aᵀ b.
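A sketch of the normal-equations route on random illustrative data; as noted above, factorization-based methods are generally preferred for accuracy, but on a well-conditioned problem the two agree:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))     # random tall matrix, full column rank
b = rng.standard_normal(50)

# Solve A^T A x = A^T b with a linear solve (never form the explicit inverse).
x = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with NumPy's SVD-based reference solution.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))         # True
```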
Least Squares in Inner Product Spaces
General Inner Product
Inner product ⟨u,v⟩ defines norm ∥v∥ = √⟨v,v⟩. Least squares applies to Hilbert spaces.
Projection Theorem
Existence and uniqueness of projections onto closed subspaces. Best approximation characterized by orthogonality.
Weighted Least Squares
Weighted norm: ∥x∥_W = √(xᵀWx), W symmetric positive definite. Applies to heteroscedastic data or when prior knowledge assigns different confidence to observations.
Weighted least squares: min_x (b - Ax)ᵀ W (b - Ax). Normal equations: Aᵀ W A x = Aᵀ W b.
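A sketch with a hypothetical diagonal weight matrix W, checking that the weighted normal equations agree with the whitened form min ∥W^{1/2}(b - Ax)∥₂ from the summary table below:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Hypothetical weights: the second observation is trusted most.
W = np.diag([1.0, 4.0, 1.0])         # symmetric positive definite

# Weighted normal equations: A^T W A x = A^T W b.
x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Equivalent whitened form: minimize ||W^{1/2} (b - Ax)||_2.
Wh = np.sqrt(W)                      # elementwise sqrt is valid: W is diagonal
x_alt = np.linalg.lstsq(Wh @ A, Wh @ b, rcond=None)[0]
print(np.allclose(x, x_alt))         # True
```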
Matrix Decompositions for Least Squares
QR Decomposition
A = QR (thin factorization), Q ∈ ℝ^{m×n} with orthonormal columns (QᵀQ = I), R ∈ ℝ^{n×n} upper triangular. Solve Rx = Qᵀb via back substitution.
SVD (Singular Value Decomposition)
A = UΣVᵀ, handles rank-deficient A, numerical stability, pseudoinverse computation.
Cholesky Decomposition
Decompose AᵀA = LLᵀ with L lower triangular; efficient for symmetric positive definite matrices, used when solving the normal equations.
| Decomposition | Properties | Use Case |
|---|---|---|
| QR | Orthogonal Q, upper-triangular R | Stable least squares, full rank |
| SVD | Orthogonal U,V, diagonal Σ | Rank-deficient, pseudoinverse |
| Cholesky | LLᵀ factorization of AᵀA | Normal equations, positive definite |
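A sketch contrasting two of these routes on random illustrative data: the SVD route via the pseudoinverse, and the Cholesky route via the normal equations (the QR route is shown in the next subsection).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

# SVD route: x = V diag(1/sigma) U^T b, i.e. the pseudoinverse applied to b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)       # safe division: A has full rank here
print(np.allclose(x_svd, np.linalg.pinv(A) @ b))   # True

# Cholesky route: factor A^T A = L L^T, then two triangular solves.
L = np.linalg.cholesky(A.T @ A)      # L is lower triangular
y = np.linalg.solve(L, A.T @ b)      # solve L y = A^T b
x_chol = np.linalg.solve(L.T, y)     # solve L^T x = y
print(np.allclose(x_svd, x_chol))    # True
```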
Computational Algorithms
Direct Methods
Use QR or SVD decompositions; avoid forming the normal equations explicitly, which sacrifices accuracy and stability by squaring the condition number.
Iterative Methods
Gradient descent, conjugate gradient for large-scale or sparse problems.
Algorithmic Complexity
QR: O(m n²), SVD: O(m n² + n³), iterative depends on convergence and sparsity.
Algorithmic outline (QR):
1. Compute A = QR.
2. Compute y = Qᵀb.
3. Solve Rx = y by back substitution.
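A direct sketch of this outline on random illustrative data, with back substitution written out explicitly for exposition (NumPy's `qr` returns the thin factorization by default):

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for upper-triangular R (spelled out for illustration)."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)

Q, R = np.linalg.qr(A)               # step 1: thin QR (NumPy's default mode)
y = Q.T @ b                          # step 2: y = Q^T b
x = back_substitution(R, y)          # step 3: solve R x = y

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```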
Error Analysis and Stability
Condition Number
Condition number κ(A) = ∥A∥ ∥A⁺∥ = σ_max(A)/σ_min(A) (pseudoinverse A⁺, since A is rectangular) quantifies sensitivity of the solution to perturbations.
Effect of Noise
Measurement noise in b is amplified in x by a factor governed by κ(A), especially when A is nearly rank-deficient.
Regularization Techniques
Tikhonov regularization (ridge regression) adds λI to AᵀA for stability: (AᵀA + λI)x = Aᵀb.
Tikhonov Regularization: min_x ∥b - Ax∥² + λ ∥x∥². Normal equations: (Aᵀ A + λ I) x = Aᵀ b.
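A sketch of the regularized solve on a deliberately ill-conditioned, made-up A; the helper `tikhonov` is introduced here only for illustration. Increasing λ shrinks ∥x∥ and stabilizes the solution, at the cost of some bias.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Solve (A^T A + lam * I) x = A^T b (illustrative helper)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Made-up, nearly rank-deficient A: the two columns are almost parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([1.0, 2.0, 2.0])

for lam in (0.0, 1e-6, 1e-2):
    x = tikhonov(A, b, lam)
    print(lam, x, np.linalg.norm(x))   # increasing lambda shrinks ||x||
```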
Applications
Data Fitting and Regression
Linear regression models, parameter estimation in statistics and machine learning.
Signal Processing
Noise reduction, filter design, system identification via least squares optimization.
Computer Graphics
Curve fitting, surface approximation, pose estimation in 3D reconstruction.
Extensions and Generalizations
Nonlinear Least Squares
Iteratively linearize the nonlinear model around the current estimate and solve a linear least squares subproblem at each step (as in Gauss-Newton; see the sketch below).
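A minimal Gauss-Newton sketch for a hypothetical exponential model y ≈ a·exp(b·t) with made-up data; each iteration fits the residual with the Jacobian in the linear least squares sense.

```python
import numpy as np

# Made-up data for a hypothetical exponential model y ≈ a * exp(b * t).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 1.2, 0.7, 0.4])
a, b = 1.0, 0.0                        # initial parameter guess

for _ in range(20):                    # Gauss-Newton iterations
    f = a * np.exp(b * t)              # current model predictions
    r = y - f                          # residuals
    # Jacobian columns: df/da and df/db evaluated at the current estimate.
    J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])
    # Linearized subproblem: find delta minimizing ||r - J delta||_2.
    delta = np.linalg.lstsq(J, r, rcond=None)[0]
    a, b = a + delta[0], b + delta[1]

print("a =", a, "b =", b)              # converged estimates
```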
Total Least Squares
Accounts for errors in both A and b, minimizing orthogonal distances.
Robust Regression
Minimizes alternative norms (e.g., ℓ_1) to reduce influence of outliers.
Examples and Illustrations
Simple 2D Linear Fit
Fit line y = mx + c to points minimizing squared vertical deviations.
Matrix Form Example
A = [[1,1],[1,2],[1,3]], b = [1,2,2]. Compute x minimizing ∥b - Ax∥².
A = [[1,1],[1,2],[1,3]], b = [1,2,2]ᵀ.
Normal equations: Aᵀ A = [[3,6],[6,14]], Aᵀ b = [5,11]ᵀ (second component: 1·1 + 2·2 + 3·2 = 11).
Solve [[3,6],[6,14]] x = [5,11]ᵀ: x = (Aᵀ A)⁻¹ Aᵀ b = [2/3, 1/2]ᵀ, i.e., the fitted line is y = x/2 + 2/3.
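The arithmetic can be checked numerically; a short NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

print(A.T @ A)                          # [[3, 6], [6, 14]]
print(A.T @ b)                          # [5, 11]
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                                # [0.6667, 0.5] -> y = 2/3 + x/2
```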
Summary Tables
| Concept | Formula/Definition | Remarks |
|---|---|---|
| Residual Vector | r = b - Ax | Difference between data and model |
| Least Squares Solution | x = (Aᵀ A)⁻¹ Aᵀ b | If A has full column rank |
| Normal Equations | Aᵀ A x = Aᵀ b | Condition for minimization |
| Weighted Least Squares | min ∥W^{1/2}(b - Ax)∥² | Weights W positive definite |
| Regularized Solution | x = (Aᵀ A + λ I)⁻¹ Aᵀ b | Improves stability, reduces overfitting |