Definition and Problem Statement
Overdetermined Systems
Problem: System Ax = b with A ∈ ℝ^{m×n}, m > n, typically has no exact solution. Goal: Find x minimizing the norm of the residual r = b - Ax.
Least Squares Criterion
Minimization: min_x ‖r‖_2^2 = min_x (b - Ax)ᵀ(b - Ax). Norm: Euclidean (ℓ_2) norm standard.
Mathematical Formulation
Objective: Solve min_x ∥b - Ax∥². Equivalently: find the x for which Ax is the best approximation to b in the range of A.
Given: A ∈ ℝ^{m×n}, b ∈ ℝ^m
Find: x ∈ ℝ^n to minimize f(x) = ∥b - Ax∥₂²
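A minimal sketch of this formulation in NumPy, using made-up data for A and b; `np.linalg.lstsq` computes the minimizer of ∥b - Ax∥₂² directly:

```python
import numpy as np

# Made-up overdetermined system: m = 5 equations, n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([0.9, 2.1, 2.9, 4.2, 4.8])

# lstsq returns the minimizer of f(x) = ||b - Ax||_2^2 (computed via SVD).
x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("x =", x)
print("residual norm =", np.linalg.norm(b - A @ x))
```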
Geometric Interpretation
Projection onto Column Space
Residual r is orthogonal to Range(A): r ⟂ Col(A). At the least squares solution, Ax is the orthogonal projection of b onto Col(A).
Orthogonality Condition
Condition: Aᵀr = 0, equivalently Aᵀ(b - Ax) = 0; this orthogonality characterizes the point of Col(A) at minimal distance from b.
Visualization in ℝ³
Example: Plane defined by Col(A) in ℝ³, b outside plane, projection is closest vector in plane.
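A small numerical sketch of this picture, with made-up A and b: the residual of the computed projection is orthogonal to both columns of A.

```python
import numpy as np

# Col(A) spans a plane in R^3; b lies off that plane (made-up data).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Projection of b onto Col(A): p = A (A^T A)^{-1} A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x                      # closest vector to b within the plane
r = b - p                      # residual

# Orthogonality: A^T r vanishes up to floating-point roundoff.
print("A^T r =", A.T @ r)
```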
Normal Equations
Derivation
Set gradient of f(x) to zero: ∇f = -2Aᵀ(b - Ax) = 0. Leads to AᵀAx = Aᵀb.
Solution Criteria
Matrix AᵀA is symmetric positive semidefinite. If A has full column rank (rank(A) = n), AᵀA is positive definite, hence invertible, and the unique solution is x = (AᵀA)⁻¹Aᵀb.
Computational Considerations
Forming AᵀA squares the condition number (κ(AᵀA) = κ(A)²), and direct inversion may be unstable or expensive for large systems; factorization-based alternatives are preferred.
Normal Equations: Aᵀ A x = Aᵀ b. If rank(A) = n → x = (Aᵀ A)⁻¹ Aᵀ b.
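A sketch of the normal-equations route on random illustrative data; as noted above, factorization-based methods are generally preferred for accuracy, but on a well-conditioned problem the two agree:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))     # random tall matrix, full column rank
b = rng.standard_normal(50)

# Solve A^T A x = A^T b with a linear solve (never form the explicit inverse).
x = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with NumPy's SVD-based reference solution.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))         # True
```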
Least Squares in Inner Product Spaces
General Inner Product
Inner product ⟨u,v⟩ defines norm ∥v∥ = √⟨v,v⟩. Least squares applies to Hilbert spaces.
Projection Theorem
Existence and uniqueness of projections onto closed subspaces. Best approximation characterized by orthogonality.
Weighted Least Squares
Weighted norm: ∥x∥_W = √(xᵀWx), W symmetric positive definite. Applies to heteroscedastic data or when prior knowledge assigns different confidence to observations.
Weighted least squares: min_x (b - Ax)ᵀ W (b - Ax). Normal equations: Aᵀ W A x = Aᵀ W b.
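A sketch with a hypothetical diagonal weight matrix W, checking that the weighted normal equations agree with the whitened form min ∥W^{1/2}(b - Ax)∥₂ from the summary table below:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Hypothetical weights: the second observation is trusted most.
W = np.diag([1.0, 4.0, 1.0])         # symmetric positive definite

# Weighted normal equations: A^T W A x = A^T W b.
x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Equivalent whitened form: minimize ||W^{1/2} (b - Ax)||_2.
Wh = np.sqrt(W)                      # elementwise sqrt is valid: W is diagonal
x_alt = np.linalg.lstsq(Wh @ A, Wh @ b, rcond=None)[0]
print(np.allclose(x, x_alt))         # True
```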
Matrix Decompositions for Least Squares
QR Decomposition
A = QR (thin factorization), Q ∈ ℝ^{m×n} with orthonormal columns (QᵀQ = I), R ∈ ℝ^{n×n} upper triangular. Solve Rx = Qᵀb via back substitution.
SVD (Singular Value Decomposition)
A = UΣVᵀ, handles rank-deficient A, numerical stability, pseudoinverse computation.
Cholesky Decomposition
Decompose AᵀA = LLᵀ with L lower triangular; efficient for symmetric positive definite matrices, used when solving the normal equations.
| Decomposition | Properties | Use Case |
|---|---|---|
| QR | Orthogonal Q, upper-triangular R | Stable least squares, full rank |
| SVD | Orthogonal U,V, diagonal Σ | Rank-deficient, pseudoinverse |
| Cholesky | LLᵀ factorization of AᵀA | Normal equations, positive definite |
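A sketch contrasting two of these routes on random illustrative data: the SVD route via the pseudoinverse, and the Cholesky route via the normal equations (the QR route is shown in the next subsection).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

# SVD route: x = V diag(1/sigma) U^T b, i.e. the pseudoinverse applied to b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)       # safe division: A has full rank here
print(np.allclose(x_svd, np.linalg.pinv(A) @ b))   # True

# Cholesky route: factor A^T A = L L^T, then two triangular solves.
L = np.linalg.cholesky(A.T @ A)      # L is lower triangular
y = np.linalg.solve(L, A.T @ b)      # solve L y = A^T b
x_chol = np.linalg.solve(L.T, y)     # solve L^T x = y
print(np.allclose(x_svd, x_chol))    # True
```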
Computational Algorithms
Direct Methods
Use QR or SVD decompositions; avoid forming the normal equations explicitly, which sacrifices accuracy and stability by squaring the condition number.
Iterative Methods
Gradient descent, conjugate gradient for large-scale or sparse problems.
Algorithmic Complexity
QR: O(m n²), SVD: O(m n² + n³), iterative depends on convergence and sparsity.
Algorithmic outline (QR):
1. Compute A = QR.
2. Compute y = Qᵀb.
3. Solve Rx = y by back substitution.
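A direct sketch of this outline on random illustrative data, with back substitution written out explicitly for exposition (NumPy's `qr` returns the thin factorization by default):

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for upper-triangular R (spelled out for illustration)."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)

Q, R = np.linalg.qr(A)               # step 1: thin QR (NumPy's default mode)
y = Q.T @ b                          # step 2: y = Q^T b
x = back_substitution(R, y)          # step 3: solve R x = y

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```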
Error Analysis and Stability
Condition Number
Condition number κ(A) = ∥A∥ ∥A⁺∥ = σ_max(A)/σ_min(A) (pseudoinverse A⁺, since A is rectangular) quantifies sensitivity of the solution to perturbations.
Effect of Noise
Measurement noise in b is amplified in x by a factor governed by κ(A), especially when A is nearly rank-deficient.
Regularization Techniques
Tikhonov regularization (ridge regression) adds λI to AᵀA for stability: (AᵀA + λI)x = Aᵀb.
Tikhonov Regularization: min_x ∥b - Ax∥² + λ ∥x∥². Normal equations: (Aᵀ A + λ I) x = Aᵀ b.
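A sketch of the regularized solve on a deliberately ill-conditioned, made-up A; the helper `tikhonov` is introduced here only for illustration. Increasing λ shrinks ∥x∥ and stabilizes the solution, at the cost of some bias.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Solve (A^T A + lam * I) x = A^T b (illustrative helper)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Made-up, nearly rank-deficient A: the two columns are almost parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([1.0, 2.0, 2.0])

for lam in (0.0, 1e-6, 1e-2):
    x = tikhonov(A, b, lam)
    print(lam, x, np.linalg.norm(x))   # increasing lambda shrinks ||x||
```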
Applications
Data Fitting and Regression
Linear regression models, parameter estimation in statistics and machine learning.
Signal Processing
Noise reduction, filter design, system identification via least squares optimization.
Computer Graphics
Curve fitting, surface approximation, pose estimation in 3D reconstruction.
Extensions and Generalizations
Nonlinear Least Squares
Iteratively linearize the nonlinear model around the current estimate and solve a linear least squares subproblem at each step (as in Gauss-Newton; see the sketch below).
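A minimal Gauss-Newton sketch for a hypothetical exponential model y ≈ a·exp(b·t) with made-up data; each iteration fits the residual with the Jacobian in the linear least squares sense.

```python
import numpy as np

# Made-up data for a hypothetical exponential model y ≈ a * exp(b * t).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 1.2, 0.7, 0.4])
a, b = 1.0, 0.0                        # initial parameter guess

for _ in range(20):                    # Gauss-Newton iterations
    f = a * np.exp(b * t)              # current model predictions
    r = y - f                          # residuals
    # Jacobian columns: df/da and df/db evaluated at the current estimate.
    J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])
    # Linearized subproblem: find delta minimizing ||r - J delta||_2.
    delta = np.linalg.lstsq(J, r, rcond=None)[0]
    a, b = a + delta[0], b + delta[1]

print("a =", a, "b =", b)              # converged estimates
```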
Total Least Squares
Accounts for errors in both A and b, minimizing orthogonal distances.
Robust Regression
Minimizes alternative norms (e.g., ℓ_1) to reduce influence of outliers.
Examples and Illustrations
Simple 2D Linear Fit
Fit line y = mx + c to points minimizing squared vertical deviations.
Matrix Form Example
A = [[1,1],[1,2],[1,3]], b = [1,2,2]. Compute x minimizing ∥b - Ax∥².
A = [[1,1],[1,2],[1,3]], b = [1,2,2]ᵀ.
Normal equations: Aᵀ A = [[3,6],[6,14]], Aᵀ b = [5,11]ᵀ (second component: 1·1 + 2·2 + 3·2 = 11).
Solve [[3,6],[6,14]] x = [5,11]ᵀ: x = (Aᵀ A)⁻¹ Aᵀ b = [2/3, 1/2]ᵀ, i.e., the fitted line is y = x/2 + 2/3.
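The arithmetic can be checked numerically; a short NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

print(A.T @ A)                          # [[3, 6], [6, 14]]
print(A.T @ b)                          # [5, 11]
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                                # [0.6667, 0.5] -> y = 2/3 + x/2
```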
Summary Tables
| Concept | Formula/Definition | Remarks |
|---|---|---|
| Residual Vector | r = b - Ax | Difference between data and model |
| Least Squares Solution | x = (Aᵀ A)⁻¹ Aᵀ b | If A has full column rank |
| Normal Equations | Aᵀ A x = Aᵀ b | Condition for minimization |
| Weighted Least Squares | min ∥W^{1/2}(b - Ax)∥² | Weights W positive definite |
| Regularized Solution | x = (Aᵀ A + λ I)⁻¹ Aᵀ b | Improves stability, reduces overfitting |