Introduction

Principal Component Analysis (PCA): unsupervised learning method. Purpose: reduce high-dimensional data to lower dimensions. Key goals: preserve variance, remove redundancy, simplify data representation. Widely used in exploratory data analysis, visualization, noise reduction. Technique basis: linear algebra, statistics.

The method traces to Karl Pearson (1901), who sought "lines and planes of closest fit to systems of points in space." In modern terms: PCA transforms data into a new coordinate system such that the greatest variance lies along the first coordinate.

Mathematical Foundations

Vector Spaces and Linear Algebra

Data points: vectors in ℝⁿ. PCA: orthogonal linear transformation. Basis vectors rotated to align with directions of maximum variance. Goal: find orthonormal basis that best represents data variance.

Variance and Covariance

Variance: spread of data along a dimension. Covariance: measure of joint variability between two dimensions. PCA maximizes variance captured by each principal component, minimizes covariance (uncorrelated components).

Eigenvalues and Eigenvectors

Eigenvectors: directions of principal components. Eigenvalues: magnitude of variance explained. Computed from covariance matrix. Components ranked by eigenvalue magnitude.

Algorithm Description

Data Centering

Subtract mean vector from each data point. Ensures zero mean. Necessary for covariance calculation.
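Centering can be sketched in a few lines of NumPy (toy values are hypothetical):

```python
import numpy as np

# Toy data (hypothetical values): 4 samples, 2 features.
X = np.array([[2.0, 4.0],
              [1.0, 3.0],
              [3.0, 5.0],
              [2.0, 2.0]])

mu = X.mean(axis=0)   # mean of each feature
X_c = X - mu          # centered data: every column now averages to zero
```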

Covariance Matrix Computation

Calculate covariance matrix from centered data. Matrix size: n × n (n = number of features).

Eigen Decomposition

Compute eigenvalues and eigenvectors of covariance matrix. Sort eigenvectors by decreasing eigenvalues.

Projection

Project original data onto selected top-k eigenvectors. Result: reduced dimensionality data representation.

1. Given data matrix X (m samples × n features)
2. Compute mean vector μ = (1/m) Σ X_i
3. Center data: X_c = X - μ
4. Compute covariance matrix C = (1/(m-1)) X_cᵀ X_c
5. Compute eigenvalues λ and eigenvectors v of C
6. Sort eigenvectors by decreasing λ
7. Select top-k eigenvectors V_k
8. Project data: Y = X_c V_k
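The steps above can be sketched as a minimal, unoptimized NumPy implementation (the function name `pca` is ours; random data for illustration):

```python
import numpy as np

def pca(X, k):
    """Project X (m samples x n features) onto its top-k principal components."""
    mu = X.mean(axis=0)                        # step 2: mean vector
    X_c = X - mu                               # step 3: center the data
    C = X_c.T @ X_c / (X.shape[0] - 1)         # step 4: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # step 5: eigh suits symmetric C
    order = np.argsort(eigvals)[::-1]          # step 6: decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V_k = eigvecs[:, :k]                       # step 7: top-k eigenvectors
    return X_c @ V_k, eigvals                  # step 8: projection Y = X_c V_k

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y, eigvals = pca(X, 2)
```

Note the use of `np.linalg.eigh` rather than `np.linalg.eig`: it exploits the symmetry of C and guarantees real eigenvalues.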

Covariance Matrix

Definition

Symmetric matrix capturing pairwise covariances of features. Entry C_ij = cov(feature_i, feature_j).

Properties

Symmetric, positive semi-definite. Dimension: n × n. Diagonal elements: variances.
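These properties are easy to verify numerically (random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
C = np.cov(X, rowvar=False)        # 4 x 4 covariance matrix, ddof=1 by default

symmetric = np.allclose(C, C.T)                        # symmetry
psd = np.all(np.linalg.eigvalsh(C) >= -1e-12)          # no negative eigenvalues
diag_is_var = np.allclose(np.diag(C), X.var(axis=0, ddof=1))  # diagonal = variances
```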

Role in PCA

Used to find principal directions as eigenvectors. Encodes data variance and relationships.

Covariance Matrix Example

           Feature 1  Feature 2  Feature 3
Feature 1  σ₁²        cov₁₂      cov₁₃
Feature 2  cov₂₁      σ₂²        cov₂₃
Feature 3  cov₃₁      cov₃₂      σ₃²

Eigen Decomposition

Concept

Decompose covariance matrix C into eigenvalues λ and eigenvectors v: C v = λ v.

Significance

Eigenvectors: directions of maximal variance. Eigenvalues: variance magnitude along eigenvectors.

Computation Methods

Analytical for small matrices. Numerical methods (Power iteration, QR algorithm) for large matrices.

C v_i = λ_i v_i

where:
  C   = covariance matrix (n × n)
  v_i = i-th eigenvector (n × 1)
  λ_i = i-th eigenvalue (scalar)
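Power iteration, mentioned above as a numerical method, can be sketched for the dominant eigenpair (fixed iteration count for simplicity; random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
C = np.cov(X, rowvar=False)

v = rng.normal(size=4)
for _ in range(500):                 # iterate v <- C v / ||C v||
    v = C @ v
    v /= np.linalg.norm(v)
lam = v @ C @ v                      # Rayleigh quotient estimates the top eigenvalue

lam_ref = np.linalg.eigvalsh(C)[-1]  # cross-check against a direct solver
```

Convergence requires a gap between the two largest eigenvalues; deflation extends the idea to subsequent components.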

Principal Components

Definition

New variables obtained by projecting data onto eigenvectors. Orthogonal and uncorrelated.

Ordering

Ranked by explained variance (eigenvalues). First PC explains most variance, second PC next, etc.

Interpretation

Each PC is a linear combination of original features. Components capture underlying data structure.

Dimensionality Reduction

Rationale

Reduce features while retaining maximal variance. Simplifies models, reduces noise, improves visualization.

Component Selection

Choose top-k PCs based on cumulative explained variance threshold (e.g., 95%).
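The threshold rule can be sketched as follows (the eigenvalues are hypothetical, sorted in decreasing order):

```python
import numpy as np

eigvals = np.array([4.0, 2.5, 1.0, 0.3, 0.2])   # hypothetical, sorted descending
ratio = eigvals / eigvals.sum()                  # per-component explained variance
cumulative = np.cumsum(ratio)                    # running total of explained variance
k = int(np.searchsorted(cumulative, 0.95)) + 1   # smallest k reaching 95%
```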

Reconstruction Error

Difference between original and projected data. Lower error with more components.
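A sketch of how reconstruction error shrinks as k grows (random data for illustration): project onto the top-k eigenvectors, map back, and compare.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
mu = X.mean(axis=0)
X_c = X - mu
eigvals, eigvecs = np.linalg.eigh(np.cov(X_c, rowvar=False))
eigvecs = eigvecs[:, ::-1]                       # decreasing eigenvalue order

def reconstruction_mse(k):
    V = eigvecs[:, :k]
    X_hat = (X_c @ V) @ V.T + mu                 # project down, then back up
    return float(np.mean((X - X_hat) ** 2))

errors = [reconstruction_mse(k) for k in (1, 3, 5)]
# error decreases as k grows, and vanishes at k = n (full rank)
```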

Number of Components (k)   Explained Variance (%)   Reconstruction Error
1                          60                       High
2                          85                       Moderate
3                          95                       Low

Applications

Data Visualization

Project high-dimensional data to 2D/3D for visual inspection and pattern recognition.

Noise Reduction

Filter out components with low variance to remove noise and improve signal quality.

Feature Extraction

Generate new uncorrelated features for downstream tasks like clustering or classification.

Compression

Reduce data storage and transmission requirements by encoding with fewer dimensions.

Advantages and Limitations

Advantages

Simplicity: easy to implement. Efficiency: reduces computational cost. Interpretation: identifies key variance directions.

Limitations

Linearity: cannot capture nonlinear relationships. Sensitivity: affected by scaling and outliers. Interpretability: components may lack clear meaning.

Mitigations

Data normalization, robust PCA variants, kernel PCA for nonlinear structures.

Comparison with Other Methods

Kernel PCA

Extension to capture nonlinear structures using kernel functions.

Independent Component Analysis (ICA)

Focuses on statistical independence rather than variance. Used in signal separation.

Factor Analysis

Models observed variance via latent factors plus an explicit noise term; a probabilistic approach.

t-SNE and UMAP

Nonlinear dimensionality reduction for visualization; preserve local structure.

Implementation Considerations

Data Preprocessing

Centering: mandatory. Scaling to unit variance: recommended when features differ in units or scale, otherwise large-scale features dominate the covariance matrix.
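A minimal standardization sketch (toy values are hypothetical):

```python
import numpy as np

# Hypothetical features on very different scales (e.g. metres vs grams).
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

X_c = X - X.mean(axis=0)               # centering: always required for PCA
X_std = X_c / X.std(axis=0, ddof=1)    # scaling: unit variance per feature
```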

Computational Complexity

Eigen decomposition of the n × n covariance matrix: O(n³); forming the matrix from m samples: O(m n²). Alternatives for large data: truncated SVD, randomized PCA.

Software Libraries

Common implementations: scikit-learn (Python), MATLAB, R (prcomp), TensorFlow.
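With scikit-learn, named above, the whole pipeline is a few lines (random data for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA   # scikit-learn's implementation

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))

pca = PCA(n_components=2)
Y = pca.fit_transform(X)                # centers X, then projects onto top-2 PCs
var_ratio = pca.explained_variance_ratio_   # fraction of variance per component
```

`fit_transform` handles centering internally; scaling, if needed, must be applied beforehand.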

Parameter Choices

Number of components k: tradeoff between compression and information loss.

Extensions and Variants

Kernel PCA

Maps data to high-dimensional feature space through kernels, captures nonlinear patterns.

Sparse PCA

Produces components with sparse loadings, improves interpretability.

Robust PCA

Decomposes data into low-rank and sparse matrices, resilient to outliers.

Incremental PCA

Processes data in mini-batches for large-scale or streaming data.

Probabilistic PCA

Probabilistic model offering maximum likelihood estimation, handles missing data.

References

  • K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine, vol. 2, 1901, pp. 559-572.
  • I. T. Jolliffe, "Principal Component Analysis," Springer Series in Statistics, 2nd ed., 2002.
  • H. Abdi and L. J. Williams, "Principal component analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, 2010, pp. 433-459.
  • B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, 1998, pp. 1299-1319.
  • J. Shlens, "A Tutorial on Principal Component Analysis," arXiv:1404.1100, 2014.