Definition and Interpretation

What is R Squared?

Coefficient of determination. Measures proportion of variance in dependent variable explained by independent variables. Indicator of model fit quality. Values range 0 to 1, sometimes expressed as percentage.

Historical Context

Builds on Francis Galton's work on regression in the 1880s; Karl Pearson formalized the correlation coefficient in 1896. The term "coefficient of determination" is usually credited to Sewall Wright (1921). Fundamental in linear regression diagnostics.

Conceptual Meaning

Explained variance ratio. Quantifies how much variability in outcome is captured by predictors. Higher values indicate better explanatory power.

Calculation and Formula

Basic Formula

Ratio of explained sum of squares to total sum of squares. Denotes fraction of variance accounted for by model.

R² = 1 - (SS_res / SS_tot)

where:
SS_res = Σ(yᵢ - ŷᵢ)²  (Residual Sum of Squares)
SS_tot = Σ(yᵢ - ȳ)²  (Total Sum of Squares)
yᵢ = observed values
ŷᵢ = predicted values
ȳ = mean of observed values
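
A minimal Python sketch of this formula (NumPy assumed; the function name r_squared is illustrative):

```python
import numpy as np

def r_squared(y, y_hat):
    """Compute R² = 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot
```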

Alternate Expression

Also expressed as square of Pearson correlation coefficient in simple linear regression.

R² = (correlation(y, ŷ))²
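
A quick numerical check of this identity, assuming NumPy and synthetic data; for an ordinary least-squares line with intercept, R² equals the squared Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

slope, intercept = np.polyfit(x, y, 1)   # ordinary least-squares line
y_hat = slope * x + intercept

r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r = np.corrcoef(x, y)[0, 1]              # Pearson r between x and y
print(np.isclose(r2, r ** 2))            # True: R² = r² in simple OLS
```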

Components Explained

SS_tot: total variability in data. SS_res: variability unexplained by model. SS_reg = SS_tot - SS_res: variability explained by regression.

Term   | Definition
SS_tot | Total Sum of Squares; total variance in observed data
SS_res | Residual Sum of Squares; variance unexplained by model
SS_reg | Regression Sum of Squares; variance explained by model

Interpretation of Values

Value Range

For least-squares models with an intercept, evaluated on the training data, R² ranges between 0 and 1. 0 means the model explains none of the variance; 1 means a perfect fit with all variance explained. In other settings (out-of-sample evaluation, models without an intercept), R² can be negative.

Common Thresholds

Rules of thumb vary by field: values above 0.7 are often read as a good fit, values below 0.3 as weak explanatory power. These thresholds are context-dependent; acceptable values differ widely across disciplines.

Percentage Explanation

R² × 100 gives percentage of variance explained by model. Example: R²=0.85 means 85% variance explained.

Types of R Squared

Simple R Squared

Derived from simple linear regression with one predictor. Equals square of correlation coefficient between observed and predicted.

Multiple R Squared

Used in multiple regression with several predictors. Measures collective explanatory power.

Pseudo R Squared

Used in models without least squares estimation, e.g. logistic regression. Various definitions exist (McFadden, Cox & Snell).
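
A sketch of McFadden's pseudo-R² (1 - LL_model / LL_null), assuming scikit-learn and synthetic data; all variable names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-outcome data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

p = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Log-likelihoods of the fitted model and the intercept-only (null) model
ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
p0 = y.mean()
ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))

print(1 - ll_model / ll_null)   # McFadden's pseudo-R²
```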

Limitations and Misinterpretations

Overfitting Susceptibility

Never decreases when predictors are added, even irrelevant ones. Can be misleading without adjustment.

Does Not Imply Causation

High R² does not confirm causal relationship. Only measures association strength.

Not Absolute Measure of Model Quality

Must be complemented by residual analysis, significance tests, and domain knowledge.

Adjusted R Squared

Purpose

Corrects R² for number of predictors and sample size. Penalizes model complexity.

Formula

Adjusted R² = 1 - [(1 - R²)(n - 1) / (n - p - 1)]

where:
n = number of observations
p = number of predictors
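
A direct translation into Python (the helper name is illustrative):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r_squared(0.75, n=100, p=2))   # ≈ 0.745
```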

Interpretation

May decrease if new predictors do not improve model. Preferred over raw R² for model comparison.

R Squared vs Correlation Coefficient

Relationship

In simple linear regression, R² equals square of Pearson correlation coefficient (r).

Differences in Multiple Regression

With several predictors there is no single predictor-outcome correlation to square. R² instead equals the square of the multiple correlation coefficient, the correlation between observed and fitted values, and so generalizes explained variance.
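
A sketch of this relationship, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

r2 = model.score(X, y)                  # R²
r_mult = np.corrcoef(y, y_hat)[0, 1]    # multiple correlation coefficient
print(np.isclose(r2, r_mult ** 2))      # True
```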

Interpretation Nuances

Correlation measures linear association; R² measures explained variance fraction. Both important, different roles.

Use Cases in Regression Analysis

Model Evaluation

Primary metric for assessing goodness of fit. Guides model refinement and variable selection.

Predictive Accuracy

Gives a rough indication of predictive performance on similar data; out-of-sample validation is more reliable. Useful in forecasting and risk modeling.

Comparing Models

Used to compare nested and non-nested models. Adjusted R² preferred for penalized comparison.

Computation Examples

Simple Linear Regression Example

Dataset: observed y = [3, 5, 7, 9], predicted ŷ = [2.8, 5.1, 6.9, 9.2]

Calculate:

SS_tot = Σ(yᵢ - ȳ)² = (3-6)² + (5-6)² + (7-6)² + (9-6)² = 9 + 1 + 1 + 9 = 20
SS_res = Σ(yᵢ - ŷᵢ)² = (3-2.8)² + (5-5.1)² + (7-6.9)² + (9-9.2)² = 0.04 + 0.01 + 0.01 + 0.04 = 0.10
R² = 1 - (0.10 / 20) = 1 - 0.005 = 0.995
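
The same arithmetic, reproduced in Python (NumPy assumed):

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.1, 6.9, 9.2])

ss_tot = np.sum((y - y.mean()) ** 2)   # 20.0
ss_res = np.sum((y - y_hat) ** 2)      # 0.10
print(1 - ss_res / ss_tot)             # 0.995
```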

Multiple Regression Example

Model with predictors X1, X2 predicting y. SS_tot = 100, SS_res = 25.

R² = 1 - (25 / 100) = 0.75 (75% variance explained)

Example             | R² Value | Interpretation
Simple Regression   | 0.995    | Excellent fit, nearly all variance explained
Multiple Regression | 0.75     | Good fit, substantial variance explained

Software Implementations

R

The summary(lm()) output reports R² ("Multiple R-squared") and adjusted R² automatically. Packages such as caret compute R² for other model types.

Python

Scikit-learn's LinearRegression().score() method returns R². Statsmodels provides detailed regression summaries including R².
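
A short usage sketch with synthetic data (both libraries assumed installed):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# scikit-learn: score() returns R²
print(LinearRegression().fit(X, y).score(X, y))

# statsmodels: R² and adjusted R² from the fitted results
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.rsquared, res.rsquared_adj)
```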

SPSS and SAS

Regression procedures output R², adjusted R², and related diagnostics by default.
