Definition of Variance
Concept
Variance quantifies the average squared deviation of data points from the mean. It reflects data spread and dispersion within a dataset.
Mathematical Meaning
Variance measures how far the numbers in a set lie from the mean, and therefore how far they tend to lie from one another.
Context in Descriptive Statistics
Variance is a primary measure of dispersion used alongside mean, median, and mode to describe data distribution characteristics.
Variance Formulas
Population Variance Formula
Population variance (σ²) calculates variance using all data points in the population.
σ² = (1/N) ∑ (xᵢ - μ)²
Sample Variance Formula
Sample variance (s²) estimates population variance from a sample, applying Bessel’s correction.
s² = (1/(n-1)) ∑ (xᵢ - x̄)²
Symbols Explained
σ²: population variance, s²: sample variance, N: population size, n: sample size, xᵢ: data point, μ: population mean, x̄: sample mean.
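The two formulas can be sketched directly in Python. This is a minimal illustration rather than a reference implementation; the function names `population_variance` and `sample_variance` are chosen here for clarity and do not come from any library.

```python
from math import fsum

def population_variance(data):
    """sigma^2 = (1/N) * sum((x_i - mu)^2), dividing by the population size N."""
    n = len(data)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mu = fsum(data) / n
    return fsum((x - mu) ** 2 for x in data) / n

def sample_variance(data):
    """s^2 = (1/(n-1)) * sum((x_i - x_bar)^2), using Bessel's correction."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = fsum(data) / n
    return fsum((x - x_bar) ** 2 for x in data) / (n - 1)

print(population_variance([2, 4, 6, 8, 10]))  # 8.0
print(sample_variance([3, 7, 7, 19]))         # 48.0
```

The only difference between the two functions is the divisor: N for a full population, n-1 for a sample.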
Population vs Sample Variance
Population Variance
Measures true variance of entire population. Uses divisor N. Exact measure if population data known.
Sample Variance
Estimates population variance from sample data. Uses divisor n-1 to correct bias (Bessel’s correction).
Bias and Unbiased Estimation
Sample variance with divisor n-1 is an unbiased estimator of population variance; dividing by n instead underestimates variance on average, by a factor of (n-1)/n.
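The bias can be checked exactly on a small case. The sketch below assumes an illustrative population of five values and enumerates every equally likely ordered sample of size 2 drawn with replacement, then averages both estimators over all samples. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8, 10]  # illustrative population; its variance is 8
n = 2                          # sample size (an assumption for this sketch)

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

# Every ordered sample of size n drawn with replacement; all are equally likely.
samples = list(product(population, repeat=n))

mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(mean_unbiased)  # 8, equal to the population variance
print(mean_biased)    # 4, i.e. (n-1)/n times the population variance
```

Averaged over all samples, the n-1 divisor recovers the population variance, while the n divisor falls short by the factor (n-1)/n.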
Properties of Variance
Non-negativity
Variance is always ≥ 0. Zero variance indicates all data points identical.
Units
Variance units are square of original data units (e.g., meters²). Limits direct interpretability.
Effect of Linear Transformations
Variance scales by square of multiplicative constant and ignores additive shifts: Var(aX + b) = a² Var(X).
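A quick numerical check of this property, using Python's standard `statistics.pvariance`. The data and the constants `a` and `b` are arbitrary values picked for illustration.

```python
from statistics import pvariance

x = [2, 4, 6, 8, 10]
a, b = 3, 7  # hypothetical scale and shift

transformed = [a * xi + b for xi in x]

# The shift b drops out; the scale a enters squared.
ratio = pvariance(transformed) / pvariance(x)
print(ratio)  # 9.0, which is a squared
```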
Calculation Methods
Direct Method
Calculate mean, then average squared deviations from mean (divide by N for a population, by n-1 for a sample).
Computational Shortcut
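Use formula: Var = (∑xᵢ² - (∑xᵢ)²/n) / (n-1) for samples to avoid subtracting the mean from each value. In floating-point arithmetic this form can lose precision when the mean is large relative to the spread.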
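One caveat is worth illustrating: in floating-point arithmetic the shortcut subtracts two nearly equal large numbers, which can cancel away the significant digits. The sketch below compares both methods on a small dataset and on the same data shifted by 10⁹. The function names are illustrative, and the behaviour shown assumes standard IEEE 754 double precision, which is what Python floats use on common platforms.

```python
def direct_variance(data):
    """Two-pass method: compute the mean, then average squared deviations."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def shortcut_variance(data):
    """Shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total * total / n) / (n - 1)

small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]  # same spread, much larger mean

print(direct_variance(small), shortcut_variance(small))  # 30.0 30.0
print(direct_variance(shifted))    # 30.0
print(shortcut_variance(shifted))  # differs from 30.0 because of cancellation
```

Shifting data does not change its variance, so both results should be 30; only the direct method stays accurate here.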
Software and Tools
Statistical software (R, SPSS, Python) computes variance automatically with built-in functions.
| Method | Description |
|---|---|
| Direct Method | Calculate mean, then squared deviations, then average |
| Computational Shortcut | Use sum of squares and sum of values to avoid repeated subtraction |
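As an example of built-in support, Python's standard `statistics` module provides `variance` (divisor n-1) and `pvariance` (divisor N). Defaults differ between tools: R's `var()` uses n-1, while NumPy's `numpy.var` divides by N unless `ddof=1` is passed, so it is worth checking which divisor a given function applies.

```python
import statistics

data = [3, 7, 7, 19]

sample_var = statistics.variance(data)       # sample variance, divisor n - 1
population_var = statistics.pvariance(data)  # population variance, divisor N

print(sample_var == 48, population_var == 36)  # True True
```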
Interpretation of Variance
Measure of Dispersion
High variance: data widely spread; low variance: data clustered near mean.
Contextual Meaning
Variance must be interpreted relative to mean and units; raw magnitude alone insufficient.
Relation to Data Consistency
Lower variance implies higher consistency or reliability in data values.
Applications of Variance
Statistical Modeling
Variance essential in regression analysis, ANOVA, hypothesis testing, and probability distributions.
Risk Assessment
Finance uses variance to measure volatility and risk of investment returns.
Quality Control
Variance monitors process variability and product consistency in manufacturing.
Limitations of Variance
Units Squared
Variance units squared reduce intuitive interpretability compared to original units.
Sensitivity to Outliers
Variance disproportionately affected by extreme values due to squaring deviations.
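A small illustration of this sensitivity, using two hypothetical datasets that differ in a single value:

```python
from statistics import variance

clustered = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 50]  # last value replaced by an extreme one

print(variance(clustered))     # 2.5
print(variance(with_outlier))  # about 297.7
```

Changing one of five values raises the sample variance by more than a factor of 100, because the outlier's deviation is squared.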
Non-Robustness
Variance can mislead for highly skewed distributions; consider transforming data or using robust alternatives.
Variance vs Standard Deviation
Definition
Variance: average squared deviation. Standard deviation: square root of variance.
Units Comparison
Standard deviation shares same units as data; variance units squared.
Preferred Usage
Standard deviation preferred for interpretation; variance used in theoretical derivations.
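The relationship can be checked with Python's standard `statistics` module. The data are illustrative; if the values were measured in meters, the variance would be in meters² and the standard deviation in meters.

```python
import math
from statistics import stdev, variance

data = [3, 7, 7, 19]

s2 = variance(data)  # sample variance, in squared units
s = stdev(data)      # sample standard deviation, in original units

print(s2 == 48)                       # True
print(math.isclose(s, math.sqrt(s2))) # True
print(round(s, 2))                    # 6.93
```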
Computational Formula
Sample Variance Shortcut Formula
s² = [ ∑xᵢ² - ( (∑xᵢ)² / n ) ] / (n - 1)
Derivation
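Expanding ∑(xᵢ - x̄)² gives ∑xᵢ² - 2x̄∑xᵢ + n x̄²; substituting x̄ = (∑xᵢ)/n reduces this to ∑xᵢ² - (∑xᵢ)²/n. Reduces computational steps by avoiding repeated subtraction of mean.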
Implementation Example
Calculate sums and sums of squares, then apply formula for efficient variance calculation.
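One way to sketch this is a single pass that accumulates the count, the sum, and the sum of squares, then applies the shortcut formula once at the end. The function name is illustrative. Because it reads each value only once, it also works on iterators that cannot be replayed.

```python
def shortcut_sample_variance(values):
    """Single-pass sample variance via running sum and sum of squares."""
    n = 0
    total = 0
    total_sq = 0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    return (total_sq - total * total / n) / (n - 1)

# Works on a generator, which can only be traversed once.
print(shortcut_sample_variance(x for x in [3, 7, 7, 19]))  # 48.0
```

For the data 3, 7, 7, 19 the running totals are ∑xᵢ = 36 and ∑xᵢ² = 468, so s² = (468 - 36²/4) / 3 = (468 - 324) / 3 = 48.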
Worked Examples
Example 1: Population Variance
Data: 2, 4, 6, 8, 10; N=5; μ=6.
Variance: σ² = (1/5)[(2-6)² + (4-6)² + (6-6)² + (8-6)² + (10-6)²] = (1/5)(16 + 4 + 0 + 4 + 16) = 8.
Example 2: Sample Variance
Data: 3, 7, 7, 19; n=4; x̄=9.
Variance: s² = (1/3)[(3-9)² + (7-9)² + (7-9)² + (19-9)²] = (1/3)(36 + 4 + 4 + 100) = 48.
| Value (xᵢ) | Deviation (xᵢ - x̄) | Squared Deviation |
|---|---|---|
| 3 | -6 | 36 |
| 7 | -2 | 4 |
| 7 | -2 | 4 |
| 19 | 10 | 100 |
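The table for Example 2 can be reproduced step by step. This is a small sketch that recomputes each row and then divides the total squared deviation by n-1; the variable names are chosen for this illustration.

```python
data = [3, 7, 7, 19]
n = len(data)
x_bar = sum(data) / n  # 9.0

# One row per data point: value, deviation from the mean, squared deviation.
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in data]
for value, deviation, squared in rows:
    print(f"{value:>5} | {deviation:>9} | {squared:>7}")

total_squared = sum(squared for _, _, squared in rows)  # 144.0
s2 = total_squared / (n - 1)
print(s2)  # 48.0
```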