Definition
Chi Squared Distribution
Definition: The chi squared (χ²) distribution is a continuous probability distribution. It arises as the sum of squares of k independent standard normal random variables. Parameter: degrees of freedom (k), a positive integer. Support: x ∈ [0, ∞).
Chi Squared Statistic
Statistic: sum of squared standardized deviations between observed and expected frequencies. Use: hypothesis testing, model fit assessment. Notation: χ².
Historical Context
Developed by Karl Pearson (1900) for goodness-of-fit tests. Foundation for categorical data inference. Integral in frequentist statistics.
Properties
Moments
Mean: E(χ²) = k. Variance: Var(χ²) = 2k. Skewness: sqrt(8/k). Excess kurtosis: 12/k. Increasing k reduces both skewness and excess kurtosis.
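To see how quickly the shape statistics shrink as k grows, the closed-form moments can be tabulated directly. The sketch below is illustrative only; the function name `chi2_moments` is a hypothetical helper, and the 12/k figure is treated as the excess kurtosis (kurtosis minus 3).

```python
import math

def chi2_moments(k):
    """Closed-form moments of a chi-squared distribution with k degrees of freedom."""
    if k <= 0:
        raise ValueError("degrees of freedom must be positive")
    return {
        "mean": k,
        "variance": 2 * k,
        "skewness": math.sqrt(8 / k),
        "excess_kurtosis": 12 / k,  # kurtosis minus 3
    }

# Skewness and excess kurtosis both fall toward 0 as k increases.
for k in (1, 2, 8, 50):
    m = chi2_moments(k)
    print(k, m["mean"], m["variance"],
          round(m["skewness"], 3), round(m["excess_kurtosis"], 3))
```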
Shape
Shape: positively skewed, unimodal. As k → ∞, distribution approaches normality (by CLT). Mode: max(k-2, 0).
Support and Range
Support: non-negative real numbers. Range: [0, ∞). Distribution is right-tailed, no negative values.
Distribution Function
Probability Density Function (PDF)
PDF formula:
f(x; k) = (1 / (2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * e^(-x/2), x > 0
Cumulative Distribution Function (CDF)
CDF expressed via lower incomplete gamma function:
F(x; k) = γ(k/2, x/2) / Γ(k/2)
Moment Generating Function (MGF)
MGF: M(t) = (1 - 2t)^(-k/2), for t < 1/2.
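The PDF and CDF formulas above can be evaluated with the Python standard library alone. This is a minimal sketch, not a production implementation: the names `chi2_pdf` and `chi2_cdf` are hypothetical, and the CDF uses the power series of the lower incomplete gamma function, which is adequate for moderate x but not tuned for very large arguments.

```python
import math

def chi2_pdf(x, k):
    """Density f(x; k) for x > 0, computed in log space to avoid overflow."""
    if x <= 0:
        return 0.0
    log_f = ((k / 2 - 1) * math.log(x) - x / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)

def chi2_cdf(x, k):
    """F(x; k) = gamma(k/2, x/2) / Gamma(k/2), via the series
    gamma(a, z) = z^a e^(-z) * sum_n z^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

print(chi2_pdf(2.0, 2))     # density at x = 2 with k = 2
print(chi2_cdf(3.841, 1))   # close to 0.95
print(chi2_cdf(11.070, 5))  # close to 0.95
```

For k = 2 the CDF reduces to 1 - e^(-x/2), which gives a quick sanity check on the series.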
Derivation
From Normal Variables
Sum of squares of k independent standard normal variables Z_i: χ² = Σ Z_i². Distribution defined by this quadratic form.
Gamma Distribution Relation
Chi squared is a special case of the Gamma distribution: χ²(k) ~ Gamma(shape α = k/2, scale β = 2), or equivalently rate 1/2.
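Both constructions can be compared by simulation. The sketch below draws samples as sums of squared standard normals and as Gamma variates with shape k/2 and scale 2, then reports the sample mean and variance of each; both should land near k and 2k. The function names, sample size, and seed are arbitrary choices for illustration. Python's `random.gammavariate(alpha, beta)` takes shape and scale, which matches the parameterization used here.

```python
import random
import statistics

def sum_of_squared_normals(k, rng):
    """One chi-squared draw built from k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def compare_constructions(k, n_samples=20000, seed=0):
    """Return (mean, variance) of samples from each construction."""
    rng = random.Random(seed)
    from_normals = [sum_of_squared_normals(k, rng) for _ in range(n_samples)]
    from_gamma = [rng.gammavariate(k / 2, 2.0) for _ in range(n_samples)]
    return {
        "normals": (statistics.fmean(from_normals), statistics.variance(from_normals)),
        "gamma": (statistics.fmean(from_gamma), statistics.variance(from_gamma)),
    }

result = compare_constructions(k=4)
print(result)  # both means should be near 4, both variances near 8
```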
Characteristic Function
Characteristic function: φ(t) = (1 - 2it)^(-k/2). Used to confirm distribution properties and moments.
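As a worked illustration of how the generating functions confirm the moments, one can differentiate the logarithm of the MGF M(t) = (1 - 2t)^(-k/2) (the cumulant generating function, written K(t) here as an assumed notation) and evaluate at t = 0:

```latex
\begin{aligned}
K(t) &= \ln M(t) = -\tfrac{k}{2}\,\ln(1 - 2t), \qquad t < \tfrac12 \\
K'(t) &= \frac{k}{1 - 2t}
  &&\Rightarrow\ \mathrm{E}(\chi^2) = K'(0) = k \\
K''(t) &= \frac{2k}{(1 - 2t)^2}
  &&\Rightarrow\ \mathrm{Var}(\chi^2) = K''(0) = 2k \\
K'''(t) &= \frac{8k}{(1 - 2t)^3}
  &&\Rightarrow\ \text{skewness} = \frac{K'''(0)}{K''(0)^{3/2}}
     = \frac{8k}{(2k)^{3/2}} = \sqrt{8/k} \\
K''''(t) &= \frac{48k}{(1 - 2t)^4}
  &&\Rightarrow\ \text{excess kurtosis} = \frac{K''''(0)}{K''(0)^{2}}
     = \frac{48k}{4k^2} = \frac{12}{k}
\end{aligned}
```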
Applications
Hypothesis Testing
Tests: goodness-of-fit, independence in contingency tables, homogeneity tests. Test statistic follows χ² under null hypothesis.
Confidence Intervals
Used to construct confidence intervals for variance of normal populations. Inference about variance via χ² quantiles.
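For a normal sample of size n with sample variance s², the standard interval is (n-1)s²/χ²_upper ≤ σ² ≤ (n-1)s²/χ²_lower, where the two quantiles come from the χ² distribution with n-1 degrees of freedom. The sketch below illustrates this under stated assumptions: the data are hypothetical, the helper names are invented for the example, and the quantiles are found by bisection on a series-based CDF rather than by a statistics library.

```python
import math
import statistics

def chi2_cdf(x, k):
    """Regularized lower incomplete gamma P(k/2, x/2) via its power series."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, k):
    """Smallest x with CDF(x) >= p, located by bisection."""
    lo, hi = 0.0, k + 10.0
    while chi2_cdf(hi, k) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_confidence_interval(sample, confidence=0.95):
    """Two-sided interval for the variance of a normal population."""
    n = len(sample)
    s2 = statistics.variance(sample)  # unbiased sample variance
    alpha = 1 - confidence
    upper_q = chi2_quantile(1 - alpha / 2, n - 1)
    lower_q = chi2_quantile(alpha / 2, n - 1)
    return (n - 1) * s2 / upper_q, (n - 1) * s2 / lower_q

# Hypothetical measurements, assumed to come from a normal population.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.4, 4.6, 5.1]
low, high = variance_confidence_interval(sample)
print(low, statistics.variance(sample), high)
```

The interval is not symmetric around s² because the χ² distribution is skewed.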
Model Assessment
Measure fit of statistical models. Large χ² values indicate poor fit. Used in logistic regression diagnostics.
Goodness of Fit Test
Test Statistic
Formula: χ² = Σ (O_i - E_i)² / E_i, where O_i = observed frequency, E_i = expected frequency.
Null Hypothesis
H₀: observed frequencies conform to specified distribution. Reject H₀ if χ² exceeds critical value.
Decision Rule
Compare the calculated χ² to the critical value from the χ² distribution with the appropriate degrees of freedom. Equivalently, compute the p-value P(χ²_k ≥ calculated χ²) and reject H₀ when it falls below α.
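The statistic and decision rule can be sketched in a few lines. The example assumes hypothetical counts tested against a 1:2:1 ratio and looks up the α = 0.05 critical value from a small hard-coded table of standard values; `goodness_of_fit` and `CRITICAL_05` are illustrative names, not part of any library.

```python
# Critical values at alpha = 0.05 from standard chi-squared tables (df -> value).
CRITICAL_05 = {1: 3.841, 2: 5.991, 5: 11.070, 10: 18.307}

def goodness_of_fit(observed, expected, estimated_params=0):
    """Return (statistic, df, reject) for a Pearson goodness-of-fit test at alpha = 0.05."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params
    if df not in CRITICAL_05:
        raise KeyError(f"no tabulated critical value for df={df}")
    return statistic, df, statistic > CRITICAL_05[df]

# Hypothetical data: 100 observations tested against a 1:2:1 ratio.
observed = [22, 55, 23]
expected = [25.0, 50.0, 25.0]
stat, df, reject = goodness_of_fit(observed, expected)
print(stat, df, reject)  # statistic is about 1.02 with df = 2, so H0 is not rejected
```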
Contingency Tables
Test of Independence
Analyze association between categorical variables. Calculate expected frequencies under independence assumption.
Test Statistic Calculation
χ² = Σ (O_ij - E_ij)² / E_ij, summing over all cells i,j. O_ij = observed, E_ij = expected frequencies.
Degrees of Freedom
Degrees of freedom: (r - 1)(c - 1), where r = rows, c = columns in table.
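Under independence, each expected count is (row total × column total) / grand total. A minimal sketch of the full calculation follows, using a hypothetical 2×3 table; the function names are invented for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_independence(table):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical 2 x 3 table of observed counts.
table = [[10, 20, 30],
         [20, 20, 20]]
stat, df, expected = chi2_independence(table)
print(stat, df)   # about 5.333 with df = 2
print(expected)   # each row is [15.0, 20.0, 25.0]
```

Since 5.333 is below the α = 0.05 critical value of 5.991 for 2 degrees of freedom, independence would not be rejected for this table.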
Degrees of Freedom
Definition
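Number of independent components in the calculation of the χ² statistic. Depends on the number of categories or table cells and on the constraints imposed (fixed totals, estimated parameters), not on the sample size.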
Calculation Examples
Goodness of fit: df = number of categories - 1 - number of estimated parameters. Contingency tables: df = (rows-1)(columns-1).
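The two rules translate directly into small helpers. This is a sketch with hypothetical function names; the Poisson case assumes six count bins and one parameter (the mean) estimated from the data.

```python
def df_goodness_of_fit(n_categories, n_estimated_params=0):
    """df = categories - 1 - parameters estimated from the data."""
    df = n_categories - 1 - n_estimated_params
    if df < 1:
        raise ValueError("too few categories for the number of estimated parameters")
    return df

def df_contingency(n_rows, n_cols):
    """df = (rows - 1)(columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

print(df_goodness_of_fit(6))      # fair die, nothing estimated: 5
print(df_goodness_of_fit(6, 1))   # Poisson fit on 6 bins, mean estimated: 4
print(df_contingency(2, 2))       # 2 x 2 table: 1
print(df_contingency(3, 4))       # 3 x 4 table: 6
```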
Effect on Distribution
df controls location, spread, and shape. Larger df shifts the distribution right (mean k), widens it (variance 2k), reduces skewness, and brings it closer to normality.
Assumptions
Sample Size
Expected frequencies should be sufficiently large (commonly ≥ 5) to ensure approximation validity.
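A quick pre-check can flag categories whose expected frequency falls below the rule-of-thumb threshold before the test is run. The helper name, the threshold default, and the category probabilities below are assumptions made for the example.

```python
def small_expected_cells(expected, threshold=5.0):
    """Return (index, expected count) for every cell below the threshold."""
    return [(i, e) for i, e in enumerate(expected) if e < threshold]

# Hypothetical design: 40 observations spread over four categories.
n = 40
probabilities = [0.5, 0.3, 0.15, 0.05]
expected = [n * p for p in probabilities]

flagged = small_expected_cells(expected)
if flagged:
    print("Expected counts below 5 in cells:", flagged)
else:
    print("All expected counts are at least 5.")
```

When cells are flagged, common remedies are to merge sparse categories or to use an exact test.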
Independence
Observations must be independent; with dependent observations (e.g., repeated measurements on the same subject) the statistic no longer follows the χ² distribution under H₀.
Fixed Margins
In contingency tables, marginal totals are often treated as fixed for calculation purposes.
Limitations
Small Sample Size
Chi squared approximation unreliable for small expected counts; alternative exact tests preferred.
Continuous Variables
Not applicable directly to continuous data; requires categorization which may cause information loss.
Zero Counts
A zero expected frequency makes the statistic undefined (division by zero); tables with many zero or near-zero counts undermine the χ² approximation and complicate the test.
Calculation and Tables
Critical Values
Critical values depend on significance level α and degrees of freedom k. Used to decide whether to reject or fail to reject H₀.
Example Critical Values Table
| Degrees of Freedom | χ² Critical Value (α=0.05) | χ² Critical Value (α=0.01) |
|---|---|---|
| 1 | 3.841 | 6.635 |
| 2 | 5.991 | 9.210 |
| 5 | 11.070 | 15.086 |
| 10 | 18.307 | 23.209 |
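When no table is at hand, critical values can be approximated from a normal quantile. The sketch below uses the Wilson-Hilferty approximation, χ²_crit ≈ k(1 - 2/(9k) + z·sqrt(2/(9k)))³ with z the upper-α standard normal quantile. This is a swapped-in approximation, not the method used to produce the table, and it is noticeably rough for k = 1. The function name is hypothetical.

```python
from statistics import NormalDist

def chi2_critical_approx(alpha, k):
    """Wilson-Hilferty approximation to the upper-alpha chi-squared critical value."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper-alpha standard normal quantile
    c = 2 / (9 * k)
    return k * (1 - c + z * c ** 0.5) ** 3

# Tabulated values from the table above: (df, alpha) -> critical value.
TABLE = {
    (1, 0.05): 3.841, (1, 0.01): 6.635,
    (2, 0.05): 5.991, (2, 0.01): 9.210,
    (5, 0.05): 11.070, (5, 0.01): 15.086,
    (10, 0.05): 18.307, (10, 0.01): 23.209,
}

for (k, alpha), tabulated in TABLE.items():
    approx = chi2_critical_approx(alpha, k)
    print(f"df={k:2d} alpha={alpha}: table={tabulated:7.3f} approx={approx:7.3f}")
```

The agreement improves as the degrees of freedom grow, consistent with the distribution approaching normality.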
Calculation Procedure
1. Calculate observed frequencies O_i.
2. Calculate expected frequencies E_i under H₀.
3. Compute χ² = Σ (O_i - E_i)² / E_i.
4. Determine degrees of freedom k.
5. Find critical value χ²_crit for α, k.
6. Compare χ² to χ²_crit: reject H₀ if χ² > χ²_crit.
Examples
Goodness-of-Fit Example
Test if a die is fair. Observed frequencies from 60 rolls: 8, 10, 9, 12, 11, 10. Expected frequency per face: 10.
Calculate χ²:
χ² = Σ (O_i - 10)² / 10
= (8-10)²/10 + (10-10)²/10 + (9-10)²/10 + (12-10)²/10 + (11-10)²/10 + (10-10)²/10
= 0.4 + 0 + 0.1 + 0.4 + 0.1 + 0
= 1.0
Degrees of freedom: 6 - 1 = 5.
Critical value at α=0.05: 11.070. Result: 1.0 < 11.070, so we fail to reject H₀; the data give no evidence that the die is unfair.
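As an independent check on the die example, the p-value can be estimated by simulation instead of from the χ² distribution: roll a fair die 60 times, repeat many times, and count how often the simulated statistic is at least as large as the observed one. The sketch below is illustrative; the function names, number of simulations, and seed are arbitrary. Because every expected count equals 10, comparing Σ(O_i - 10)² is equivalent to comparing χ², and working with integers avoids floating-point ties.

```python
import random

def sum_sq_dev(counts, expected):
    """Sum of squared deviations from a common expected count."""
    return sum((c - expected) ** 2 for c in counts)

def simulate_p_value(observed, n_sims=5000, seed=1):
    """Monte Carlo estimate of P(statistic >= observed statistic) for a fair die.
    Assumes the number of rolls is divisible by the number of faces."""
    rng = random.Random(seed)
    n_rolls = sum(observed)
    faces = len(observed)
    expected = n_rolls // faces          # 10 per face in the example
    observed_ss = sum_sq_dev(observed, expected)
    hits = 0
    for _ in range(n_sims):
        counts = [0] * faces
        for _ in range(n_rolls):
            counts[rng.randrange(faces)] += 1
        if sum_sq_dev(counts, expected) >= observed_ss:
            hits += 1
    return hits / n_sims

observed = [8, 10, 9, 12, 11, 10]
p_value = simulate_p_value(observed)
print(p_value)  # a large value, far above 0.05
```

A p-value this large agrees with the table-based conclusion: the observed counts are entirely typical of a fair die.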
Contingency Table Example
Test independence between gender and voting preference. Table:
| | Party A | Party B | Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 20 | 30 | 50 |
| Total | 50 | 50 | 100 |
Expected frequencies:
E_male_A = (50*50)/100 = 25
E_male_B = 25
E_female_A = 25
E_female_B = 25
χ² = Σ (O_ij - E_ij)² / E_ij
= (30-25)²/25 + (20-25)²/25 + (20-25)²/25 + (30-25)²/25
= 1 + 1 + 1 + 1 = 4
Degrees of freedom = (2-1)(2-1) = 1
Critical value at α=0.05 = 3.841
4 > 3.841 → reject H₀; association likely present.
References
- K. Pearson, "On the χ² Test of Goodness of Fit," Biometrika, vol. 14, no. 1-2, 1922, pp. 186-191.
- G.E.P. Box, "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems," Annals of Mathematical Statistics, vol. 25, no. 2, 1954, pp. 290-302.
- W. Feller, "An Introduction to Probability Theory and Its Applications," Wiley, vol. 2, 1971, pp. 200-210.
- A. Agresti, "Categorical Data Analysis," Wiley, 3rd ed., 2013, pp. 110-145.
- R. A. Fisher, "Statistical Methods for Research Workers," Oliver & Boyd, 1925, pp. 45-70.