Chi Squared

Definition

Chi Squared Distribution

Definition: The chi squared (χ²) distribution is a continuous probability distribution. It arises as the sum of squares of k independent standard normal random variables. Parameter: degrees of freedom (k), a positive integer. Support: x ∈ [0, ∞).

Chi Squared Statistic

Statistic: sum of squared standardized deviations between observed and expected frequencies. Use: hypothesis testing, model fit assessment. Notation: χ².

Historical Context

Developed by Karl Pearson (1900) for goodness-of-fit tests. Foundation for categorical data inference. Integral in frequentist statistics.

Properties

Moments

Mean: E(χ²) = k. Variance: Var(χ²) = 2k. Skewness: sqrt(8/k). Excess kurtosis: 12/k. Increasing k reduces both skewness and excess kurtosis.
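As a quick check of these formulas, the sketch below evaluates the moments for a few values of k. The function name chi2_moments is illustrative and not taken from any library; the value 12/k is reported as excess kurtosis (kurtosis minus 3).

```python
import math

def chi2_moments(k):
    """Theoretical moments of a chi-squared distribution with k degrees of freedom."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "mean": float(k),
        "variance": 2.0 * k,
        "skewness": math.sqrt(8.0 / k),
        "excess_kurtosis": 12.0 / k,  # kurtosis minus 3
    }

# Skewness and excess kurtosis both shrink as k grows.
for k in (1, 8, 50):
    print(k, chi2_moments(k))
```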

Shape

Shape: positively skewed, unimodal. As k → ∞, distribution approaches normality (by CLT). Mode: max(k-2, 0).

Support and Range

Support: non-negative real numbers. Range: [0, ∞). Distribution is right-tailed, no negative values.

Distribution Function

Probability Density Function (PDF)

PDF formula:

f(x; k) = (1 / (2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * e^(-x/2), x > 0
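A minimal sketch of this density using only the Python standard library. The function name chi2_pdf is an assumption for illustration; the computation is done on the log scale so that large k does not overflow Γ(k/2).

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density f(x; k); returns 0.0 for x <= 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    if x <= 0:
        return 0.0
    half_k = k / 2.0
    # log f = (k/2 - 1) log x - x/2 - (k/2) log 2 - log Gamma(k/2)
    log_density = (
        (half_k - 1.0) * math.log(x)
        - x / 2.0
        - half_k * math.log(2.0)
        - math.lgamma(half_k)
    )
    return math.exp(log_density)

# For k = 2 the density reduces to 0.5 * exp(-x / 2).
print(chi2_pdf(2.0, 2), 0.5 * math.exp(-1.0))
```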

Cumulative Distribution Function (CDF)

CDF expressed via lower incomplete gamma function:

F(x; k) = γ(k/2, x/2) / Γ(k/2)
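The CDF can be evaluated numerically in several ways. The sketch below uses the standard power series for the lower incomplete gamma function, γ(a, x) = x^a e^(-x) Σ x^n / (a(a+1)...(a+n)); the function names and the tolerance are illustrative choices, and a library routine would normally be preferred for extreme arguments.

```python
import math

def regularized_lower_gamma(a, x, tol=1e-14, max_terms=100_000):
    """P(a, x) = γ(a, x) / Γ(a), computed by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)  # ratio of consecutive series terms
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))

def chi2_cdf(x, k):
    """F(x; k) = γ(k/2, x/2) / Γ(k/2)."""
    return regularized_lower_gamma(k / 2.0, x / 2.0)

# For k = 2 the CDF has the closed form 1 - exp(-x / 2).
print(chi2_cdf(5.991, 2), 1.0 - math.exp(-5.991 / 2.0))
```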

Moment Generating Function (MGF)

MGF: M(t) = (1 - 2t)^(-k/2), for t < 1/2.

Derivation

From Normal Variables

Sum of squares of k independent standard normal variables Z_i: χ² = Σ Z_i². Distribution defined by this quadratic form.
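This construction is easy to check by simulation. The sketch below draws sums of k squared standard normals and compares the sample mean and variance with the theoretical values k and 2k; the seed, sample size, and function name are arbitrary choices for illustration.

```python
import random
import statistics

def sample_chi2(k, rng):
    """One draw: sum of squares of k independent standard normal variates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(12345)  # fixed seed so the run is reproducible
k = 3
draws = [sample_chi2(k, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"mean ~ {sample_mean:.3f} (theory {k}), variance ~ {sample_var:.3f} (theory {2 * k})")
```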

Gamma Distribution Relation

Chi squared is a special case of the Gamma distribution: χ²(k) ~ Gamma(α = k/2, β = 2), with α the shape and β the scale parameter (equivalently, rate 1/2).
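The identity can be verified numerically by comparing the two densities, assuming β denotes the scale parameter (not the rate). Both function names below are illustrative.

```python
import math

def chi2_pdf(x, k):
    # Chi-squared density as written in the PDF section.
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def gamma_pdf(x, shape, scale):
    # Gamma density in the shape/scale parameterization.
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

max_rel_diff = 0.0
for k in (1, 4, 9):
    for x in (0.5, 2.0, 7.5):
        a = chi2_pdf(x, k)
        b = gamma_pdf(x, shape=k / 2, scale=2.0)
        max_rel_diff = max(max_rel_diff, abs(a - b) / a)

# Differences should be at floating-point rounding level.
print(max_rel_diff)
```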

Characteristic Function

Characteristic function: φ(t) = (1 - 2it)^(-k/2). Used to confirm distribution properties and moments.

Applications

Hypothesis Testing

Tests: goodness-of-fit, independence in contingency tables, homogeneity tests. Test statistic follows χ² under null hypothesis.

Confidence Intervals

Used to construct confidence intervals for variance of normal populations. Inference about variance via χ² quantiles.
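For a sample of size n from a normal population with sample variance s², the usual two-sided interval is [(n-1)s² / χ²(1-α/2; n-1), (n-1)s² / χ²(α/2; n-1)]. The sketch below is one way to compute it with the standard library alone: the CDF is evaluated by a power series and the quantile by bisection. The data values and all function names are hypothetical, chosen only for illustration.

```python
import math
import statistics

def chi2_cdf(x, k, tol=1e-14, max_terms=100_000):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chi2_quantile(p, k):
    """Invert the CDF by bisection (simple, not the fastest method)."""
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:   # expand until the target is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def variance_confidence_interval(data, confidence=0.95):
    """Interval for the population variance, assuming normally distributed data."""
    df = len(data) - 1
    s2 = statistics.variance(data)
    alpha = 1.0 - confidence
    lower = df * s2 / chi2_quantile(1.0 - alpha / 2.0, df)
    upper = df * s2 / chi2_quantile(alpha / 2.0, df)
    return lower, upper

# Hypothetical measurements, for illustration only.
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]
lower, upper = variance_confidence_interval(data)
print(f"s^2 = {statistics.variance(data):.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval is not symmetric around s² because the χ² distribution is skewed.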

Model Assessment

Measure fit of statistical models. Large χ² values indicate poor fit. Used in logistic regression diagnostics.

Goodness of Fit Test

Test Statistic

Formula: χ² = Σ (O_i - E_i)² / E_i, where O_i = observed frequency, E_i = expected frequency.

Null Hypothesis

H₀: observed frequencies conform to specified distribution. Reject H₀ if χ² exceeds critical value.

Decision Rule

Compare the calculated χ² to the critical value from the χ² distribution with the appropriate degrees of freedom. Equivalently, compute the p-value P(χ²_k ≥ observed χ²) and reject H₀ when it falls below α.

Contingency Tables

Test of Independence

Analyze association between categorical variables. Calculate expected frequencies under independence assumption.

Test Statistic Calculation

χ² = Σ (O_ij - E_ij)² / E_ij, summing over all cells i,j. O_ij = observed, E_ij = expected frequencies.

Degrees of Freedom

Degrees of freedom: (r - 1)(c - 1), where r = rows, c = columns in table.

Degrees of Freedom

Definition

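Number of independent components in the calculation of the χ² statistic. Depends on the number of categories (or table cells) and on the constraints imposed, such as fixed totals and estimated parameters, not on the sample size itself.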

Calculation Examples

Goodness of fit: df = number of categories - 1 - number of estimated parameters. Contingency tables: df = (rows-1)(columns-1).
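These two rules translate directly into small helper functions. The names and the validation checks below are illustrative assumptions, not part of any library.

```python
def df_goodness_of_fit(n_categories, n_estimated_params=0):
    """df = number of categories - 1 - number of parameters estimated from the data."""
    df = n_categories - 1 - n_estimated_params
    if df < 1:
        raise ValueError("not enough categories for the number of estimated parameters")
    return df

def df_contingency(n_rows, n_cols):
    """df = (rows - 1)(columns - 1) for an r x c contingency table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)

print(df_goodness_of_fit(6))      # six categories, fully specified H0 -> 5
print(df_goodness_of_fit(6, 1))   # same, with one parameter estimated -> 4
print(df_contingency(2, 2))       # 2 x 2 table -> 1
```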

Effect on Distribution

df controls shape, scale. Larger df shifts distribution right, reduces skewness, approaches normality.

Assumptions

Sample Size

Expected frequencies should be sufficiently large (commonly ≥ 5) to ensure approximation validity.
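A simple pre-check along these lines might list the cells whose expected count falls below the chosen threshold. The function name, the default threshold of 5, and the example table are all illustrative assumptions.

```python
def small_expected_cells(expected, threshold=5.0):
    """Return (row, column, value) for each expected count below the threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical table of expected frequencies.
expected = [[25.0, 25.0], [3.2, 6.8]]
flagged = small_expected_cells(expected)
print(flagged)  # one cell, at row 1 / column 0, is below 5
```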

Independence

Observations must be independent to avoid bias in χ² statistic distribution.

Fixed Margins

In contingency tables, expected frequencies are computed from the observed marginal totals, which are treated as fixed when deriving the degrees of freedom.

Limitations

Small Sample Size

Chi squared approximation unreliable for small expected counts; alternative exact tests preferred.

Continuous Variables

Not applicable directly to continuous data; requires categorization which may cause information loss.

Zero Counts

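A cell with zero expected frequency makes the statistic undefined (division by zero); cells with zero or very small observed counts usually signal sparse data for which the χ² approximation is poor.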

Calculation and Tables

Critical Values

Critical values depend on significance level α and degrees of freedom k. Used to decide whether to reject or fail to reject H₀.

Example Critical Values Table

Degrees of Freedom	χ² Critical Value (α=0.05)	χ² Critical Value (α=0.01)
1	3.841	6.635
2	5.991	9.210
5	11.070	15.086
10	18.307	23.209

Calculation Procedure

1. Calculate observed frequencies O_i.
2. Calculate expected frequencies E_i under H₀.
3. Compute χ² = Σ (O_i - E_i)² / E_i.
4. Determine degrees of freedom k.
5. Find critical value χ²_crit for α, k.
6. Compare χ² to χ²_crit: reject H₀ if χ² > χ²_crit.
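The procedure can be sketched in a few lines of Python. The lookup dictionary holds only the α = 0.05 critical values from the table above, so the sketch handles just those degrees of freedom; the function names are illustrative, and the sample data are counts from 60 rolls of a die tested against a fair-die hypothesis.

```python
# Critical values at alpha = 0.05, copied from the table above.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 5: 11.070, 10: 18.307}

def chi2_statistic(observed, expected):
    """Step 3: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit(observed, expected, n_estimated_params=0):
    """Steps 3 to 6: statistic, degrees of freedom, critical value, decision."""
    stat = chi2_statistic(observed, expected)
    df = len(observed) - 1 - n_estimated_params
    critical = CRITICAL_0_05[df]  # raises KeyError for df not in the small table
    return stat, df, critical, stat > critical

observed = [8, 10, 9, 12, 11, 10]  # counts for faces 1..6
expected = [10] * 6                # 60 rolls / 6 faces under H0
stat, df, critical, reject = goodness_of_fit(observed, expected)
print(f"chi2 = {stat:.3f}, df = {df}, critical = {critical}, reject H0: {reject}")
```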

Examples

Goodness-of-Fit Example

Test if a die is fair. Observed frequencies from 60 rolls: 8, 10, 9, 12, 11, 10. Expected frequency per face: 10.

Calculate χ²:

χ² = Σ (O_i - 10)² / 10
= (8-10)²/10 + (10-10)²/10 + (9-10)²/10 + (12-10)²/10 + (11-10)²/10 + (10-10)²/10
= 0.4 + 0 + 0.1 + 0.4 + 0.1 + 0
= 1.0

Degrees of freedom: 6 - 1 = 5.

Critical value at α=0.05: 11.070. Result: 1.0 < 11.070, fail to reject H₀; the data provide no evidence that the die is unfair.

Contingency Table Example

Test independence between gender and voting preference. Table:

	Party A	Party B	Total
Male	30	20	50
Female	20	30	50
Total	50	50	100

Expected frequencies:

E_male_A = (50*50)/100 = 25
E_male_B = 25
E_female_A = 25
E_female_B = 25

χ² = Σ (O_ij - E_ij)² / E_ij
= (30-25)²/25 + (20-25)²/25 + (20-25)²/25 + (30-25)²/25
= 1 + 1 + 1 + 1 = 4

Degrees of freedom = (2-1)(2-1) = 1

Critical value at α=0.05 = 3.841

4 > 3.841 → reject H₀; association likely present.
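The same calculation can be reproduced programmatically. The sketch below computes expected counts as (row total × column total) / grand total and applies no continuity correction, matching the hand calculation; the function names are illustrative. The p-value line relies on the fact that, for df = 1 only, χ² is the square of a standard normal variable.

```python
import math

def expected_counts(table):
    """Expected frequencies under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

table = [[30, 20],   # Male:   Party A, Party B
         [20, 30]]   # Female: Party A, Party B
stat, df, expected = chi2_independence(table)

# Valid for df = 1 only: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))
print(f"chi2 = {stat}, df = {df}, p ~ {p_value:.4f}")
```

The p-value comes out just under 0.05, consistent with the statistic only slightly exceeding the critical value 3.841.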

References

K. Pearson, "On the χ² Test of Goodness of Fit," Biometrika, vol. 14, no. 1-2, 1900, pp. 186-191.
G.E.P. Box, "Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems," Annals of Mathematical Statistics, vol. 10, 1939, pp. 76-84.
W. Feller, "An Introduction to Probability Theory and Its Applications," Wiley, vol. 2, 1971, pp. 200-210.
A. Agresti, "Categorical Data Analysis," Wiley, 3rd ed., 2013, pp. 110-145.
R. A. Fisher, "Statistical Methods for Research Workers," Oliver & Boyd, 1925, pp. 45-70.