Definition and Origin
Historical Background
Developed by William Sealy Gosset (pseudonym "Student") in 1908 while he worked at the Guinness brewery, whose policy required employees to publish anonymously. Designed for small-sample inference when the population variance is unknown.
Definition
Continuous probability distribution used to estimate population parameters when sample size is small and variance unknown. Also called Student’s t distribution.
Context of Use
Replaces normal distribution when population variance is estimated from sample. Common in hypothesis testing, confidence intervals, and regression analysis.
Mathematical Properties
Distribution Type
Symmetric, bell-shaped, continuous distribution. Defined for real numbers from -∞ to +∞.
Parameters
Characterized by degrees of freedom (ν). Shape and kurtosis depend on ν. The standard form has no explicit location or scale parameters; in practice the statistic is scaled by the sample standard error.
Moments
Mean: 0 for ν > 1. Variance: ν/(ν−2) for ν > 2. Skewness: 0 (symmetric). Excess kurtosis: 6/(ν−4) for ν > 4.
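These closed-form moments can be checked against a library implementation; a minimal sketch using scipy.stats (assuming SciPy is available; ν = 5 is an illustrative choice, large enough that all four moments exist):

```python
# Check the closed-form moments of the t distribution against SciPy.
from scipy.stats import t

nu = 5  # degrees of freedom; must exceed 4 for the excess kurtosis to exist
mean, var, skew, kurt = t.stats(df=nu, moments="mvsk")

print(float(mean))                      # 0 for nu > 1
print(float(var), nu / (nu - 2))        # both 5/3
print(float(skew))                      # 0 by symmetry
print(float(kurt), 6 / (nu - 4))        # excess kurtosis, both 6
```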
Degrees of Freedom
Definition
Number of independent values free to vary in calculation. For t distribution, typically ν = n − 1 where n is sample size.
Role in Shape
Lower ν: heavier tails, more variability. Higher ν: distribution approaches normal distribution.
Interpretation
Reflects uncertainty in sample variance estimate. Controls tail thickness and critical values.
Probability Density Function (PDF)
Formula
f(t) = Γ((ν+1)/2) / [√(νπ) Γ(ν/2)] × (1 + t²/ν)^(−(ν+1)/2)
Components
Γ: gamma function. ν: degrees of freedom. t: variable.
Properties
Integrates to 1 over real line. Symmetric about zero. Tail probability decreases polynomially.
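The density formula above can be coded directly with the standard-library gamma function; a minimal sketch (function and variable names are illustrative):

```python
import math

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom, straight from the formula."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)

# Symmetry about zero, and the known nu = 1 (Cauchy) value f(0) = 1/pi:
print(t_pdf(1.5, 4), t_pdf(-1.5, 4))  # equal by symmetry
print(t_pdf(0, 1), 1 / math.pi)
```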
Cumulative Distribution Function (CDF)
Definition
Probability that t variable ≤ given value t₀. Calculated via incomplete beta functions or numerical methods.
Calculation Methods
Exact integrals complex; numerical approximation or software implementations common.
Use in Hypothesis Testing
Determines p-values and critical regions based on observed t statistics.
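In practice the CDF is evaluated through a library rather than by hand; a hedged sketch of a two-sided p-value using scipy.stats (the statistic values are illustrative):

```python
from scipy.stats import t

def two_sided_p(t_stat, df):
    """Two-sided p-value: probability of a |t| at least this extreme under H0."""
    return 2 * t.sf(abs(t_stat), df)  # sf = 1 - cdf (upper-tail probability)

print(two_sided_p(2.5, 10))  # below 0.05: 2.5 exceeds the df=10 critical value 2.228
print(two_sided_p(0.0, 10))  # exactly 1: an observed t of 0 is maximally consistent with H0
```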
Relationship to Normal Distribution
Convergence
As ν → ∞, t distribution converges to standard normal distribution.
Differences
Heavier tails than normal for small ν. Reflects additional uncertainty in variance estimation.
Practical Implication
For large samples, normal approximations suffice. For small samples, t distribution preferred.
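The convergence can be observed numerically by watching the two-tailed 5% critical value shrink toward the normal 1.96; a sketch with scipy.stats:

```python
from scipy.stats import norm, t

z = norm.ppf(0.975)  # standard normal critical value, about 1.960
for df in (1, 5, 30, 1000):
    print(df, t.ppf(0.975, df))  # 12.706, 2.571, 2.042, then nearly z

print(abs(t.ppf(0.975, 10**6) - z))  # essentially zero at very large df
```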
Applications in Statistics
Parameter Estimation
Used to estimate population mean when variance unknown and sample small.
Hypothesis Testing
Forms basis for one-sample, two-sample, paired t-tests.
Regression Analysis
Estimates significance of coefficients when error variance unknown.
Hypothesis Testing
Test Statistic
t = (x̄ − μ₀) / (s / √n)
x̄: sample mean, μ₀: hypothesized mean, s: sample standard deviation, n: sample size.
Decision Rule
Compare calculated t to critical t value from tables at desired α level and degrees of freedom.
Types of Tests
One-tailed and two-tailed tests for mean differences.
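The statistic and decision rule above can be sketched end to end; the data here are made up for illustration, and the hand computation is cross-checked against scipy.stats.ttest_1samp:

```python
import math
from scipy import stats

sample = [5.1, 4.9, 5.3, 5.2, 4.8]  # hypothetical measurements
mu0 = 5.0                           # hypothesized population mean

n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample std dev
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

# Two-tailed test at alpha = 0.05: reject H0 if |t| exceeds the critical value.
t_crit = stats.t.ppf(0.975, df)
print(t_stat, t_crit, abs(t_stat) > t_crit)

# Cross-check against the library routine.
res = stats.ttest_1samp(sample, mu0)
print(res.statistic, res.pvalue)
```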
Confidence Intervals
Formula
CI = x̄ ± t_{α/2, ν} × (s / √n)
t_{α/2, ν}: critical t value for confidence level and degrees of freedom.
Interpretation
Range within which population mean lies with specified confidence.
Effect of Degrees of Freedom
Smaller ν widens interval due to heavier tails.
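The interval formula translates directly to code; a sketch with illustrative data, using scipy.stats.t.ppf for the critical value:

```python
import math
from scipy.stats import t

sample = [5.1, 4.9, 5.3, 5.2, 4.8]  # hypothetical data
n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))

df = n - 1
t_crit = t.ppf(0.975, df)              # alpha = 0.05, two-sided
half_width = t_crit * s / math.sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(lo, hi)

# Heavier tails at small df widen the interval: the df = 4 critical value exceeds df = 30.
print(t.ppf(0.975, 4), t.ppf(0.975, 30))
```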
Sample Size Effects
Small Samples
t distribution essential for n < 30 due to variance uncertainty.
Large Samples
Approximates normal; t and z tests yield similar results.
Practical Guidance
Use t distribution unless sample size large and variance known.
Table of Critical Values
Overview
Critical values depend on α level and degrees of freedom. Used to determine rejection regions in hypothesis tests.
Example Table
| Degrees of Freedom (ν) | t (α=0.05, two-tailed) | t (α=0.01, two-tailed) |
|---|---|---|
| 1 | 12.706 | 63.657 |
| 5 | 2.571 | 4.032 |
| 10 | 2.228 | 3.169 |
| 30 | 2.042 | 2.750 |
| ∞ (Normal) | 1.960 | 2.576 |
Usage Notes
Interpolate for non-tabulated degrees of freedom. Software automates critical value retrieval.
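Rather than interpolating by hand, the table rows can be regenerated with scipy.stats.t.ppf, the inverse CDF (a two-tailed α = 0.05 uses the 0.975 quantile; α = 0.01 uses 0.995):

```python
from scipy.stats import t

# Reproduce the two-tailed critical values from the table above.
for df in (1, 5, 10, 30):
    print(df, round(t.ppf(0.975, df), 3), round(t.ppf(0.995, df), 3))
```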
Limitations and Assumptions
Assumptions
Random sampling. Normality of underlying population. Independence of observations.
Limitations
Not robust to severe non-normality. Requires accurate estimate of sample variance. Less effective for very small samples if assumptions violated.
Alternatives
Bootstrap methods, nonparametric tests for non-normal data or dependent samples.
References
- Student (W. S. Gosset), "The probable error of a mean," Biometrika, vol. 6, no. 1, 1908, pp. 1–25.
- DeGroot, M. H., & Schervish, M. J., "Probability and Statistics," 4th ed., Pearson, 2012, pp. 200–210.
- Casella, G., & Berger, R. L., "Statistical Inference," 2nd ed., Duxbury Press, 2002, pp. 250–270.
- Johnson, N. L., Kotz, S., & Balakrishnan, N., "Continuous Univariate Distributions," Vol. 2, Wiley, 1995, pp. 85–105.