Definition and Origin

Historical Background

Developed by William Sealy Gosset (pseudonym "Student") in 1908. Designed for small sample inference where population variance unknown. Published under pseudonym due to employer's policy.

Definition

Continuous probability distribution used to estimate population parameters when sample size is small and variance unknown. Also called Student’s t distribution.

Context of Use

Replaces normal distribution when population variance is estimated from sample. Common in hypothesis testing, confidence intervals, and regression analysis.

Mathematical Properties

Distribution Type

Symmetric, bell-shaped, continuous distribution. Defined for real numbers from -∞ to +∞.

Parameters

Characterized by degrees of freedom (ν). Shape and kurtosis depend on ν. The standard form has no explicit location or scale parameters, though location-scale generalizations exist; in practice the t statistic is scaled by the sample standard deviation.

Moments

Mean: 0 for ν > 1 (undefined for ν ≤ 1). Variance: ν/(ν−2) for ν > 2 (infinite for 1 < ν ≤ 2). Skewness: 0 for ν > 3 (symmetric). Excess kurtosis: 6/(ν−4) for ν > 4.
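As a quick check, these closed-form moments can be compared against numerical values from scipy.stats (scipy assumed available; ν = 10 chosen for illustration):

```python
# Compare the closed-form t-distribution moments with scipy's values.
from scipy import stats

nu = 10
mean, var, skew, kurt = stats.t.stats(nu, moments="mvsk")

print(mean)   # 0 for nu > 1
print(var)    # nu/(nu-2) = 10/8 = 1.25
print(skew)   # 0 (symmetric)
print(kurt)   # excess kurtosis 6/(nu-4) = 1.0
```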

Degrees of Freedom

Definition

Number of independent values free to vary in calculation. For t distribution, typically ν = n − 1 where n is sample size.

Role in Shape

Lower ν: heavier tails, more variability. Higher ν: distribution approaches normal distribution.

Interpretation

Reflects uncertainty in sample variance estimate. Controls tail thickness and critical values.

Probability Density Function (PDF)

Formula

f(t) = Γ((ν+1)/2) / [√(νπ) Γ(ν/2)] × (1 + t²/ν)^(-(ν+1)/2)

Components

Γ: gamma function. ν: degrees of freedom. t: variable.

Properties

Integrates to 1 over the real line. Symmetric about zero. Tail probabilities decay polynomially (power-law tails), more slowly than the normal distribution's exponential decay.
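A sketch implementing the PDF formula above directly via the gamma function and comparing it with scipy's built-in density (the helper name `t_pdf` is illustrative):

```python
# Evaluate the Student's t PDF from the formula and cross-check with scipy.
import math
from scipy import stats

def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t**2 / nu) ** (-(nu + 1) / 2)

nu, t0 = 5, 1.5
print(t_pdf(t0, nu))
print(stats.t.pdf(t0, nu))  # should agree to floating-point precision
```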

Cumulative Distribution Function (CDF)

Definition

Probability that the t variable ≤ a given value t₀. Calculated via the regularized incomplete beta function or numerical methods.

Calculation Methods

Exact integrals complex; numerical approximation or software implementations common.

Use in Hypothesis Testing

Determines p-values and critical regions based on observed t statistics.
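A sketch of how the CDF yields p-values in practice (ν and the observed t statistic are illustrative; scipy evaluates the CDF through the regularized incomplete beta function internally):

```python
# CDF values and a two-tailed p-value for an observed t statistic.
from scipy import stats

nu = 12
t_obs = 2.3

cdf_val = stats.t.cdf(t_obs, nu)            # P(T <= 2.3)
p_two_sided = 2 * stats.t.sf(abs(t_obs), nu)  # sf = 1 - cdf (survival function)

print(cdf_val)
print(p_two_sided)
```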

Relationship to Normal Distribution

Convergence

As ν → ∞, t distribution converges to standard normal distribution.

Differences

Heavier tails than normal for small ν. Reflects additional uncertainty in variance estimation.

Practical Implication

For large samples, normal approximations suffice. For small samples, t distribution preferred.
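The convergence can be seen numerically: the 97.5th-percentile critical value shrinks toward the normal value 1.960 as ν grows (a sketch using scipy):

```python
# Critical values t_{0.025, nu} approach the standard normal quantile as nu grows.
from scipy import stats

for nu in (1, 5, 30, 1000):
    print(nu, stats.t.ppf(0.975, nu))

print("normal", stats.norm.ppf(0.975))  # approximately 1.960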

Applications in Statistics

Parameter Estimation

Used to estimate population mean when variance unknown and sample small.

Hypothesis Testing

Forms basis for one-sample, two-sample, paired t-tests.

Regression Analysis

Estimates significance of coefficients when error variance unknown.

Hypothesis Testing

Test Statistic

t = (x̄ − μ₀) / (s / √n)

x̄: sample mean, μ₀: hypothesized mean, s: sample standard deviation, n: sample size.

Decision Rule

Compare calculated t to critical t value from tables at desired α level and degrees of freedom.

Types of Tests

One-tailed and two-tailed tests for mean differences.
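The test statistic and decision rule above can be sketched end-to-end; the data are illustrative made-up values, and the manual calculation is cross-checked against scipy.stats.ttest_1samp:

```python
# One-sample t-test: manual formula vs scipy.stats.ttest_1samp.
import math
from scipy import stats

data = [5.1, 4.9, 5.3, 5.5, 4.8, 5.2, 5.0, 5.4]
mu0 = 5.0

n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample std dev

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)  # two-tailed, nu = n - 1

res = stats.ttest_1samp(data, mu0)
print(t_stat, p_value)
print(res.statistic, res.pvalue)  # agrees with the manual calculation
```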

Confidence Intervals

Formula

CI = x̄ ± t_{α/2, ν} × (s / √n)

t_{α/2, ν}: critical t value for confidence level and degrees of freedom.

Interpretation

Range within which population mean lies with specified confidence.

Effect of Degrees of Freedom

Smaller ν widens interval due to heavier tails.
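A sketch of the interval formula on an illustrative made-up sample, with the critical value t_{α/2, ν} taken from scipy's quantile function:

```python
# 95% confidence interval for the mean: xbar +/- t_{alpha/2, nu} * s / sqrt(n).
import math
from scipy import stats

data = [5.1, 4.9, 5.3, 5.5, 4.8, 5.2, 5.0, 5.4]

n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)  # t_{alpha/2, nu}, nu = n - 1
margin = t_crit * s / math.sqrt(n)

print((xbar - margin, xbar + margin))
```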

Sample Size Effects

Small Samples

t distribution essential for n < 30 due to variance uncertainty.

Large Samples

Approximates normal; t and z tests yield similar results.

Practical Guidance

Use the t distribution whenever the variance is estimated from the sample; the z (normal) distribution applies only when the population variance is known, or when the sample is large enough that the two give nearly identical results.

Table of Critical Values

Overview

Critical values depend on α level and degrees of freedom. Used to determine rejection regions in hypothesis tests.

Example Table

Degrees of Freedom (ν)    t (α = 0.05, two-tailed)    t (α = 0.01, two-tailed)
1                         12.706                      63.657
5                         2.571                       4.032
10                        2.228                       3.169
30                        2.042                       2.750
∞ (Normal)                1.960                       2.576

Usage Notes

Interpolate for non-tabulated degrees of freedom. Software automates critical value retrieval.
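The tabulated values can be reproduced with scipy's quantile function: for a two-tailed test at level α, the critical value is the (1 − α/2) quantile.

```python
# Reproduce the critical-value table with scipy.stats.t.ppf.
from scipy import stats

for nu in (1, 5, 10, 30):
    print(nu, round(stats.t.ppf(0.975, nu), 3), round(stats.t.ppf(0.995, nu), 3))

# The infinite-df row is the standard normal.
print("inf", round(stats.norm.ppf(0.975), 3), round(stats.norm.ppf(0.995), 3))
```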

Limitations and Assumptions

Assumptions

Random sampling. Normality of underlying population. Independence of observations.

Limitations

Not robust to severe non-normality. Requires accurate estimate of sample variance. Less effective for very small samples if assumptions violated.

Alternatives

Bootstrap methods, nonparametric tests for non-normal data or dependent samples.

References

  • Student [W. S. Gosset], "The probable error of a mean," Biometrika, vol. 6, no. 1, 1908, pp. 1–25.
  • DeGroot, M. H., & Schervish, M. J., "Probability and Statistics," 4th ed., Pearson, 2012, pp. 200–210.
  • Casella, G., & Berger, R. L., "Statistical Inference," 2nd ed., Duxbury Press, 2002, pp. 250–270.
  • Johnson, N. L., Kotz, S., & Balakrishnan, N., "Continuous Univariate Distributions," Vol. 2, Wiley, 1995, pp. 85–105.