Definition and Properties
Overview
Normal distribution: continuous, symmetric, unimodal. Shape: bell curve. Defined by mean (μ) and variance (σ²). Support: all real numbers (-∞, ∞). Model: natural variation, measurement errors, biological traits.
Mathematical Definition
Random variable X is normal if its pdf matches the Gaussian formula. Properties: symmetry about mean, mode = median = mean, infinite support, tails asymptotically approach zero.
Key Properties
Unimodality: single peak at mean μ. Symmetry: f(μ - x) = f(μ + x). Moments: all moments finite. Entropy: maximum entropy among continuous distributions on (-∞, ∞) with a given mean and variance.
Probability Density Function and CDF
Probability Density Function (PDF)
PDF formula:
f(x) = (1 / (σ √(2π))) * exp(-(x - μ)² / (2σ²))
Interpretation: f(x) is the height of the curve at x (a density, not a probability). The area under the curve between two points is the probability that X falls between them.
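The PDF formula translates directly into Python using only the standard `math` module. This is a minimal sketch. The function name `normal_pdf` is illustrative and not taken from any particular library. The example also shows that a density value can exceed 1, which is why f(x) must be read as a height rather than a probability.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, written term by term from the formula."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # 1 / (sigma * sqrt(2*pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)   # -(x - mu)^2 / (2 sigma^2)
    return coeff * math.exp(exponent)

# Peak height at x = mu is 1 / (sigma * sqrt(2*pi)); about 0.3989 for sigma = 1.
print(normal_pdf(0.0))
# With a small sigma the peak exceeds 1, so f(x) is not itself a probability.
print(normal_pdf(0.0, sigma=0.1))
```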
Cumulative Distribution Function (CDF)
CDF: probability that X ≤ x. No elementary closed form; expressed through the error function erf:
F(x) = 0.5 * [1 + erf((x - μ) / (σ √2))]
Properties of PDF and CDF
PDF integrates to 1. CDF non-decreasing, continuous, limits 0 at -∞, 1 at ∞. Symmetry: F(μ + a) = 1 - F(μ - a).
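The erf expression for the CDF can be sketched the same way with `math.erf` from Python's standard library. The helper name `normal_cdf` and the values μ = 70, σ = 5 are assumptions chosen for illustration. The code computes an interval probability as a difference of CDF values and checks the symmetry property F(μ + a) = 1 - F(μ - a).

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 70.0, 5.0
# Probability between two points = difference of CDF values.
p_between = normal_cdf(80, mu, sigma) - normal_cdf(60, mu, sigma)
print(round(p_between, 4))  # about 0.9545 (within 2 SD of the mean)

# Symmetry check: F(mu + a) should equal 1 - F(mu - a).
a = 3.0
print(normal_cdf(mu + a, mu, sigma), 1 - normal_cdf(mu - a, mu, sigma))
```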
| Property | Description |
|---|---|
| Support | (-∞, ∞) |
| Symmetry | About mean μ |
| Mean, Median, Mode | All equal to μ |
| Variance | σ² |
Parameters: Mean and Variance
Mean (μ)
Location parameter. Center of distribution. Determines peak position. Estimate: sample average.
Variance (σ²)
Scale parameter (via the standard deviation σ = √σ²). Measures spread, dispersion. Controls width of bell curve. Estimate: sample variance.
Higher Moments
Skewness = 0 (symmetry). Kurtosis = 3 (mesokurtic). Moments beyond variance describe shape deviations; normal is baseline.
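The values skewness = 0 and kurtosis = 3 can be checked numerically by integrating standardized powers against the density. This sketch uses the trapezoidal rule on the truncated range μ ± 12σ. The helper `standardized_moment` and its integration settings are illustrative choices, not a standard routine.

```python
import math

def standardized_moment(k: int, mu: float = 0.0, sigma: float = 1.0, n: int = 20_000) -> float:
    """E[((X - mu)/sigma)^k] for X ~ N(mu, sigma^2).

    Uses the trapezoidal rule on [mu - 12 sigma, mu + 12 sigma]; the mass
    outside that range is negligible in double precision.
    """
    lo, hi = mu - 12 * sigma, mu + 12 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        z = (x - mu) / sigma
        w = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        pdf = math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
        total += w * z ** k * pdf
    return total * h

print(standardized_moment(3, mu=5, sigma=2))  # skewness, close to 0
print(standardized_moment(4, mu=5, sigma=2))  # kurtosis, close to 3 (excess kurtosis close to 0)
```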
Standard Normal Distribution
Definition
Mean μ = 0, variance σ² = 1. Denoted Z ~ N(0,1). Basis for normalization and reference tables.
Standard Normal PDF and CDF
PDF:
ϕ(z) = (1 / √(2π)) * exp(-z² / 2)
CDF:
Φ(z) = ∫ from -∞ to z of ϕ(t) dt
Use in Statistical Tables
Tabulated Φ(z) values enable probability calculations. Software functions implement Φ and inverse Φ for quantiles.
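Python's standard library provides `statistics.NormalDist` (Python 3.8+). Its `cdf` and `inv_cdf` methods play the roles of a Φ table and its inverse. A short sketch of a table lookup and a quantile lookup follows. It also shows how values for negative z follow from symmetry, Φ(-z) = 1 - Φ(z).

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal N(0, 1)

# Table lookup: Phi(z) = P(Z <= z)
print(round(Z.cdf(1.96), 4))       # about 0.975
# Inverse Phi (quantile function): the z with Phi(z) = p
print(round(Z.inv_cdf(0.975), 2))  # about 1.96
# Tables often list only z >= 0; negative z follow from Phi(-z) = 1 - Phi(z)
print(round(Z.cdf(-1.0), 4), round(1 - Z.cdf(1.0), 4))
```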
Z-scores and Standardization
Definition
Z-score: number of standard deviations a value is from mean.
z = (x - μ) / σ
Purpose
Normalize data to standard normal. Compare values across different scales. Facilitate probability and hypothesis testing.
Example
Value x=80, μ=70, σ=5. z = (80-70)/5 = 2. Interpretation: x is 2 SD above mean.
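Here is the same example in code, using Python's `statistics.NormalDist` for Φ. `z_score` is a hypothetical helper name. Converting to z lets a single standard normal table answer questions about any N(μ, σ²).

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_score(80, mu=70, sigma=5)
print(z)                              # 2.0
# P(X <= 80) for X ~ N(70, 5^2) equals Phi(2) after standardizing.
print(round(NormalDist().cdf(z), 4))  # about 0.9772
```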
Central Limit Theorem
Statement
The sum (or average) of a large number of i.i.d. variables, after standardization, converges in distribution to the normal, regardless of the original distribution.
Conditions
Variables independent and identically distributed, with finite mean and variance. Sample size sufficiently large (n ≥ 30 is a common rule of thumb; strongly skewed distributions may need more).
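A small simulation can illustrate the theorem under these conditions. The sketch assumes Exponential(1) variables (skewed, with mean 1 and variance 1), n = 50 and 20,000 repetitions; these are illustrative choices. If the normal approximation holds, the standardized means should have mean near 0 and standard deviation near 1, with about 95% of values within ±1.96.

```python
import random
import statistics

def standardized_means(n: int, trials: int, seed: int = 0) -> list[float]:
    """Standardized sample means of n Exponential(1) draws, repeated `trials` times.

    Exponential(1) has mean 1 and variance 1, so sqrt(n) * (mean - 1)
    should be approximately N(0, 1) for large n if the CLT applies.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    out = []
    for _ in range(trials):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((mean - 1.0) * n ** 0.5)
    return out

zs = standardized_means(n=50, trials=20_000)
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(round(statistics.fmean(zs), 3), round(statistics.stdev(zs), 3))  # near 0 and 1
print(round(inside, 3))  # near 0.95, the N(0, 1) value
```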
Implications
Justifies normal models in statistics. Basis for confidence intervals, hypothesis testing, parametric inference.
Moment Generating Functions
Definition
MGF M(t) = E[e^(tX)]. For normal:
M(t) = exp(μt + (σ² t²)/2)
Properties
MGF exists for all real t. Moments found by derivatives at t=0: M'(0) = μ, M''(0) = σ² + μ². Uniqueness: an MGF that is finite near t=0 determines the distribution.
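The sketch below differentiates the normal MGF numerically with central finite differences, illustrating that moments come from derivatives at t = 0. The parameters μ = 1.5, σ = 2 and step h = 1e-4 are arbitrary assumptions. Analytically M'(0) = μ and M''(0) = σ² + μ², so M''(0) - M'(0)² recovers σ².

```python
import math

def normal_mgf(t: float, mu: float, sigma: float) -> float:
    """M(t) = exp(mu*t + sigma^2 * t^2 / 2) for X ~ N(mu, sigma^2)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

mu, sigma, h = 1.5, 2.0, 1e-4
# Central differences at t = 0 approximate M'(0) and M''(0).
m1 = (normal_mgf(h, mu, sigma) - normal_mgf(-h, mu, sigma)) / (2 * h)
m2 = (normal_mgf(h, mu, sigma) - 2 * normal_mgf(0, mu, sigma) + normal_mgf(-h, mu, sigma)) / h ** 2
print(m1)            # approximately E[X] = mu = 1.5
print(m2 - m1 ** 2)  # approximately Var(X) = sigma^2 = 4.0
```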
Characteristic Function
φ(t) = E[e^(itX)] = exp(iμt - (σ² t²)/2). Useful in theory and proofs.
Applications in Statistics and Science
Statistical Modeling
Model measurement errors, test statistics, regression residuals. Foundation of parametric statistical methods.
Natural Sciences
Model height, IQ scores, blood pressure, noise. Describes phenomena that arise as the sum of many small, independent effects.
Engineering and Finance
Signal processing noise, risk analysis, asset returns approximation. Provides tractable analytical tools.
Sampling and Estimation
Sampling Distribution
The sample mean of n i.i.d. normal observations is itself normal, with mean μ and variance σ²/n; the variance decreases as sample size grows.
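The following is a seeded simulation sketch of the σ²/n rule. It assumes N(10, 3²) data and 20,000 repetitions per sample size, both illustrative values. The empirical variance of the sample mean should shrink roughly fourfold each time n quadruples.

```python
import random
import statistics

rng = random.Random(42)  # seeded for reproducibility
mu, sigma = 10.0, 3.0

def sample_means(n: int, trials: int = 20_000) -> list[float]:
    """Means of `trials` independent samples of size n from N(mu, sigma^2)."""
    return [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(trials)]

for n in (4, 16, 64):
    means = sample_means(n)
    # Empirical variance of the sample mean vs. the theoretical sigma^2 / n
    print(n, round(statistics.variance(means), 4), sigma ** 2 / n)
```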
Confidence Intervals
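Use normal quantiles to construct interval estimates of μ when σ is known or the sample is large: x̄ ± z · σ/√n, where z is the normal quantile for the chosen confidence level (e.g., 1.96 for 95%).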
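Here is a minimal sketch of the known-σ interval x̄ ± z · σ/√n, using `statistics.NormalDist().inv_cdf` for the quantile. The data values and σ = 2 are hypothetical, and `z_interval` is an illustrative name.

```python
import math
from statistics import NormalDist, fmean

def z_interval(data: list[float], sigma: float, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for mu with known sigma: mean +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    center = fmean(data)
    half_width = z * sigma / math.sqrt(len(data))
    return center - half_width, center + half_width

# Hypothetical measurements, assumed normal with known sigma = 2
data = [9.8, 10.4, 11.1, 9.5, 10.9, 10.2, 9.7, 10.6]
print(z_interval(data, sigma=2.0))
```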
Parameter Estimation
Maximum likelihood estimators for μ and σ²: the sample mean and the sample variance with divisor n (see table).
| Estimator | Formula | Properties |
|---|---|---|
| Mean (μ̂) | (1/n) ∑ xᵢ | Unbiased, consistent |
| Variance (σ̂²) | (1/n) ∑ (xᵢ - μ̂)² | Biased, consistent; unbiased: divide by (n-1) |
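The two variance formulas in the table differ only in the divisor. This sketch uses a small hypothetical dataset. `normal_mle` and `unbiased_variance` are illustrative names; the standard library's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1) compute the same two quantities.

```python
def normal_mle(data: list[float]) -> tuple[float, float]:
    """Maximum likelihood estimates (mu_hat, sigma2_hat) for i.i.d. normal data."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # MLE divides by n (biased)
    return mu_hat, sigma2_hat

def unbiased_variance(data: list[float]) -> float:
    """Sample variance with Bessel's correction: divide by n - 1."""
    n = len(data)
    mu_hat = sum(data) / n
    return sum((x - mu_hat) ** 2 for x in data) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(normal_mle(data))         # (5.0, 4.0)
print(unbiased_variance(data))  # 32 / 7, about 4.571
```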
Multivariate Normal Distribution
Definition
Vector X = (X₁, ..., Xₖ) is jointly normal if every linear combination a₁X₁ + ... + aₖXₖ is (univariate) normal.
Parameters
Mean vector μ (k×1), covariance matrix Σ (k×k, symmetric positive definite).
PDF Formula
f(x) = (1 / ((2π)^(k/2) |Σ|^(1/2))) * exp(-0.5 (x - μ)' Σ⁻¹ (x - μ))
Applications
Multivariate modeling, principal component analysis, Bayesian inference, pattern recognition.
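The multivariate PDF formula can be sketched for the bivariate case (k = 2). There the determinant and inverse of Σ have simple closed forms, so only the standard library is needed. The restriction to k = 2 and the function name `bivariate_normal_pdf` are assumptions for illustration; general k would normally use a linear-algebra library.

```python
import math

def bivariate_normal_pdf(x: tuple[float, float],
                         mu: tuple[float, float],
                         cov: tuple[tuple[float, float], tuple[float, float]]) -> float:
    """Multivariate normal PDF specialized to k = 2 via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det, so the quadratic form
    # (x - mu)' Sigma^{-1} (x - mu) expands to:
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    k = 2
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (k / 2) * math.sqrt(det))

def std_normal_pdf(t: float) -> float:
    """Univariate standard normal density, for comparison."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# With identity covariance the joint density factors into univariate densities.
identity = ((1.0, 0.0), (0.0, 1.0))
print(bivariate_normal_pdf((0.5, -1.0), (0.0, 0.0), identity),
      std_normal_pdf(0.5) * std_normal_pdf(-1.0))
```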
Limitations and Assumptions
Assumption of Normality
Many methods assume normality; real data may deviate (skewness, heavy tails). Check with normality tests (e.g., Shapiro-Wilk) and Q-Q plots.
Sensitivity to Outliers
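Estimates based on the normal model (sample mean, sample variance) are sensitive to extreme values because the normal has light tails; robust alternatives (e.g., heavier-tailed t models, median-based estimators) are sometimes preferred.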
Non-Negative Data
Often a poor fit for strictly positive data (e.g., waiting times), since it assigns probability to negative values; log-normal or gamma models are common alternatives.
Computation and Numerical Techniques
Evaluating CDF
No elementary closed form; use numerical integration, error function approximation, polynomial expansions.
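This sketch illustrates the numerical-integration route. It evaluates Φ(z) with composite Simpson's rule on [0, |z|], uses Φ(0) = 0.5 and symmetry, and compares the result with the erf-based formula. The subinterval count n = 1000 is an arbitrary assumption, and Simpson's rule is one choice among many quadrature methods.

```python
import math

def phi(t: float) -> float:
    """Standard normal PDF."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cdf_simpson(z: float, n: int = 1000) -> float:
    """Phi(z) by composite Simpson's rule on [0, |z|], using Phi(0) = 0.5 and symmetry."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    b = abs(z)
    h = b / n
    s = phi(0.0) + phi(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * phi(i * h)
    area = s * h / 3  # integral of phi from 0 to |z|
    return 0.5 + area if z >= 0 else 0.5 - area

def cdf_erf(z: float) -> float:
    """Reference value via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-2.5, 0.0, 1.0, 1.96):
    print(z, cdf_simpson(z), cdf_erf(z))
```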
Inverse CDF (Quantile Function)
Essential for simulation (inverse-transform sampling) and hypothesis testing (critical values). Computed via rational approximations or iterative methods.
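For the iterative route, a simple and robust (if slow) option is bisection on the monotone CDF. This sketch computes the CDF with `math.erfc`, which keeps accuracy in the lower tail. `normal_ppf` is a hypothetical name. The bracket [-40, 40] and the tolerance are illustrative choices rather than values from any reference implementation.

```python
import math

def normal_cdf(z: float) -> float:
    """Phi(z) via erfc, accurate far into the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def normal_ppf(p: float, tol: float = 1e-12) -> float:
    """Quantile z with Phi(z) = p, found by bisection on the monotone CDF.

    Bisection is chosen for clarity; rational approximations are much faster.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    lo, hi = -40.0, 40.0  # bracket wide enough for any p representable in double precision
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid  # quantile lies to the right of mid
        else:
            hi = mid  # quantile lies at or to the left of mid
    return 0.5 * (lo + hi)

print(normal_ppf(0.975))  # close to 1.96
print(normal_ppf(0.5))    # close to 0
```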
Random Number Generation
Methods: Box-Muller transform, Marsaglia polar method, ziggurat algorithm. Generate standard normal variates efficiently.
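This is a sketch of the Box-Muller transform, the first method listed. Two independent uniforms are mapped to a radius √(-2 ln U₁) and an angle 2πU₂, giving two independent standard normal variates. The seed and sample size are arbitrary. In practice Python's own `random.gauss` already provides normal variates.

```python
import math
import random

def box_muller(rng: random.Random) -> tuple[float, float]:
    """Two independent N(0, 1) variates from two independent Uniform(0, 1) variates."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(123)  # seeded for reproducibility
samples = [z for _ in range(50_000) for z in box_muller(rng)]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / (len(samples) - 1)
print(round(mean, 3), round(var, 3))  # near 0 and 1
```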