Definition
Concept
Median: value separating an ordered data set into two halves. At least half of the observations are ≤ median and at least half are ≥ median. Measure of central tendency, robust to skewness and outliers.
Formal Definition
For dataset X = {x₁, x₂, ..., xₙ} sorted ascending: median m satisfies P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5, where P denotes the proportion of observations (sample) or the probability (distribution).
Types of Median
Sample median: computed from sample data. Population median: theoretical median of a distribution with distribution function F(x); when F is continuous, F(m) = 0.5.
Properties
Order Statistic
Median is the middle order statistic: (n+1)/2-th value for odd n; average of n/2-th and (n/2 +1)-th for even n.
Robustness
Breakdown point 50%: median stays bounded as long as fewer than half of the data are corrupted. Far more resistant to extreme values than mean.
Nonlinearity
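Median respects affine maps: median(aX + b) = a median(X) + b for any real a and b (for even n, under the usual midpoint convention). Median is not additive, unlike mean: median(X + Y) ≠ median(X) + median(Y) in general.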
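The lack of additivity can be checked numerically. The following is a small sketch using Python's standard `statistics` module; the data values are made up for illustration.

```python
from statistics import mean, median

x = [1, 2, 10]
y = [10, 1, 2]                       # same values as x, paired in a different order
s = [a + b for a, b in zip(x, y)]    # element-wise sums: [11, 3, 12]

# The mean is additive: mean of the sums equals the sum of the means.
print(mean(s), mean(x) + mean(y))

# The median is not: 11 for the sums versus 2 + 2 = 4.
print(median(s), median(x) + median(y))

# Shifting and scaling (here a = 2, b = 1) carry through the median: both are 5.
print(median([2 * v + 1 for v in x]), 2 * median(x) + 1)
```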
Uniqueness
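Median unique when the CDF is continuous and strictly increasing where it crosses 0.5 (e.g. density positive there). May be non-unique otherwise: in discrete data with even n, any value between the two middle observations qualifies; the same holds for distributions with a zero-probability gap around the middle.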
Calculation Methods
Unordered Data
Step 1: Sort data ascending. Step 2: Identify middle position(s). Step 3: Extract median value or average two middle values.
Even Number of Observations
Median = average of n/2-th and (n/2 +1)-th ordered values.
Odd Number of Observations
Median = value at position (n+1)/2.
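The sort-and-pick procedure above can be sketched in Python as follows. The function name `median` and the error raised for empty input are illustrative choices, not part of the text.

```python
def median(values):
    """Return the sample median of a non-empty sequence of numbers."""
    ordered = sorted(values)              # Step 1: sort ascending
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2                          # Step 2: locate the middle position
    if n % 2 == 1:
        return ordered[mid]               # odd n: the (n+1)/2-th value (1-based)
    # even n: average of the n/2-th and (n/2 + 1)-th values (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 4, 5, 2]))   # 3
print(median([7, 3, 9, 5]))      # 6.0
```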
Grouped Data
Use formula with cumulative frequency to estimate median class and interpolate within class interval.
Median = L + [(N/2 - F) / f] × h
Where:
- L = lower boundary of median class
- N = total frequency
- F = cumulative frequency before median class
- f = frequency of median class
- h = class width
Examples
Odd Number Dataset
Dataset: {3, 1, 4, 5, 2}. Sorted: {1, 2, 3, 4, 5}. Median = 3 (3rd element).
Even Number Dataset
Dataset: {7, 3, 9, 5}. Sorted: {3, 5, 7, 9}. Median = (5 + 7)/2 = 6.
Grouped Data Example
Classes: 10-20 (5), 20-30 (8), 30-40 (12), 40-50 (5). Total N=30.
| Class Interval | Frequency (f) | Cumulative Frequency (F) |
|---|---|---|
| 10-20 | 5 | 5 |
| 20-30 | 8 | 13 |
| 30-40 | 12 | 25 |
| 40-50 | 5 | 30 |
Find median class: N/2 = 15. Median class is 30-40, the first class whose cumulative frequency (25) reaches 15. Cumulative frequency before it: F = 13; f = 12; h = 10.
Apply formula:
Median = 30 + [(15 - 13)/12] × 10 = 30 + (2/12) × 10 ≈ 30 + 1.67 = 31.67
Median vs Mean
Definition
Median: middle value in ordered data. Mean: arithmetic average of all values.
Sensitivity to Outliers
Median robust to outliers; mean influenced by extremes significantly.
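A quick numerical illustration of this contrast, using Python's standard `statistics` module. The income figures are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]        # hypothetical values, in thousands
with_outlier = incomes + [1000]       # one extreme observation added

# Without the outlier, mean and median are both 35.
print(mean(incomes), median(incomes))

# With it, the mean jumps to about 195.8 while the median only moves to 36.5.
print(mean(with_outlier), median(with_outlier))
```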
Skewness Impact
In skewed distributions, median better represents central tendency; mean shifts towards tail.
Mathematical Properties
Mean is linear, differentiable; median is non-linear, less mathematically tractable but more robust.
Applications
Descriptive Statistics
Summarizes data central location when distribution is skewed or contains outliers.
Income and Wealth Analysis
Median income better indicator of typical earnings than mean.
Medical Statistics
Median survival time in clinical trials; reduces bias from extreme survival times.
Real Estate
Median home prices reported to avoid influence of very high or low priced sales.
Quality Control
Median used to monitor process central tendency when data is non-normal.
Advantages and Disadvantages
Advantages
- Robust to outliers and skewed data.
- Simple to compute from ordered data.
- Applicable to ordinal data.
- High breakdown point (50%).
Disadvantages
- Ignores how far observations lie from the center; uses only their order.
- Not suitable for mathematical operations requiring linearity.
- Less efficient estimator than mean under symmetric, normal distributions.
- Non-unique in some discrete data sets.
Median in Grouped Data
Concept
Grouped data: frequencies in class intervals. Median estimated by interpolation within median class.
Formula Derivation
Uses cumulative frequency to locate median class; linear interpolation assumes uniform distribution within class.
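The interpolation can be sketched as a short Python function. The name `grouped_median` and its argument layout (a list of class boundaries plus a list of frequencies) are assumptions made for this example; the data reproduce the grouped example given earlier.

```python
def grouped_median(boundaries, frequencies):
    """Estimate the median from class boundaries and class frequencies.

    boundaries:  list of (lower, upper) tuples in ascending order
    frequencies: list of counts, one per class
    """
    total = sum(frequencies)          # N
    half = total / 2                  # N/2
    cumulative = 0                    # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # Median class found: interpolate linearly inside it,
            # i.e. L + [(N/2 - F) / f] * h with h = upper - lower.
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no observations")


classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
counts = [5, 8, 12, 5]
print(round(grouped_median(classes, counts), 2))   # 31.67
```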
Limitations
Assumption of uniform distribution may be invalid; approximation only.
Practical Use
Common in large surveys, census data, where individual values unavailable.
Median in Probability Distributions
Definition
Median m is the point where the cumulative distribution function (CDF) reaches one half: F(m) = 0.5 when F is continuous; in general, P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5.
Continuous Distributions
Median found by solving integral equations or inverse CDF at 0.5.
Discrete Distributions
Median is any value m with P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5; often taken as the smallest value at which the CDF reaches or exceeds 0.5. May be non-unique.
Examples
- Normal distribution median = mean = mode.
- Exponential distribution median = (ln 2)/λ.
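The inverse-CDF idea can be checked numerically for the exponential case. The sketch below solves F(m) = 0.5 by bisection and compares the result with (ln 2)/λ; the helper name `median_from_cdf`, the search interval, and the rate value are arbitrary choices for illustration.

```python
import math


def median_from_cdf(cdf, lo, hi, tol=1e-10):
    """Solve cdf(m) = 0.5 by bisection.

    Assumes cdf is continuous and increasing on [lo, hi]
    with cdf(lo) <= 0.5 <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rate = 2.0                                   # λ, chosen arbitrarily


def exp_cdf(x):
    """CDF of the exponential distribution with the rate above."""
    return 1 - math.exp(-rate * x)


numeric = median_from_cdf(exp_cdf, 0.0, 100.0)
closed_form = math.log(2) / rate
print(numeric, closed_form)                  # both close to 0.3466
```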
Robustness and Outliers
Breakdown Point
Median tolerates arbitrary contamination of just under 50% of the data without breakdown.
Resistance to Extreme Values
Changing the magnitude of an extreme value leaves the median unchanged as long as the value stays on the same side of the median in the ordered data.
Comparison to Mean
Mean breakdown point is 0%; median superior in presence of anomalies.
Use in Robust Statistics
Median foundational estimator in robust location estimation methods.
Computation Algorithms
Sorting-Based Method
Sort data: O(n log n) time complexity. Extract middle value(s).
Selection Algorithms
Median of medians algorithm finds median in worst-case linear time O(n) without full sort.
Streaming Data
Use data structures like two heaps (max-heap, min-heap) to maintain median dynamically.
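A minimal sketch of the two-heap approach using Python's `heapq` module. The class name `RunningMedian` and its methods are hypothetical; `heapq` only provides min-heaps, so the lower half is stored with negated values to act as a max-heap.

```python
import heapq


class RunningMedian:
    """Maintain the median of a stream with two heaps."""

    def __init__(self):
        self._low = []    # max-heap of the smaller half (values stored negated)
        self._high = []   # min-heap of the larger half

    def add(self, value):
        # Push onto the lower half, then move its largest element up.
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half is never smaller than the upper half.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no data yet")
        if len(self._low) > len(self._high):
            return -self._low[0]                     # odd count: top of lower half
        return (-self._low[0] + self._high[0]) / 2   # even count: average of the tops


stream = RunningMedian()
for x in [7, 3, 9, 5]:
    stream.add(x)
print(stream.median())   # 6.0
```

Each insertion costs O(log n) and reading the median is O(1), which is why this layout suits data that arrive one value at a time.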
Software Implementations
R (median), Python (statistics.median, numpy.median), and MATLAB (median) provide built-in median functions.

    # Median of medians selection: returns the k-th smallest element (0-based k)
    def select(arr, k):
        # Base case: small inputs are sorted directly
        if len(arr) <= 5:
            return sorted(arr)[k]
        # Median of each group of up to 5 elements
        medians = []
        for i in range(0, len(arr), 5):
            group = sorted(arr[i:i + 5])
            medians.append(group[len(group) // 2])
        # Median of the group medians serves as the pivot
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in arr if x < pivot]
        highs = [x for x in arr if x > pivot]
        pivots = [x for x in arr if x == pivot]
        # Recurse only into the part that contains rank k
        if k < len(lows):
            return select(lows, k)
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            return select(highs, k - len(lows) - len(pivots))

Statistical Inference with Median
Confidence Intervals
Nonparametric methods: order statistics based intervals; bootstrap methods for median CI estimation.
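An interval based on order statistics can be sketched as follows. It relies on the fact that, for i.i.d. draws from a continuous distribution, the number of observations below the true median is Binomial(n, 1/2). The function name `median_ci`, its return layout, and the sample values are assumptions made for this example.

```python
from math import comb


def median_ci(values, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Returns (lower, upper, coverage), where coverage is the exact level
    achieved by the chosen pair of order statistics.
    """
    ordered = sorted(values)
    n = len(ordered)
    alpha = 1 - confidence

    def tail(j):
        # P(B <= j - 1) for B ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(j)) / 2 ** n

    # Largest 1-based rank j whose two-sided tail mass stays within alpha
    j = 0
    while j + 1 <= n // 2 and 2 * tail(j + 1) <= alpha:
        j += 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    # Interval runs from the j-th smallest to the j-th largest observation
    return ordered[j - 1], ordered[n - j], 1 - 2 * tail(j)


sample = [12, 15, 11, 19, 14, 13, 18, 16, 17, 20]   # hypothetical data
low, high, coverage = median_ci(sample)
print(low, high, round(coverage, 4))                # 12 19 0.9785
```

Because the binomial distribution is discrete, the achieved coverage is usually somewhat above the requested level.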
Hypothesis Testing
Sign test assesses hypotheses about the median location; Wilcoxon signed-rank test does so under the additional assumption of a symmetric distribution.
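As an illustration, an exact two-sided sign test needs only binomial probabilities. The function name `sign_test`, the convention of dropping observations equal to the hypothesized median, and the sample values are assumptions of this sketch.

```python
from math import comb


def sign_test(values, m0):
    """Two-sided exact sign test of H0: median == m0. Returns the p-value."""
    above = sum(1 for v in values if v > m0)
    below = sum(1 for v in values if v < m0)
    n = above + below                 # ties with m0 are dropped
    if n == 0:
        raise ValueError("all observations equal m0")
    k = min(above, below)
    # Under H0 the count above m0 is Binomial(n, 1/2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


sample = [12, 15, 11, 19, 14, 13, 18, 16, 17, 20]   # hypothetical data
print(sign_test(sample, 12.5))                      # 0.109375
```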
Median Regression
Quantile regression at quantile level 0.5 estimates conditional medians (least absolute deviations); robust to outliers in the response and does not require constant error variance.
Asymptotic Properties
Sample median is consistent, asymptotically normal with variance dependent on density at median.
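The usual form of this result can be written out as below. The notation is an assumption of this note rather than taken from the text: m̂ₙ is the sample median of n i.i.d. observations, m the population median, and f a density that is continuous and positive at m.

```latex
\sqrt{n}\,\bigl(\hat{m}_n - m\bigr) \;\xrightarrow{d}\; \mathcal{N}\!\left(0,\ \frac{1}{4\,f(m)^{2}}\right)
```

For normal data with standard deviation σ, f(m) = 1/(σ√(2π)), so the approximate variance of the sample median is πσ²/(2n), about 1.57 times the variance σ²/n of the sample mean.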