Definition

Concept

Median: value separating an ordered data set into two halves. At least half of the observations are ≤ median and at least half are ≥ median. Measure of central tendency robust to skewness and outliers.

Formal Definition

For dataset X = {x₁, x₂, ..., xₙ} sorted ascending: median m satisfies P(X ≤ m) ≥ 0.5 and P(X ≥ m) ≥ 0.5.
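On a finite dataset the two-sided condition can be checked by replacing the probabilities with sample proportions. A minimal sketch follows; the helper name `satisfies_median_condition` is illustrative and not taken from any library.

```python
def satisfies_median_condition(data, m):
    """Return True if at least half the values are <= m and at least half are >= m."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= m)
    at_or_above = sum(1 for x in data if x >= m)
    # Compare counts against n/2 using integers to avoid floating-point division.
    return 2 * at_or_below >= n and 2 * at_or_above >= n


print(satisfies_median_condition([3, 1, 4, 5, 2], 3))   # True
print(satisfies_median_condition([3, 1, 4, 5, 2], 4))   # False
print(satisfies_median_condition([7, 3, 9, 5], 6))      # True
print(satisfies_median_condition([7, 3, 9, 5], 5.5))    # True
```

For even n, every value between the two middle observations passes the check, which is why the midpoint is used by convention.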

Types of Median

Sample median: computed from sample data. Population median: theoretical median of distribution function F(x) where F(m) = 0.5.

Properties

Order Statistic

Median is the middle order statistic: (n+1)/2-th value for odd n; average of n/2-th and (n/2 +1)-th for even n.

Robustness

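Breakdown point ≈ 50%: median stays bounded as long as fewer than half of the observations are corrupted, though its value may shift within the range of the uncorrupted data. Far more resistant to extreme values than the mean.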

Nonlinearity

Median is equivariant under affine maps: median(aX + b) = a·median(X) + b for any real a and b. It is not additive, unlike the mean: median(X + Y) ≠ median(X) + median(Y) in general.
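The missing additive property can be seen on a small paired example. The sketch below uses made-up values and Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

x = [1, 2, 9]
y = [9, 1, 2]
s = [a + b for a, b in zip(x, y)]   # element-wise sum: [10, 3, 11]

# Medians do not add: median(x) + median(y) is 4, but median(s) is 10.
print(median(x) + median(y), median(s))

# Means do add: both expressions equal 8.
print(mean(x) + mean(y), mean(s))
```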

Uniqueness

Median unique when the CDF is continuous and strictly increasing at the median; may be non-unique for discrete data or when the distribution has a gap (zero density) around the median.

Calculation Methods

Unordered Data

Step 1: Sort data ascending. Step 2: Identify middle position(s). Step 3: Extract median value or average two middle values.

Even Number of Observations

Median = average of n/2-th and (n/2 +1)-th ordered values.

Odd Number of Observations

Median = value at position (n+1)/2.
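The sort-then-pick procedure for odd and even n can be sketched in a few lines of Python. The function name `median_sorted` is an illustrative choice; the 1-based positions in the text map to 0-based list indices as noted in the comments.

```python
def median_sorted(values):
    """Median by sorting: middle value for odd n, mean of the two middle values for even n."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n+1)/2 is 0-based index n // 2.
        return ordered[mid]
    # 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sorted([3, 1, 4, 5, 2]))  # 3
print(median_sorted([7, 3, 9, 5]))     # 6.0
```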

Grouped Data

Use formula with cumulative frequency to estimate median class and interpolate within class interval.

Median = L + [(N/2 - F) / f] × h

Where:

  • L = lower boundary of median class
  • N = total frequency
  • F = cumulative frequency before median class
  • f = frequency of median class
  • h = class width

Examples

Odd Number Dataset

Dataset: {3, 1, 4, 5, 2}. Sorted: {1, 2, 3, 4, 5}. Median = 3 (3rd element).

Even Number Dataset

Dataset: {7, 3, 9, 5}. Sorted: {3, 5, 7, 9}. Median = (5 + 7)/2 = 6.

Grouped Data Example

Classes: 10-20 (5), 20-30 (8), 30-40 (12), 40-50 (5). Total N=30.

Class Interval   Frequency (f)   Cumulative Frequency (F)
10-20            5               5
20-30            8               13
30-40            12              25
40-50            5               30

Find median class: N/2 = 15. Cumulative frequency first reaches 15 in class 30-40 (25 ≥ 15); cumulative frequency before this class is F = 13.

Apply formula:

Median = 30 + [(15 - 13)/12] × 10 = 30 + (2/12) × 10 ≈ 30 + 1.67 = 31.67
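The same interpolation can be sketched as a small Python function. The name `grouped_median` and its argument layout (a list of class boundaries plus a list of class frequencies) are assumptions made for this illustration.

```python
def grouped_median(boundaries, frequencies):
    """Estimate the median from class boundaries [b0, ..., bk] and k class frequencies."""
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than frequencies")
    half = sum(frequencies) / 2          # N/2
    cumulative = 0                       # F: cumulative frequency before the current class
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= half:
            lower = boundaries[i]                    # L
            width = boundaries[i + 1] - lower        # h
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no observations")


estimate = grouped_median([10, 20, 30, 40, 50], [5, 8, 12, 5])
print(round(estimate, 2))   # 31.67
```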

Median vs Mean

Definition

Median: middle value in ordered data. Mean: arithmetic average of all values.

Sensitivity to Outliers

Median robust to outliers; mean influenced by extremes significantly.
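A single extreme observation shows the difference. The figures below are hypothetical values chosen for illustration, not real data.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]      # hypothetical values, in thousands
with_outlier = salaries + [1000]     # one extreme observation added

# Without the outlier: mean and median are both 35.
print(mean(salaries), median(salaries))

# With the outlier: the mean jumps to about 195.8, the median moves only to 36.5.
print(mean(with_outlier), median(with_outlier))
```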

Skewness Impact

In skewed distributions, median better represents central tendency; mean shifts towards tail.

Mathematical Properties

Mean is linear, differentiable; median is non-linear, less mathematically tractable but more robust.

Applications

Descriptive Statistics

Summarizes data central location when distribution is skewed or contains outliers.

Income and Wealth Analysis

Median income better indicator of typical earnings than mean.

Medical Statistics

Median survival time in clinical trials; reduces bias from extreme survival times.

Real Estate

Median home prices reported to avoid influence of very high or low priced sales.

Quality Control

Median used to monitor process central tendency when data is non-normal.

Advantages and Disadvantages

Advantages

  • Robust to outliers and skewed data.
  • Simple to compute from ordered data.
  • Applicable to ordinal data.
  • High breakdown point (50%).

Disadvantages

  • Ignores the magnitudes of observations away from the middle; uses only their rank order.
  • Not suitable for mathematical operations requiring linearity.
  • Less efficient estimator than mean under normal distributions (asymptotic relative efficiency 2/π ≈ 64%).
  • Non-unique in some discrete data sets.

Median in Grouped Data

Concept

Grouped data: frequencies in class intervals. Median estimated by interpolation within median class.

Formula Derivation

Uses cumulative frequency to locate median class; linear interpolation assumes uniform distribution within class.
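The interpolation step can be written out as a short derivation. The symbol C(x), the estimated cumulative count at a point x inside the median class, is introduced here only for illustration; L, N, F, f and h are as in the grouped-data formula.

```latex
% Uniformity assumption: the f observations are spread evenly over [L, L + h].
C(x) = F + f \cdot \frac{x - L}{h}, \qquad L \le x \le L + h

% The median m is the point where the cumulative count reaches N/2:
F + f \cdot \frac{m - L}{h} = \frac{N}{2}
\quad\Longrightarrow\quad
m = L + \frac{N/2 - F}{f} \cdot h
```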

Limitations

Assumption of uniform distribution may be invalid; approximation only.

Practical Use

Common in large surveys, census data, where individual values unavailable.

Median in Probability Distributions

Definition

Median m satisfies F(m) = 0.5 when the cumulative distribution function (CDF) F is continuous; in general, P(X < m) ≤ 0.5 ≤ F(m).

Continuous Distributions

Median found by solving integral equations or inverse CDF at 0.5.

Discrete Distributions

Smallest value m with F(m) ≥ 0.5 is a median; if the CDF equals exactly 0.5 on an interval, every point in that interval is a median.
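One common convention reports the first support point at which the CDF reaches 0.5. The sketch below assumes the distribution is given as a dictionary from value to probability; the function name `discrete_median` is illustrative, and exact floating-point comparison with 0.5 is only safe for probabilities that are exactly representable.

```python
def discrete_median(pmf):
    """Smallest support point whose cumulative probability reaches 0.5.

    pmf: dict mapping value -> probability (assumed to sum to 1).
    """
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= 0.5:
            return value
    raise ValueError("probabilities sum to less than 0.5")


# CDF jumps from 0.2 to 0.7 at 1, so 1 is the unique median.
print(discrete_median({0: 0.2, 1: 0.5, 2: 0.3}))                    # 1

# Binomial(3, 0.5): CDF equals exactly 0.5 at 1, so any value in [1, 2] is a median;
# this convention returns the smallest one.
print(discrete_median({0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}))    # 1
```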

Examples

  • Normal distribution median = mean = mode.
  • Exponential distribution median = (ln 2)/λ.
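Both examples can be checked numerically with the standard library. In this sketch the rate 2.0 and the normal parameters (mean 10, standard deviation 3) are arbitrary illustrative choices, and `exponential_median` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def exponential_median(rate):
    """Median of Exponential(rate): solve 1 - exp(-rate * m) = 0.5 for m."""
    return math.log(2) / rate


m = exponential_median(2.0)
print(m)                           # about 0.3466
print(1 - math.exp(-2.0 * m))      # CDF evaluated at the median, about 0.5

# Normal distribution: the inverse CDF at 0.5 returns the mean.
print(NormalDist(mu=10, sigma=3).inv_cdf(0.5))
```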

Robustness and Outliers

Breakdown Point

Median tolerates up to 50% arbitrary contamination without breakdown.
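The 50% threshold can be illustrated by replacing a growing number of observations with an arbitrarily large value. The data, the corrupt value and the helper name `median_after_corruption` are all made up for this sketch.

```python
from statistics import median

CLEAN = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]   # n = 11, median 15
BAD = 10**9                                            # arbitrary corrupt value


def median_after_corruption(k):
    """Median after the k smallest clean values are replaced by BAD."""
    corrupted = [BAD] * k + CLEAN[k:]
    return median(corrupted)


for k in (1, 3, 5, 6):
    print(k, median_after_corruption(k))
# k = 1, 3, 5 (fewer than half corrupted): median is 16, 18, 20, still inside the clean range.
# k = 6 (more than half corrupted): median equals BAD.
```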

Resistance to Extreme Values

Changing an extreme value has no effect on the median unless the value moves across the median position in the ordered data.

Comparison to Mean

Mean breakdown point is 0%; median superior in presence of anomalies.

Use in Robust Statistics

Median foundational estimator in robust location estimation methods.

Computation Algorithms

Sorting-Based Method

Sort data: O(n log n) time complexity. Extract middle value(s).

Selection Algorithms

Median of medians algorithm finds the median in worst-case linear time O(n) without a full sort.

Streaming Data

Use data structures like two heaps (max-heap, min-heap) to maintain median dynamically.
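A minimal sketch of the two-heap idea using Python's `heapq` module, which provides only a min-heap; the max-heap is simulated by storing negated values. The class name `RunningMedian` is illustrative.

```python
import heapq


class RunningMedian:
    """Maintain the median of a stream with a max-heap (lower half) and a min-heap (upper half)."""

    def __init__(self):
        self._low = []    # max-heap of the smaller half, stored as negated values
        self._high = []   # min-heap of the larger half

    def add(self, x):
        # Route x through the lower half so every element in low is <= every element in high.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so that low holds as many elements as high, or one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no data")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


rm = RunningMedian()
for value in [7, 3, 9, 5]:
    rm.add(value)
    print(rm.median())   # 7, 5.0, 7, 6.0
```

Each insertion costs O(log n) and each median query O(1), at the price of keeping all observations in memory.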

Software Implementations

Built-in or widely used library functions are available: median in R, statistics.median (standard library) and numpy.median in Python, median in MATLAB.
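As a brief illustration, Python's standard `statistics` module exposes the midpoint convention alongside low and high variants for even n. The data reuse the even-sized example from earlier in this document.

```python
import statistics

data = [7, 3, 9, 5]

print(statistics.median(data))        # 6.0 (mean of the two middle values)
print(statistics.median_low(data))    # 5   (lower of the two middle values)
print(statistics.median_high(data))   # 7   (higher of the two middle values)
```

The low and high variants always return an actual data point, which can matter for ordinal or integer-coded data.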

# Median of medians selection (Python): returns the k-th smallest element, k counted from 0.
# The median of arr is select(arr, len(arr) // 2) for odd len(arr).
def select(arr, k):
    if len(arr) <= 5:
        return sorted(arr)[k]
    # Median of each group of 5 (the last group may be shorter).
    groups = [sorted(arr[i:i + 5]) for i in range(0, len(arr), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Recursively pick the median of the group medians as pivot.
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in arr if x < pivot]
    highs = [x for x in arr if x > pivot]
    pivots = [x for x in arr if x == pivot]
    if k < len(lows):
        return select(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return select(highs, k - len(lows) - len(pivots))

Statistical Inference with Median

Confidence Intervals

Nonparametric methods: order statistics based intervals; bootstrap methods for median CI estimation.
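A sketch of the order-statistic interval: for a continuous distribution, the number of observations below the true median is Binomial(n, 1/2), so the coverage of an interval between two order statistics follows from binomial tail sums. The function name `median_ci` and the symmetric choice of ranks are assumptions of this illustration.

```python
from math import comb


def median_ci(data, alpha=0.05):
    """Distribution-free CI for the median from symmetric order statistics.

    Returns (lower, upper, exact_coverage); coverage is at least 1 - alpha.
    """
    xs = sorted(data)
    n = len(xs)
    tail = 0.0   # P(B <= j - 1) for B ~ Binomial(n, 1/2)
    j = 0        # rank (1-based) of the lower endpoint, grown while the tail allows
    while j < n // 2:
        p = comb(n, j) / 2**n
        if tail + p > alpha / 2:
            break
        tail += p
        j += 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    # Interval from the j-th smallest to the j-th largest observation.
    return xs[j - 1], xs[n - j], 1 - 2 * tail


low, high, coverage = median_ci(list(range(1, 11)))
print(low, high, round(coverage, 4))   # 2 9 0.9785
```

Because the binomial distribution is discrete, the achieved coverage usually exceeds the nominal level rather than matching it exactly.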

Hypothesis Testing

Sign test assesses hypotheses about the median; Wilcoxon signed-rank test does so under the additional assumption of a symmetric distribution.
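The sign test reduces to a binomial calculation on the number of observations above and below the hypothesized median. A minimal exact two-sided version is sketched below; the function name `sign_test_p_value` and the sample values are illustrative.

```python
from math import comb


def sign_test_p_value(data, m0):
    """Exact two-sided sign test of H0: median == m0. Observations equal to m0 are dropped."""
    above = sum(1 for x in data if x > m0)
    below = sum(1 for x in data if x < m0)
    n = above + below
    if n == 0:
        raise ValueError("all observations equal m0")
    k = min(above, below)
    # Under H0 the number of positive signs is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


sample = [12, 15, 11, 18, 14, 16, 13, 17, 19, 20]   # made-up values
print(sign_test_p_value(sample, 10))     # 0.001953125 (all 10 values lie above 10)
print(sign_test_p_value(sample, 15.5))   # 1.0 (5 above, 5 below)
```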

Median Regression

Quantile regression at quantile level 0.5 (least absolute deviations) estimates conditional medians; robust to outliers in the response and does not require homoscedastic errors.
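Median regression rests on the fact that the median minimizes the sum of absolute deviations. The sketch below shows only the intercept-only special case by brute force over the data points; it is not a quantile-regression implementation, and the helper name `absolute_loss` and the data are illustrative.

```python
from statistics import median


def absolute_loss(data, c):
    """Sum of absolute deviations of the data from a candidate location c."""
    return sum(abs(x - c) for x in data)


data = [1, 2, 3, 4, 100]

# Search the data points for the location with the smallest absolute loss.
best = min(data, key=lambda c: absolute_loss(data, c))
print(best, median(data))   # 3 3
```

The outlier 100 moves the least-squares solution (the mean, 22) far from the bulk of the data, while the absolute-loss solution stays at 3.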

Asymptotic Properties

Sample median is consistent and asymptotically normal, with variance inversely proportional to the squared density at the median.
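The standard form of this result, stated for a distribution with density f that is continuous and positive at the median m, is given below together with the normal special case. The notation (sample median \hat{m}_n, sample mean \bar{X}_n, standard deviation \sigma) is introduced here for illustration.

```latex
\sqrt{n}\,\bigl(\hat{m}_n - m\bigr) \;\xrightarrow{d}\;
\mathcal{N}\!\left(0,\ \frac{1}{4 f(m)^2}\right)

% Normal case: f(m) = 1 / (\sigma \sqrt{2\pi}), so
\operatorname{Var}(\hat{m}_n) \approx \frac{\pi \sigma^2}{2n},
\qquad
\frac{\operatorname{Var}(\bar{X}_n)}{\operatorname{Var}(\hat{m}_n)}
\;\to\; \frac{2}{\pi} \approx 0.637
```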
