Definition of Quartiles

Basic Concept

Quartiles are three cut points that divide ordered data into four parts of equal size. Roughly 25%, 50%, and 75% of data points lie below Q1, Q2, and Q3, respectively.

Quartile Values

Q1 (first quartile): 25th percentile. Q2 (second quartile): median, 50th percentile. Q3 (third quartile): 75th percentile.

Purpose

Summarize distribution shape, central tendency, and spread without assumptions of data normality.

Importance and Applications

Data Summarization

Compactly describe data distribution in a few statistics; better than mean and standard deviation for skewed data.

Spread Measurement

Identify variability and dispersion through interquartile range (IQR = Q3 - Q1).

Outlier Identification

Flag extreme values using quartile boundaries and fences.

Data Segmentation

Split data into four segments for comparative analysis or grouping.

Robustness

Less sensitive to extreme values than mean and standard deviation.

Types of Quartiles

First Quartile (Q1)

Value below which 25% of data lie. Often called lower quartile.

Second Quartile (Q2)

Median; 50% of data below this point.

Third Quartile (Q3)

Value below which 75% of data lie; upper quartile.

Additional Quartile-Related Measures

Quartile deviation: half the IQR; used as a variability index.

Calculation Methods

Sorting Data

Step 1: Arrange data in ascending order.

Position Formula

Position of Qk = (k/4)(n+1), where k=1,2,3 and n=sample size.

Interpolation

If position is fractional, interpolate between adjacent data points.

Method Variants

Multiple methods exist: inclusive, exclusive, weighted averages; results may vary slightly.

Grouped Data

Use cumulative frequency to estimate quartiles for grouped data.

Position of Qk = (k/4) * (n + 1)
If fractional, Qk = x_lower + fraction * (x_upper - x_lower)
Where:
- x_lower = data at floor position
- x_upper = data at ceil position
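
The position formula and interpolation rule can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name quartile_by_position and the sample values are assumptions made for the example, and positions are clamped to the data range so that very small samples do not index outside the list.

```python
import math

def quartile_by_position(data, k):
    """Return Qk (k = 1, 2, 3) using position = (k/4)(n + 1) with linear interpolation."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    values = sorted(data)                 # Step 1: ascending order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    position = k * (n + 1) / 4            # 1-based position in the sorted data
    position = min(max(position, 1), n)   # clamp to the valid range (assumption)
    lower = math.floor(position)
    fraction = position - lower
    x_lower = values[lower - 1]           # 1-based position -> 0-based index
    x_upper = values[min(lower, n - 1)]   # next value, or the last one at the edge
    return x_lower + fraction * (x_upper - x_lower)

# Illustrative data only (not taken from the text).
sample = [2, 4, 4, 5, 7, 9, 10, 12]
q1, q2, q3 = (quartile_by_position(sample, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 4.0 6.0 9.75
```

With n = 8 the positions are 2.25, 4.5, and 6.75, so each quartile falls between two adjacent sorted values and is interpolated.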

Worked Examples

Example 1: Small Data Set

Data: 3, 7, 8, 5, 12, 14, 21, 13, 18

Sorted: 3, 5, 7, 8, 12, 13, 14, 18, 21

n=9; Q1 pos=(1/4)(10)=2.5 → interpolate between 2nd and 3rd values (5,7): Q1=5 + 0.5*(7-5)=6

Q2 pos= (2/4)(10)=5 → 5th value = 12 (median)

Q3 pos= (3/4)(10)=7.5 → between 7th and 8th values (14,18): Q3=14 + 0.5*(18-14)=16
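
As a cross-check, the hand calculation in Example 1 can be reproduced with Python's standard library. The sketch assumes Python 3.8 or later, where statistics.quantiles is available; its "exclusive" method places Qk at position k(n + 1)/4, which is the same rule used above.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# method="exclusive" uses positions k(n + 1)/4, matching the hand calculation.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 6.0 12.0 16.0
```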

Example 2: Grouped Data

Class intervals and frequencies:

Class Interval    Frequency
0-10              5
10-20             8
20-30             12
30-40             5

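Total n = 30; cumulative frequencies: 5, 13, 25, 30. For grouped data the positions are k × n/4: Q1 position = 7.5 (class 10-20), Q2 = 15 (class 20-30), Q3 = 22.5 (class 20-30). Interpolating linearly within each class: Q1 = 10 + (7.5 - 5)/8 × 10 = 13.125; Q2 = 20 + (15 - 13)/12 × 10 ≈ 21.67; Q3 = 20 + (22.5 - 13)/12 × 10 ≈ 27.92.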
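
The grouped-data estimate can be sketched in Python. The sketch assumes the common textbook rule Qk = L + ((k × n/4 - CF) / f) × h, where L is the lower bound of the class containing the quartile, CF the cumulative frequency before that class, f its frequency, and h its width. The function name grouped_quartile and the tuple layout are illustrative choices, not part of the original example.

```python
def grouped_quartile(classes, k):
    """Estimate Qk from (lower_bound, upper_bound, frequency) classes."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    total = sum(freq for _, _, freq in classes)
    target = k * total / 4            # grouped-data position k * n / 4
    cumulative = 0                    # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            width = upper - lower
            # Linear interpolation inside the class that contains the target.
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("classes must contain at least one observation")

# Frequencies from Example 2.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
for k in (1, 2, 3):
    print(f"Q{k} = {grouped_quartile(classes, k):.3f}")
```

The interpolation treats observations as evenly spread within each class, which is an approximation whenever the raw values are not available.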

Properties of Quartiles

Order and Range

Q1 ≤ Q2 ≤ Q3; all lie within minimum and maximum data values.

Robustness

Quartiles resist influence by extreme outliers.
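
A small sketch makes the robustness claim concrete by replacing one value with an extreme outlier and comparing summaries. The data values and the helper name summary are made up for illustration, and quartiles use the standard library's "exclusive" method.

```python
import statistics

def summary(data):
    """Return (mean, standard deviation, median, IQR) for a sample."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return statistics.mean(data), statistics.stdev(data), q2, q3 - q1

clean = [10, 12, 13, 15, 16, 18, 20]
contaminated = [10, 12, 13, 15, 16, 18, 200]   # largest value replaced by an outlier

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    mean, sd, median, iqr = summary(sample)
    print(f"{label}: mean={mean:.2f} sd={sd:.2f} median={median} IQR={iqr}")
```

In this example the median and IQR are identical for both samples, while the mean and standard deviation change substantially.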

Non-Parametric

Do not require assumptions about data distribution shape.

Dependent on Sample Size

Precision improves with larger datasets.

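Non-Unique Values

Different calculation methods may yield slightly different quartiles for the same data, most noticeably in small samples; for even n the median itself is conventionally the average of the two middle values.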
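
The effect of the method choice can be seen by running two common conventions on the same even-sized sample. The sketch uses the "exclusive" and "inclusive" methods of statistics.quantiles (Python 3.8 or later); the data values are illustrative only.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]

exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [2.25, 4.5, 6.75]
print(inclusive)  # [2.75, 4.5, 6.25]
```

Both methods agree on the median here but disagree on Q1 and Q3, which also changes the IQR (4.5 versus 3.5).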

Relation to Percentiles

Quartiles as Percentiles

Q1 = 25th percentile, Q2 = 50th percentile, Q3 = 75th percentile.

Percentile Definition

Value below which a given percentage of data fall.

Generalization

Quartiles are specific percentiles dividing data into four equal parts.

Computation Similarity

Calculation methods for quartiles and percentiles overlap.
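
As a quick illustration of this overlap, the same routine can produce both quartiles and percentiles by changing only the number of cut points. The sketch assumes Python's statistics.quantiles with the "exclusive" method; the data are reused from Example 1.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

quartiles = statistics.quantiles(data, n=4, method="exclusive")      # 3 cut points
percentiles = statistics.quantiles(data, n=100, method="exclusive")  # 99 cut points

# The p-th percentile is stored at index p - 1.
matching = [percentiles[24], percentiles[49], percentiles[74]]
print(quartiles)  # [6.0, 12.0, 16.0]
print(matching)   # [6.0, 12.0, 16.0]
```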

Usage Difference

Percentiles provide finer granularity; quartiles summarize with fewer points.

Interquartile Range (IQR)

Definition

IQR = Q3 - Q1; measures middle 50% spread of data.

Significance

Robust measure of variability; less sensitive to outliers than range or standard deviation.

Calculation

IQR = Q3 - Q1

Use in Box Plots

IQR is the length of the box (its height in a vertical plot), representing the spread of the middle 50% of the data.

Application

Used in outlier detection and descriptive statistics reporting.

Outlier Detection Using Quartiles

Outlier Definition

Data points significantly distant from central distribution.

Fences Calculation

Lower fence = Q1 - 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR

Outlier Identification

Values outside the fences are considered mild outliers; values below Q1 - 3 × IQR or above Q3 + 3 × IQR are considered extreme outliers.

Example

If Q1=10, Q3=20, IQR=10, lower fence= -5, upper fence=35; values < -5 or > 35 flagged.
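
The fence rule can be sketched in Python using the numbers from this example. The function names tukey_fences and classify are hypothetical, and the 3 × IQR outer fences follow the mild/extreme convention described above.

```python
def tukey_fences(q1, q3, multiplier=1.5):
    """Return (lower fence, upper fence) for a given IQR multiplier."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def classify(value, q1, q3):
    """Label a value as 'inside', 'mild', or 'extreme' relative to the fences."""
    inner_low, inner_high = tukey_fences(q1, q3, 1.5)
    outer_low, outer_high = tukey_fences(q1, q3, 3.0)
    if value < outer_low or value > outer_high:
        return "extreme"
    if value < inner_low or value > inner_high:
        return "mild"
    return "inside"

print(tukey_fences(10, 20))           # (-5.0, 35.0)
for value in (12, 35, 36, -6, 51):
    print(value, classify(value, 10, 20))
```

Values exactly on a fence (such as 35 here) are treated as inside; some texts use the opposite convention.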

Limitations

Works best for roughly symmetric distributions; in skewed data, legitimate values in the long tail may fall outside the fences and be flagged.

Box Plots and Quartiles

Definition

Graphical representation of data distribution using quartiles and median.

Components

Box: Q1 to Q3; line inside box: median; whiskers: min and max within fences; points: outliers.

Interpretation

Shows central tendency, spread, skewness, and outliers visually.

Construction Steps

  1. Calculate Q1, Q2, Q3.
  2. Determine fences and whiskers.
  3. Plot box and whiskers; mark outliers.
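
The numerical part of these steps (everything except the drawing) can be sketched as below. The function name box_plot_stats, the dictionary keys, and the data are illustrative assumptions; quartiles use the "exclusive" method of statistics.quantiles, and other methods would shift the box slightly.

```python
import statistics

def box_plot_stats(data):
    """Compute the quantities needed to draw a box plot."""
    values = sorted(data)
    # Step 1: quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    # Step 2: fences, then whiskers at the most extreme values inside the fences.
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Step 3: anything outside the fences is plotted as an individual point.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

stats = box_plot_stats([3, 5, 7, 8, 12, 13, 14, 18, 21, 60])
print(stats)
```

For this sample the box runs from 6.5 to 18.75, the whiskers end at 3 and 21, and 60 is marked as an outlier.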

Usage

Compare multiple datasets; detect asymmetry and variability.

Limitations of Quartiles

Loss of Detail

Only three values summarize data; ignores distribution nuances.

Ambiguity in Calculation

Multiple methods yield different quartile values; affects interpretation.

Sensitivity to Sample Size

Small samples yield coarse quartile estimates.

Not Suitable for Multimodal Data

Cannot capture multiple peaks or modes effectively.

Limited in Inferential Statistics

Not directly used for hypothesis testing or parameter estimation.

Quartiles in Statistical Software

R

Function: quantile(x, probs = c(0.25, 0.5, 0.75)); the type argument selects the algorithm, and type = 7 is the default.

Python (NumPy, Pandas)

NumPy: numpy.percentile(array, [25, 50, 75]). Pandas: DataFrame.quantile([0.25, 0.5, 0.75]).

SPSS

Frequencies procedure with the quartiles option; Explore procedure.

Excel

Functions: QUARTILE.INC(range, quart), QUARTILE.EXC(range, quart).

Interpretation Differences

Software uses varying algorithms; results may differ slightly; specify method for consistency.
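
A short sketch shows how a software default can differ from a hand calculation. It assumes NumPy is installed and relies on the default linear interpolation of numpy.percentile, which uses positions based on n - 1 rather than n + 1; the data are the nine values from Example 1.

```python
import numpy as np

data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])

# Default (linear) method: position = (n - 1) * p, counted from zero.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)  # 7.0 12.0 14.0
```

The (n + 1) position formula gives Q1 = 6 and Q3 = 16 for the same data, so reports should state which method or software default was used.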
