Definition of Quartiles
Basic Concept
Quartiles divide ordered data into four equal parts. The three quartiles mark the boundaries below which about 25%, 50%, and 75% of the data points lie.
Quartile Values
Q1 (first quartile): 25th percentile. Q2 (second quartile): median, 50th percentile. Q3 (third quartile): 75th percentile.
Purpose
Summarize distribution shape, central tendency, and spread without assumptions of data normality.
Importance and Applications
Data Summarization
Compactly describe a data distribution in a few statistics; often more informative than mean and standard deviation for skewed data.
Spread Measurement
Identify variability and dispersion through interquartile range (IQR = Q3 - Q1).
Outlier Identification
Flag extreme values using quartile boundaries and fences.
Data Segmentation
Split data into four segments for comparative analysis or grouping.
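A minimal sketch of the segmentation idea, assuming Python's standard `statistics.quantiles` (default "exclusive" method) supplies the cut points. The sample values and the helper name `quartile_group` are hypothetical. Values equal to a cut point go to the lower segment here, which is a choice of this sketch rather than a universal rule.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_group(value, cuts):
    """Return 1-4: the quartile segment that `value` falls into.

    bisect_left counts how many cut points are strictly below `value`,
    so a value equal to a cut point stays in the lower segment.
    """
    return bisect_left(cuts, value) + 1

# Hypothetical sample, used only for illustration.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
cuts = quantiles(data, n=4)          # [Q1, Q2, Q3]

groups = {g: [] for g in (1, 2, 3, 4)}
for x in data:
    groups[quartile_group(x, cuts)].append(x)

for g, members in groups.items():
    print(f"segment {g}: {members}")
```

With small samples the four segments are rarely the same size, because ties and interpolated cut points shift the boundaries.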
Robustness
Less sensitive to extreme values than mean and standard deviation.
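The robustness claim can be illustrated with a small experiment. The sketch below uses hypothetical data and the standard `statistics` module: it replaces the largest value with an extreme one and compares the summaries before and after. The helper name `summary` is an assumption of this example.

```python
from statistics import mean, median, quantiles, stdev

def summary(values):
    """Return mean, standard deviation, median, and IQR of `values`."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "median": median(values),
        "iqr": q3 - q1,
    }

# Hypothetical data: the second list replaces the largest value with an extreme one.
clean = [3, 5, 7, 8, 12, 13, 14, 18, 21]
contaminated = [3, 5, 7, 8, 12, 13, 14, 18, 210]

before = summary(clean)
after = summary(contaminated)
for key in before:
    print(f"{key:>6}: {before[key]:8.2f} -> {after[key]:8.2f}")
```

In this sample the median and IQR are unchanged, while the mean and standard deviation shift substantially.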
Types of Quartiles
First Quartile (Q1)
Value below which 25% of data lie. Often called lower quartile.
Second Quartile (Q2)
Median; 50% of data below this point.
Third Quartile (Q3)
Value below which 75% of data lie; upper quartile.
Additional Quartile-Related Measures
Quartile deviation: half the IQR; used as a variability index.
Calculation Methods
Sorting Data
Step 1: Arrange data in ascending order.
Position Formula
Position of Qk = (k/4)(n+1), where k=1,2,3 and n=sample size.
Interpolation
If position is fractional, interpolate between adjacent data points.
Method Variants
Multiple methods exist: inclusive, exclusive, weighted averages; results may vary slightly.
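As an illustration of how the variants can disagree, the sketch below runs Python's standard `statistics.quantiles` with its two built-in methods on a hypothetical nine-value sample. Other packages offer further variants, so this shows the effect rather than surveying every method.

```python
from statistics import quantiles

# Hypothetical nine-value sample.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]

# 'exclusive' places Qk at position k(n+1)/4 in the sorted data.
exclusive = quantiles(data, n=4, method="exclusive")

# 'inclusive' treats the minimum and maximum as the 0th and 100th
# percentiles, placing Qk at position 1 + k(n-1)/4.
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)   # [6.0, 12.0, 16.0]
print("inclusive:", inclusive)   # [7.0, 12.0, 14.0]
```

The median agrees under both methods here, but Q1 and Q3 differ, so the IQR differs too (10 versus 7).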
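Grouped Data
Use cumulative frequencies to locate the class containing position kn/4, then interpolate within it: Qk = L + ((kn/4 - CF) / f) * h, where L = lower class boundary, CF = cumulative frequency before the class, f = class frequency, h = class width.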
Position of Qk = (k/4) * (n + 1)
If fractional, Qk = x_lower + fraction * (x_upper - x_lower)
Where:
- x_lower = data at floor position
- x_upper = data at ceil position
Worked Examples
Example 1: Small Data Set
Data: 3, 7, 8, 5, 12, 14, 21, 13, 18
Sorted: 3, 5, 7, 8, 12, 13, 14, 18, 21
n=9; Q1 pos=(1/4)(10)=2.5 → interpolate between 2nd and 3rd values (5,7): Q1=5 + 0.5*(7-5)=6
Q2 pos= (2/4)(10)=5 → 5th value = 12 (median)
Q3 pos= (3/4)(10)=7.5 → between 7th and 8th values (14,18): Q3=14 + 0.5*(18-14)=16
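The hand calculation in Example 1 can be checked with a short function. This is a minimal sketch of the (k/4)(n+1) position rule with linear interpolation. The function name `quartile` and the clamping of out-of-range positions for very small samples are assumptions of this example, not part of the method as stated.

```python
import math

def quartile(data, k):
    """Quartile Qk (k = 1, 2, 3) using the (k/4)(n+1) position rule."""
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1) / 4            # 1-based position in the sorted data
    pos = min(max(pos, 1), n)        # assumption: clamp to the data range for tiny samples
    lower = math.floor(pos)
    fraction = pos - lower
    x_lower = xs[lower - 1]          # convert 1-based position to 0-based index
    if fraction == 0:
        return float(x_lower)
    x_upper = xs[lower]              # next value up (ceil position)
    return x_lower + fraction * (x_upper - x_lower)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print([quartile(data, k) for k in (1, 2, 3)])   # [6.0, 12.0, 16.0]
```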
Example 2: Grouped Data
Class intervals and frequencies:
| Class Interval | Frequency |
|---|---|
| 0-10 | 5 |
| 10-20 | 8 |
| 20-30 | 12 |
| 30-40 | 5 |
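Total n=30; cumulative frequencies: 5, 13, 25, 30. Positions (kn/4): Q1=7.5, Q2=15, Q3=22.5.
Q1 lies in class 10-20: Q1 = 10 + ((7.5-5)/8)*10 = 13.125
Q2 lies in class 20-30: Q2 = 20 + ((15-13)/12)*10 ≈ 21.67
Q3 lies in class 20-30: Q3 = 20 + ((22.5-13)/12)*10 ≈ 27.92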
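The grouped-data estimate can be sketched in code. The example assumes the usual linear-interpolation formula Qk = L + ((kn/4 - CF) / f) * h, where L is the lower bound of the quartile class, CF the cumulative frequency before it, f its frequency, and h its width. The function name `grouped_quartile` and the tuple layout are hypothetical.

```python
def grouped_quartile(classes, k):
    """Estimate Qk from (lower, upper, frequency) classes by linear interpolation.

    Uses Qk = L + ((k*n/4 - CF) / f) * h, where L is the lower bound of the
    quartile class, CF the cumulative frequency before it, f its frequency,
    and h its width.
    """
    n = sum(freq for _, _, freq in classes)
    target = k * n / 4
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("k must be 1, 2, or 3 and classes must be non-empty")

# Class intervals and frequencies from Example 2.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
for k in (1, 2, 3):
    print(f"Q{k} = {grouped_quartile(classes, k):.3f}")
```

Because the raw values inside each class are unknown, these results are estimates that assume data are spread evenly within a class.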
Properties of Quartiles
Order and Range
Q1 ≤ Q2 ≤ Q3; all lie within minimum and maximum data values.
Robustness
Quartiles resist influence by extreme outliers.
Non-Parametric
Do not require assumptions about data distribution shape.
Dependent on Sample Size
Precision improves with larger datasets.
Non-Unique for Even Data
When a quartile position falls between two data points (common when the data count is even), different calculation methods may yield slightly different quartiles.
Relation to Percentiles
Quartiles as Percentiles
Q1 = 25th percentile, Q2 = 50th percentile, Q3 = 75th percentile.
Percentile Definition
Value below which a given percentage of data fall.
Generalization
Quartiles are specific percentiles dividing data into four equal parts.
Computation Similarity
Calculation methods for quartiles and percentiles overlap.
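To make the overlap concrete, the sketch below computes quartiles and percentiles with the same routine, Python's standard `statistics.quantiles`, changing only the number of divisions. The sample is hypothetical. The equality holds because both calls use the same interpolation method.

```python
from statistics import quantiles

# Hypothetical sample.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]

quartile_cuts = quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
percentile_cuts = quantiles(data, n=100)  # 99 cut points: P1 ... P99

# The k-th percentile sits at index k - 1 of the 99 cut points.
p25, p50, p75 = (percentile_cuts[k - 1] for k in (25, 50, 75))

print(quartile_cuts)
print([p25, p50, p75])
```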
Usage Difference
Percentiles provide finer granularity; quartiles summarize with fewer points.
Interquartile Range (IQR)
Definition
IQR = Q3 - Q1; measures middle 50% spread of data.
Significance
Robust measure of variability; less sensitive to outliers than range or standard deviation.
Calculation
IQR = Q3 - Q1
Use in Box Plots
The IQR is the length of the box, which shows the spread of the middle 50% of the data.
Application
Used in outlier detection and descriptive statistics reporting.
Outlier Detection Using Quartiles
Outlier Definition
Data points significantly distant from central distribution.
Fences Calculation
Lower fence = Q1 - 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR
Outlier Identification
Values outside the fences are considered mild outliers; values more than 3 × IQR below Q1 or above Q3 are considered extreme outliers.
Example
If Q1=10, Q3=20, IQR=10, lower fence= -5, upper fence=35; values < -5 or > 35 flagged.
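The fence rule can be sketched as a small classifier. The example assumes the extreme-outlier limits are Q1 - 3 × IQR and Q3 + 3 × IQR, and that values exactly on a fence count as inside. The function name `classify` and the test values are hypothetical.

```python
def classify(value, q1, q3):
    """Label `value` as 'extreme', 'mild', or 'inside' using IQR fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # mild-outlier fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme-outlier fences
    if value < outer_low or value > outer_high:
        return "extreme"
    if value < inner_low or value > inner_high:
        return "mild"
    return "inside"

# Quartiles from the example above: Q1 = 10, Q3 = 20.
for v in (-25, -6, -5, 15, 35, 36, 51):
    print(v, classify(v, 10, 20))
```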
Limitations
The 1.5 × IQR rule works best for roughly symmetric distributions; with skewed data the fences can flag many legitimate values in the long tail.
Box Plots and Quartiles
Definition
Graphical representation of data distribution using quartiles and median.
Components
Box: Q1 to Q3; line inside box: median; whiskers: extend to the smallest and largest values within the fences; points: outliers.
Interpretation
Shows central tendency, spread, skewness, and outliers visually.
Construction Steps
- Calculate Q1, Q2, Q3.
- Determine fences and whiskers.
- Plot box and whiskers; mark outliers.
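The construction steps above can be sketched without a plotting library by computing the numbers a box plot would draw. The example assumes `statistics.quantiles` with its default method for the quartiles and 1.5 × IQR fences. The function name `box_plot_stats` and the sample are hypothetical.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Compute the numbers a box plot draws (no plotting is done here)."""
    xs = sorted(values)
    q1, q2, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": inside[0],      # smallest value within the fences
        "whisker_high": inside[-1],    # largest value within the fences
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Hypothetical sample with one extreme value.
stats = box_plot_stats([3, 5, 7, 8, 12, 13, 14, 18, 21, 60])
print(stats)
```

The upper whisker stops at 21, the largest value inside the fences, and 60 is reported separately as an outlier.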
Usage
Compare multiple datasets; detect asymmetry and variability.
Limitations of Quartiles
Loss of Detail
Only three values summarize data; ignores distribution nuances.
Ambiguity in Calculation
Multiple methods yield different quartile values; affects interpretation.
Sensitivity to Sample Size
Small samples yield coarse quartile estimates.
Not Suitable for Multimodal Data
Cannot capture multiple peaks or modes effectively.
Limited in Inferential Statistics
Not directly used for hypothesis testing or parameter estimation.
Quartiles in Statistical Software
R
Function: quantile(x, probs = c(0.25, 0.5, 0.75)); uses type = 7 by default, with nine types available.
Python (NumPy, Pandas)
NumPy: numpy.percentile(array, [25, 50, 75]). Pandas: DataFrame.quantile([0.25, 0.5, 0.75]).
SPSS
Frequencies procedure with the quartiles option; Explore procedure.
Excel
Functions: QUARTILE.INC(range, quart), QUARTILE.EXC(range, quart).
Interpretation Differences
Software uses varying algorithms; results may differ slightly; specify method for consistency.