Definition
Basic Concept
Interquartile Range (IQR): range between first quartile (Q1) and third quartile (Q3). Measures statistical dispersion of middle 50% of data. Represents spread of central data values, excluding extremes.
Quartiles Breakdown
Quartiles: values dividing dataset into four equal parts. Q1: 25th percentile. Median (Q2): 50th percentile. Q3: 75th percentile. IQR = Q3 − Q1.
Robustness
Resistant to outliers and extreme values. Unlike range, IQR focuses on central distribution. Preferred in skewed or non-normal datasets.
Calculation Methods
Step-by-Step Procedure
1. Sort data in ascending order.
2. Find Q1: median of lower half (below median).
3. Find Q3: median of upper half (above median).
4. Compute IQR = Q3 − Q1.
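The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `iqr_median_of_halves` is hypothetical, and the sketch assumes the median itself is left out of both halves when the number of values is odd.

```python
from statistics import median


def iqr_median_of_halves(data):
    """Return (Q1, Q3, IQR) using the median-of-halves procedure.

    Assumption: for an odd number of values the middle value is
    excluded from both halves.
    """
    values = sorted(data)                 # step 1: ascending order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]                 # values below the median position
    upper = values[n - half:]             # values above the median position
    q1 = median(lower)                    # step 2
    q3 = median(upper)                    # step 3
    return q1, q3, q3 - q1                # step 4


print(iqr_median_of_halves([4, 8, 15, 16, 23, 42]))  # (8, 23, 15)
```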
Median Inclusion Variants
Two approaches for an odd number of data points: include the median in both halves (Tukey), or exclude it from both (Moore and McCabe). The choice slightly changes Q1 and Q3; for an even number of points both give the same result.
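To see how much the two variants can differ, the sketch below computes both on one odd-length data set. The function `quartiles` and its `include_median` flag are hypothetical names introduced for illustration only.

```python
from statistics import median


def quartiles(data, include_median):
    """Return (Q1, Q3); include_median only matters for odd-length data."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1 and include_median:
        # The middle value is shared by both halves.
        lower, upper = values[:half + 1], values[half:]
    else:
        # The middle value (if any) is dropped from both halves.
        lower, upper = values[:half], values[n - half:]
    return median(lower), median(upper)


data = [2, 4, 6, 8, 10, 12, 14]
print(quartiles(data, include_median=True))   # (5.0, 11.0)
print(quartiles(data, include_median=False))  # (4, 12)
```

Here the IQR is 6 with the median included and 8 with it excluded, so the variant in use should be stated whenever quartiles are reported.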
Use of Software
Statistical packages (R, Python, SPSS) provide built-in functions. Results may differ slightly between packages, and from the hand method above, because each uses its own quantile interpolation rule.
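As one illustration of how the interpolation rule changes the result, Python's standard library function `statistics.quantiles` accepts a `method` argument with the values `"exclusive"` and `"inclusive"`. The helper name `iqr` and the sample data are assumptions made for this sketch.

```python
from statistics import quantiles


def iqr(data, method):
    """IQR from statistics.quantiles with the given interpolation method."""
    q1, _median, q3 = quantiles(data, n=4, method=method)
    return q3 - q1


data = [3, 5, 8, 13, 21, 34, 55, 89]

# Same data, two interpolation rules, two different IQR values.
for method in ("exclusive", "inclusive"):
    print(method, iqr(data, method))
```

Neither rule is the median-of-halves procedure, so values computed by software may not match hand calculations exactly.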
| Data Set | Sorted | Q1 | Q3 | IQR |
|---|---|---|---|---|
| 7, 15, 36, 39, 40, 41, 42, 43, 47, 49 | 7, 15, 36, 39, 40, 41, 42, 43, 47, 49 | 36 | 43 | 7 |
Properties
Range of Values
IQR value ≥ 0. Zero indicates no spread between Q1 and Q3. Larger values indicate greater dispersion.
Shift Invariance and Scale Equivariance
Adding a constant to every value leaves IQR unchanged. Multiplying data by constant k scales IQR by |k|, so IQR is expressed in the same units as the data.
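The behavior under linear transformations can be checked numerically. The sketch below assumes quartiles from `statistics.quantiles` with `method="inclusive"`; the helper `iqr` and the data are illustrative only.

```python
from statistics import quantiles


def iqr(data):
    """IQR using the standard library's inclusive quantile method."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


data = [2, 4, 4, 5, 7, 9, 12, 20]

base = iqr(data)
shifted = iqr([x + 100 for x in data])   # add a constant to every value
scaled = iqr([-3 * x for x in data])     # multiply every value by k = -3

print(base, shifted, scaled)             # 5.75 5.75 17.25
```

Shifting leaves the IQR at 5.75, while multiplying by −3 gives 17.25 = |−3| × 5.75.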
Robustness to Outliers
Focus on middle 50% excludes influence of extreme values. Makes IQR suitable for skewed distributions.
Interpretation
Measure of Spread
IQR quantifies variability of central data. Reflects typical range where middle half of observations lie.
Skewness Indicator
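Relative positions of Q1 and Q3 to median indicate skewness. Q3 − median > median − Q1 suggests right (positive) skew; Q3 − median < median − Q1 suggests left (negative) skew; roughly equal distances suggest symmetry in the middle half.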
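A rough sketch of this comparison follows. The function names are hypothetical, quartiles come from `statistics.quantiles` with `method="inclusive"` (an assumption), and the result is only an informal indication, not a formal test.

```python
from statistics import quantiles


def quartile_arms(data):
    """Return (median - Q1, Q3 - median)."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return med - q1, q3 - med


def skew_direction(data):
    """Label the skew suggested by the two quartile distances."""
    lower_arm, upper_arm = quartile_arms(data)
    if upper_arm > lower_arm:
        return "right"
    if upper_arm < lower_arm:
        return "left"
    return "symmetric"


data = [1, 2, 2, 3, 3, 4, 6, 9, 15]
print(quartile_arms(data))    # (1.0, 3.0)
print(skew_direction(data))   # right
```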
Data Consistency
Smaller IQR implies more consistent data. Larger IQR suggests more variability or heterogeneity.
Applications
Descriptive Statistics
Summarizes spread alongside median. Common in reports, research papers, and exploratory data analysis.
Outlier Detection
Used to define fences for identifying outliers: lower fence = Q1 − 1.5×IQR, upper fence = Q3 + 1.5×IQR.
Boxplot Construction
Determines length of box representing middle 50% of data. Visual tool for data distribution and spread.
Comparison with Other Measures
Range vs IQR
Range includes extremes, sensitive to outliers. IQR excludes extremes, more robust.
Standard Deviation vs IQR
Standard deviation uses every observation, so it is sensitive to outliers, and it is easiest to interpret for roughly normal data. IQR requires no distribution assumptions.
Variance vs IQR
Variance quantifies average squared deviation, influenced by extremes. IQR focuses on central spread.
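The contrast between these measures shows up clearly when a single extreme value is added. The sketch below is illustrative: `spread_summary` is a hypothetical helper, the data are made up, and quartiles use `statistics.quantiles` with `method="inclusive"`.

```python
from statistics import quantiles, stdev


def spread_summary(data):
    """Return range, sample standard deviation, and IQR for the data."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "range": max(data) - min(data),
        "stdev": stdev(data),
        "iqr": q3 - q1,
    }


clean = [10, 12, 13, 14, 15, 16, 18]
with_outlier = clean + [90]          # one extreme value added

print(spread_summary(clean))
print(spread_summary(with_outlier))
```

With these values the range grows from 8 to 80 and the standard deviation increases roughly tenfold, while the IQR moves only from 3.0 to 3.75.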
Advantages and Limitations
Advantages
Robustness to outliers. Simple to compute. Intuitive interpretation. Useful for non-normal data.
Limitations
Ignores data outside middle 50%. Less efficient than standard deviation for symmetric, normal data. Not suitable for parametric tests.
Mitigation
Combine with other statistics (median, mean, standard deviation) for comprehensive analysis.
Role in Outlier Detection
Fence Methodology
Outliers: observations beyond fences defined by IQR. Lower fence = Q1 − 1.5×IQR, upper fence = Q3 + 1.5×IQR.
Extreme Outliers
Observations below Q1 − 3×IQR or above Q3 + 3×IQR. Emphasizes points far beyond typical spread.
Practical Use
Widely applied in data cleaning, quality control, and exploratory data analysis.
| Statistic | Formula | Purpose |
|---|---|---|
| Lower Fence | Q1 − 1.5 × IQR | Detect mild outliers |
| Upper Fence | Q3 + 1.5 × IQR | Detect mild outliers |
| Lower Extreme Fence | Q1 − 3 × IQR | Detect extreme outliers |
| Upper Extreme Fence | Q3 + 3 × IQR | Detect extreme outliers |
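The fences in the table can be applied as in the following sketch. The function name `classify_outliers` and the sample data are hypothetical, and the quartiles are taken from `statistics.quantiles` with `method="inclusive"`, which is an assumption and can give slightly different fences than the median-of-halves method.

```python
from statistics import quantiles


def classify_outliers(data):
    """Split observations into mild and extreme outliers using IQR fences."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # mild fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme fences

    mild, extreme = [], []
    for x in data:
        if x < outer_low or x > outer_high:
            extreme.append(x)
        elif x < inner_low or x > inner_high:
            mild.append(x)
    return mild, extreme


data = [20, 21, 22, 23, 24, 25, 26, 27, 40, 80]
print(classify_outliers(data))   # ([40], [80])
```

Here Q1 = 22.25, Q3 = 26.75, and IQR = 4.5, so the mild fences are 15.5 and 33.5 and the extreme fences are 8.75 and 40.25.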
Examples
Example 1: Simple Data Set
Data: 1, 2, 3, 4, 5, 6, 7, 8, 9
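Sorted: same as data. Median = 5. Including the median in both halves (Tukey): Q1 = 3, Q3 = 7, IQR = 7 − 3 = 4. Excluding the median: Q1 = 2.5, Q3 = 7.5, IQR = 5.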
Example 2: Skewed Data
Data: 5, 7, 8, 12, 15, 18, 22, 27, 30, 45
Lower half: 5, 7, 8, 12, 15; upper half: 18, 22, 27, 30, 45. Q1 = 8, Q3 = 27, IQR = 19. Large IQR indicates spread in middle data despite skewness.
Example 3: Outlier Detection
Data: 10, 12, 15, 15, 16, 18, 22, 23, 24, 100
Lower half: 10, 12, 15, 15, 16; upper half: 18, 22, 23, 24, 100. Q1 = 15, Q3 = 23, IQR = 8
Lower fence = 15 − 1.5×8 = 3 (no lower outliers)
Upper fence = 23 + 1.5×8 = 35 (100 > 35 ⇒ 100 is outlier)
Visualization Techniques
Boxplot
Box spans Q1 to Q3 (the IQR); median shown inside box; whiskers extend to the most extreme data points within the fences; outliers plotted separately.
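The numbers a boxplot draws can be computed without any plotting library. The sketch below assumes the common convention that whiskers end at the most extreme observations inside the 1.5×IQR fences; `boxplot_stats` is a hypothetical helper, and quartiles use `statistics.quantiles` with `method="inclusive"`.

```python
from statistics import quantiles


def boxplot_stats(data):
    """Return the values needed to draw a boxplot with 1.5*IQR fences."""
    values = sorted(data)
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers stop at the last data points that are still inside the fences.
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


print(boxplot_stats([3, 7, 8, 9, 10, 11, 12, 13, 30]))
```

For this data the box runs from 8.0 to 12.0 with the median at 10.0, the whiskers end at 3 and 13, and 30 is drawn as a separate outlier.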
Quantile Plot
Plots sorted values against their cumulative fractions; quartiles are read off at 0.25, 0.50, and 0.75 to assess spread and skewness.
Histogram with Quartiles
Overlay quartile lines on histogram to show distribution concentration.
Formulas and Algorithms
IQR Formula
IQR = Q3 − Q1
Where:
Q1 = 25th percentile
Q3 = 75th percentile
Outlier Detection Formula
Lower Fence = Q1 − 1.5 × IQR
Upper Fence = Q3 + 1.5 × IQR
Data point x is an outlier if:
x < Lower Fence or x > Upper Fence
Algorithm to Calculate IQR
Input: Data set D with n values
Step 1: Sort D ascending
Step 2: Compute median (Q2)
Step 3: Split D into lower half (below median) and upper half (above median)
Step 4: Compute Q1 as median of lower half
Step 5: Compute Q3 as median of upper half
Step 6: Calculate IQR = Q3 − Q1
Output: IQR value
References
- Wilcox, R.R., "Introduction to Robust Estimation and Hypothesis Testing", Academic Press, 2012, pp. 45-67.
- McGill, R., Tukey, J.W., Larsen, W.A., "Variations of Boxplots", The American Statistician, Vol. 32, 1978, pp. 12-16.
- Hyndman, R.J., Fan, Y., "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, 1996, pp. 361-365.
- Hoaglin, D.C., Iglewicz, B., Tukey, J.W., "Performance of Some Resistant Rules for Outlier Labeling", Journal of the American Statistical Association, Vol. 81, 1986, pp. 991-999.
- Hyndman, R.J., "Computing and Graphing Highest Density Regions", The American Statistician, Vol. 50, 1996, pp. 120-126.