Introduction
Sample size determination: critical step in study design. Balances precision, confidence, cost, and feasibility. Ensures interval estimates reflect true population parameters within acceptable error margins.
"Sample size calculation is the cornerstone of reliable statistical inference and valid confidence interval estimation." -- Daniel S. Soper
Definition and Purpose
Concept
Sample size determination: process of deciding number of observations required to achieve desired confidence and precision in interval estimates.
Goal
Minimize sampling error, control margin of error, maintain pre-specified confidence level, optimize resource use.
Application
Used in surveys, experiments, clinical trials, quality control, social sciences, wherever population parameters need estimation.
Key Parameters in Sample Size Determination
Confidence Level (1-α)
Long-run proportion of intervals, constructed the same way over repeated samples, that contain the true parameter. Commonly 90%, 95%, 99%. Higher confidence requires larger samples.
Margin of Error (E)
Maximum allowable difference between sample estimate and true parameter. Determines precision.
Population Variance (σ²) or Proportion (p)
Measure of variability in population. Greater variability demands larger sample size.
Population Size (N)
Relevant when population finite and small. Adjusts sample size via finite population correction.
Sample Size Formulas for Confidence Intervals
For Population Mean (σ Known)
Basic formula based on normal distribution quantile, variance, and margin of error.
n = (Z_(α/2) * σ / E)²
For Population Proportion
Formula incorporates estimated proportion to account for binomial variability.
n = (Z_(α/2))² * p * (1 - p) / E²
When σ Unknown (Use Sample Standard Deviation)
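Approximate σ with the sample standard deviation s from a pilot study or prior research, then recompute n as data accumulate. For small samples, use the t critical value in place of Z.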
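The two closed-form formulas in this section can be sketched in Python as follows. This is a minimal illustration, assuming a two-sided interval and the standard library's normal quantile; the function names (z_critical, sample_size_mean, sample_size_proportion) are hypothetical and not taken from any package. Results are rounded up, since a fractional observation cannot be collected.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value Z_(α/2)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def sample_size_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """n = (Z_(α/2) * σ / E)², rounded up to a whole observation."""
    z = z_critical(confidence)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """n = Z_(α/2)² * p * (1 - p) / E², rounded up to a whole observation."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


print(sample_size_mean(sigma=10, margin=2))        # σ = 10, E = 2, 95% confidence
print(sample_size_proportion(p=0.5, margin=0.05))  # p = 0.5, E = 0.05, 95% confidence
```

With these inputs the sketch gives 97 for the mean and 385 for the proportion.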
Effect of Margin of Error
Inverse Square Relationship
Sample size proportional to 1/E². Halving margin of error quadruples sample size.
Practical Impact
Small gains in precision require large sample increases; budget and time constraints critical.
Trade-Offs
Balance between feasible sample size and desired precision essential.
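The inverse-square relationship can be checked numerically. The sketch below assumes the known-σ formula for a mean and leaves n unrounded so the ratio is visible; raw_sample_size is a hypothetical helper name and σ = 10 is an illustrative value.

```python
from statistics import NormalDist


def raw_sample_size(sigma: float, margin: float, confidence: float = 0.95) -> float:
    """Unrounded n = (Z_(α/2) * σ / E)² for a mean with known σ."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / margin) ** 2


base = raw_sample_size(sigma=10, margin=2.0)    # original margin of error
halved = raw_sample_size(sigma=10, margin=1.0)  # margin of error cut in half
ratio = halved / base

print(f"E = 2.0 -> n = {base:.2f}")
print(f"E = 1.0 -> n = {halved:.2f}")
print(f"ratio = {ratio:.2f}")
```

The ratio is 4 regardless of σ or confidence level, because both cancel when the two sample sizes are divided.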
Confidence Level Implications
Critical Value (Z or t)
Higher confidence level increases critical value, enlarges sample size.
Examples
Z_(0.05/2) = 1.96 for 95%, Z_(0.01/2) = 2.576 for 99%.
Effect on Interval Width
At a fixed sample size, higher confidence widens the interval; holding the margin of error fixed therefore requires more data points.
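The critical values quoted above can be reproduced from the standard normal quantile function. This is a small sketch using Python's standard library; z_critical is a hypothetical helper name, and the last lines show how much the required sample grows when moving from 95% to 99% confidence with σ and E held fixed.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value Z_(α/2)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


critical_values = {level: z_critical(level) for level in (0.90, 0.95, 0.99)}
for level, z in critical_values.items():
    print(f"{level:.0%} confidence: Z = {z:.3f}")

# With σ and E fixed, n is proportional to Z², so the ratio of squared
# critical values is the factor by which the sample must grow.
inflation_95_to_99 = (critical_values[0.99] / critical_values[0.95]) ** 2
print(f"n(99%) / n(95%) = {inflation_95_to_99:.2f}")
```

Moving from 95% to 99% confidence raises the required sample size by roughly 73% in this setting.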
Population Variability and Its Impact
Variance Influence
Higher variance increases required sample size to achieve same precision.
Estimating Variance
Use previous studies, pilot data, or conservative estimates to avoid underestimation.
Effect on Proportions
Maximum variability at p=0.5 maximizes sample size; conservative default when p unknown.
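To see why p = 0.5 is the conservative default, the variability term p * (1 - p) can be tabulated over a grid of proportions. The grid below is an illustrative choice, not part of the source.

```python
# Evaluate the binomial variability term p * (1 - p) on a coarse grid.
grid = [i / 10 for i in range(1, 10)]          # 0.1, 0.2, ..., 0.9
variability = {p: p * (1 - p) for p in grid}

# The proportion with the largest variability drives the largest sample size.
worst_case_p = max(variability, key=variability.get)

for p, v in variability.items():
    print(f"p = {p:.1f}: p(1 - p) = {v:.2f}")
print(f"maximum at p = {worst_case_p}")
```

The term peaks at 0.25 when p = 0.5 and falls off symmetrically on either side, so any other true proportion needs a smaller sample for the same E.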
Finite Population Correction
When Applicable
Useful when sample exceeds 5% of population (n/N > 0.05).
Correction Factor
n_adj = (n * N) / (n + N - 1)
Effect
Reduces sample size needed for fixed precision when population is small.
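The correction can be applied as a post-processing step on the uncorrected sample size. The sketch below assumes n comes from one of the infinite-population formulas; finite_population_correction is a hypothetical function name, and the population sizes are illustrative.

```python
import math


def finite_population_correction(n: int, population: int) -> int:
    """n_adj = (n * N) / (n + N - 1), rounded up to a whole observation."""
    return math.ceil(n * population / (n + population - 1))


n_uncorrected = 385  # p = 0.5, E = 0.05, 95% confidence

small_population = finite_population_correction(n_uncorrected, population=2_000)
large_population = finite_population_correction(n_uncorrected, population=1_000_000)

print(f"N = 2,000     -> n_adj = {small_population}")
print(f"N = 1,000,000 -> n_adj = {large_population}")
```

For N = 2,000 the uncorrected sample is about 19% of the population and the correction reduces it to 323; for N = 1,000,000 the correction has no practical effect.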
Sample Size for Proportions
Formula Recap
n = (Z_(α/2))² * p * (1 - p) / E²
Unknown Proportion Strategy
Use p=0.5 for maximum sample size; ensures conservative estimate.
Example Table
| Margin of Error (E) | Sample Size (p=0.5, 95% CI) |
|---|---|
| 0.05 | 385 |
| 0.03 | 1068 |
| 0.01 | 9604 |
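The table entries can be recomputed with a short loop. This sketch assumes p = 0.5, 95% confidence, and rounding up to the next whole observation; under that convention E = 0.03 gives 1068, whereas rounding to the nearest integer would give 1067.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
p = 0.5                          # conservative worst-case proportion

table = {}
for margin in (0.05, 0.03, 0.01):
    raw = z ** 2 * p * (1 - p) / margin ** 2
    table[margin] = math.ceil(raw)  # round up: partial observations do not exist
    print(f"E = {margin:.2f}: raw n = {raw:.2f}, rounded up = {table[margin]}")
```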
Sample Size for Means
Formula Recap
n = (Z_(α/2) * σ / E)²
Estimating σ
Use prior studies, pilot samples, or conservative guesses when unknown.
Example Calculation
Given σ=10, E=2, 95% CI (Z=1.96):
n = (1.96 * 10 / 2)² = (9.8)² = 96.04, rounded up to 97
Practical Considerations in Determining Sample Size
Budget and Resources
Balance statistical requirements with financial and temporal constraints.
Non-Response and Attrition
Adjust sample size upwards to compensate for expected data loss.
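One common way to make this adjustment, shown here as an assumption rather than a rule stated in the source, is to divide the required sample size by the expected retention rate (1 minus the anticipated loss rate). The function name and the 20% loss rate are hypothetical.

```python
import math


def inflate_for_nonresponse(n_required: int, expected_loss_rate: float) -> int:
    """Number to recruit so that about n_required usable responses remain.

    Assumes losses are a fixed fraction of those recruited (an illustrative
    simplification; real attrition may depend on study design).
    """
    if not 0 <= expected_loss_rate < 1:
        raise ValueError("expected_loss_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - expected_loss_rate))


recruit = inflate_for_nonresponse(n_required=385, expected_loss_rate=0.20)
print(f"Recruit {recruit} to retain about 385 after 20% loss")
```

With a 20% expected loss, 482 participants would need to be recruited to keep about 385 usable observations.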
Ethical Concerns
Avoid unnecessarily large samples to prevent waste and participant burden.
Software and Tools
Use statistical packages (R, SAS, G*Power) for precise calculations and power analysis.
Common Mistakes and Misconceptions
Ignoring Variance or Using Incorrect Estimates
Leads to underpowered or inefficient studies.
Misunderstanding Margin of Error
Confusing margin of error with total interval width or ignoring confidence level effects.
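The cost of this confusion can be illustrated with a short calculation. The margin of error E is the half-width of the interval, so a target total width W corresponds to E = W / 2. The sketch below, with hypothetical names and illustrative values (σ = 10, W = 4), compares the correct sample size with the one obtained by mistakenly plugging W in as E.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """n = (Z_(α/2) * σ / E)², rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


target_width = 4.0  # desired total interval width W

correct_n = sample_size_mean(sigma=10, margin=target_width / 2)  # E = W / 2
mistaken_n = sample_size_mean(sigma=10, margin=target_width)     # E = W (wrong)

print(f"correct  (E = W / 2): n = {correct_n}")
print(f"mistaken (E = W):     n = {mistaken_n}")
```

Treating the full width as the margin of error yields about a quarter of the sample actually needed (25 instead of 97 here), so the resulting interval would be roughly twice as wide as intended.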
Disregarding Finite Population Correction
Results in oversampling in small populations.
Assuming Larger Samples Always Better
Excessive sample size can waste resources without meaningful gain.