Introduction

Sample size determination: critical step in study design. Balances precision, confidence, cost, and feasibility. Adequate sample size keeps interval estimates within an acceptable margin of error of the true population parameter at the stated confidence level.

"Sample size calculation is the cornerstone of reliable statistical inference and valid confidence interval estimation." -- Daniel S. Soper

Definition and Purpose

Concept

Sample size determination: process of deciding number of observations required to achieve desired confidence and precision in interval estimates.

Goal

Minimize sampling error, control margin of error, maintain pre-specified confidence level, optimize resource use.

Application

Used in surveys, experiments, clinical trials, quality control, social sciences, wherever population parameters need estimation.

Key Parameters in Sample Size Determination

Confidence Level (1-α)

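Long-run proportion of intervals, built by the same procedure on repeated samples, that contain the true parameter; not the probability that one computed interval contains it. Commonly 90%, 95%, 99%. Higher confidence requires larger samples.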

Margin of Error (E)

Half-width of the confidence interval: largest difference between sample estimate and true parameter tolerated at the chosen confidence level. Determines precision.

Population Variance (σ²) or Proportion (p)

Measure of variability in population. For a proportion, the variance term is p(1 - p). Greater variability demands larger sample size.

Population Size (N)

Relevant when population finite and small. Adjusts sample size via finite population correction.

Sample Size Formulas for Confidence Intervals

For Population Mean (σ Known)

Basic formula based on normal distribution quantile, variance, and margin of error.

n = (Z_(α/2) * σ / E)²
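
A minimal Python sketch of this formula, assuming σ is treated as known and the result is rounded up to the next whole observation. The name `sample_size_mean` and its parameters are illustrative, not taken from any library; the critical value comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n satisfying z * sigma / sqrt(n) <= margin (hypothetical helper)."""
    if sigma <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and margin must be positive; confidence in (0, 1)")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value Z_(alpha/2)
    return math.ceil((z * sigma / margin) ** 2)  # round up, never down


print(sample_size_mean(sigma=10, margin=2))  # sigma = 10, E = 2, 95% confidence
```

Rounding up rather than to the nearest integer is deliberate: rounding down would leave the achieved margin slightly larger than E.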

For Population Proportion

Formula incorporates estimated proportion to account for binomial variability.

n = (Z_(α/2))² * p * (1 - p) / E²
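
The proportion formula can be sketched the same way. This is an illustration under the normal approximation to the binomial; `sample_size_proportion` is a hypothetical name, and the default p = 0.5 is an assumption chosen as the most conservative planning value.

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin, p=0.5, confidence=0.95):
    """n = z^2 * p * (1 - p) / E^2, rounded up (hypothetical helper)."""
    if not 0 < margin < 1 or not 0 < p < 1 or not 0 < confidence < 1:
        raise ValueError("margin, p and confidence must each lie in (0, 1)")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_(alpha/2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_proportion(margin=0.05))         # p unknown, so p = 0.5
print(sample_size_proportion(margin=0.05, p=0.2))  # prior estimate p = 0.2
```

A prior estimate of p away from 0.5 lowers the required n, which is why a defensible prior value is worth using when one exists.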

When σ Unknown (Use Sample Standard Deviation)

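Substitute an estimate s for σ, taken from a pilot study or prior research. For small samples, replace Z_(α/2) with the t quantile t_(α/2, n-1); because that quantile depends on n, recompute iteratively until n stabilizes.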
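
One way to act on this, sketched under simplifying assumptions: estimate the standard deviation from a pilot sample and plug it into the σ-known formula with Z = 1.96. The helper name and the pilot values are made up for illustration. A small pilot gives a noisy estimate of s, and a t-based calculation would give a somewhat larger n.

```python
import math
from statistics import stdev


def sample_size_from_pilot(pilot_values, margin, z=1.96):
    """Plug the pilot sample standard deviation s in for sigma (hypothetical helper)."""
    if len(pilot_values) < 2:
        raise ValueError("need at least two pilot observations to estimate s")
    s = stdev(pilot_values)  # sample standard deviation, n - 1 denominator
    return math.ceil((z * s / margin) ** 2)


# Made-up pilot measurements, for illustration only.
pilot = [48.2, 51.7, 39.9, 62.4, 55.0, 44.3, 58.8, 47.1]
print(sample_size_from_pilot(pilot, margin=2.0))
```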

Effect of Margin of Error

Inverse Square Relationship

Sample size proportional to 1/E². Halving margin of error quadruples sample size.
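
The inverse-square scaling is easy to check numerically. The sketch below uses the unrounded mean formula with illustrative values (Z = 1.96, σ = 10, both assumptions) and halves E at each step; `raw_sample_size` is a hypothetical name.

```python
def raw_sample_size(z, sigma, margin):
    """Unrounded n = (z * sigma / E)^2, kept as a float to expose the scaling."""
    return (z * sigma / margin) ** 2


z, sigma = 1.96, 10.0  # illustrative values: 95% confidence, sigma = 10
margin = 4.0
previous = None
for _ in range(4):
    n = raw_sample_size(z, sigma, margin)
    ratio = "" if previous is None else f"  ({n / previous:.1f}x previous)"
    print(f"E = {margin:<5} n = {n:8.2f}{ratio}")
    previous = n
    margin /= 2  # halve the margin of error each step
```

Each halving of E multiplies the unrounded n by 4, so going from E = 4 to E = 0.5 costs a factor of 64.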

Practical Impact

Small gains in precision require large sample increases; budget and time constraints critical.

Trade-Offs

Balance between feasible sample size and desired precision essential.

Confidence Level Implications

Critical Value (Z or t)

Higher confidence level increases critical value, enlarges sample size.

Examples

Z_(0.05/2) = 1.96 for 95%, Z_(0.01/2) = 2.576 for 99%.
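
These critical values can be reproduced with Python's standard library rather than read from a table. A short sketch; `z_critical` is an illustrative name.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value Z_(alpha/2) (hypothetical helper)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```

Since n scales with the square of the critical value, moving from 95% to 99% confidence raises the required sample size by a factor of about (2.576 / 1.96)², roughly 1.73.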

Effect on Interval Width

At fixed n, higher confidence widens the interval; holding the margin of error fixed therefore requires more observations.

Population Variability and Its Impact

Variance Influence

Higher variance increases required sample size to achieve same precision.

Estimating Variance

Use previous studies, pilot data, or conservative estimates to avoid underestimation.

Effect on Proportions

Maximum variability at p=0.5 maximizes sample size; conservative default when p unknown.
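
A quick sweep over p illustrates why 0.5 is the conservative choice. The sketch assumes Z = 1.96 and E = 0.05; `n_for_p` is a hypothetical helper.

```python
import math

Z_95 = 1.96  # assumed critical value for 95% confidence


def n_for_p(p, margin=0.05, z=Z_95):
    """Required n for a given planning proportion p, rounded up."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  p(1-p) = {p * (1 - p):.2f}  n = {n_for_p(p)}")
```

The term p(1 - p) is symmetric about 0.5 and peaks there at 0.25, so any other planning value yields a smaller n.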

Finite Population Correction

When Applicable

Applied when the unadjusted sample size n exceeds roughly 5% of the population (n/N > 0.05); below that threshold the correction is negligible.

Correction Factor

n_adj = (n * N) / (n + N - 1)
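
A sketch of the correction in Python, assuming n is the unadjusted size from one of the infinite-population formulas and N is the known population size. The function name `fpc_adjust` is illustrative.

```python
import math


def fpc_adjust(n_unadjusted, population_size):
    """Apply n_adj = n * N / (n + N - 1) and round up (hypothetical helper)."""
    if n_unadjusted < 1 or population_size < 1:
        raise ValueError("both sizes must be at least 1")
    n_adj = n_unadjusted * population_size / (n_unadjusted + population_size - 1)
    return math.ceil(n_adj)


for N in (500, 2000, 1_000_000):
    print(f"N = {N:>9,}  n_adj = {fpc_adjust(385, N)}")
```

With n = 385, the adjusted size drops sharply for a population of 500 but is unchanged for a population of one million, which matches the 5% rule of thumb.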

Effect

Reduces sample size needed for fixed precision when population is small.

Sample Size for Proportions

Formula Recap

n = (Z_(α/2))² * p * (1-p) / E²

Unknown Proportion Strategy

Use p=0.5 for maximum sample size; ensures conservative estimate.

Example Table

Margin of Error (E)    Sample Size (p=0.5, 95% CI, rounded up)
0.05                   385
0.03                   1068
0.01                   9604

Sample Size for Means

Formula Recap

n = (Z_(α/2) * σ / E)²

Estimating σ

Use prior studies, pilot samples, or conservative guesses when unknown.

Example Calculation

Given σ=10, E=2, 95% CI (Z=1.96):

n = (1.96 * 10 / 2)² = (9.8)² = 96.04, rounded up to 97

Practical Considerations in Determining Sample Size

Budget and Resources

Balance statistical requirements with financial and temporal constraints.

Non-Response and Attrition

Adjust sample size upwards to compensate for expected data loss.
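
A common way to make this adjustment, shown here as a sketch rather than a prescribed method, is to divide the required number of completed observations by the expected response (or retention) rate. The helper name and the 80% rate are assumptions for illustration.

```python
import math


def inflate_for_nonresponse(n_required, expected_response_rate):
    """Number to recruit so that about n_required observations remain (hypothetical helper)."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must lie in (0, 1]")
    return math.ceil(n_required / expected_response_rate)


# 385 completed responses needed, 80% response rate assumed.
print(inflate_for_nonresponse(385, 0.80))
```

This protects the planned precision only on average, and it does nothing about bias if non-respondents differ systematically from respondents.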

Ethical Concerns

Avoid unnecessarily large samples to prevent waste and participant burden.

Software and Tools

Use statistical packages (R, SAS, G*Power) for precise calculations and power analysis.

Common Mistakes and Misconceptions

Ignoring Variance or Using Incorrect Estimates

Underestimated variance yields intervals wider than planned; overestimated variance inflates sample size and cost.

Misunderstanding Margin of Error

Confusing margin of error (half-width E) with total interval width (2E), or quoting E without its confidence level.

Disregarding Finite Population Correction

Results in oversampling in small populations.

Assuming Larger Samples Always Better

Excessive sample size can waste resources without meaningful gain.
