Introduction

Radix sort is a non-comparative sorting algorithm: instead of comparing elements, it sorts by processing their digits (or characters). It achieves O(nk) time where k = number of digits per key. For fixed-size keys (k constant), this becomes O(n). Linear time sorting!

Key insight: the comparison lower bound Ω(n log n) doesn't apply to radix sort (non-comparative). It exploits the structure of keys (composed of digits). Historical importance: punch-card sorting machines (1930s-1960s) used radix sort. Modern relevance: sorting integers in linear time.

Two main approaches: LSD (least significant digit first) processes digits right to left; MSD (most significant digit first) processes them left to right. LSD is stable if a stable subroutine is used. MSD partitions the input into independent subproblems (parallelizable).

"Radix sort proves that linear-time sorting possible if constraints relaxed. By not comparing elements, we achieve better asymptotic bound than comparison sorts." -- Sorting algorithm theory

Sorting Digit by Digit

Core Idea

Observation: if numbers are already sorted by digits 1..d (least significant first), a stable sort by digit d+1 preserves that ordering among ties. Sorting by each digit in turn, least significant first, yields a globally sorted result.

Stability Requirement

Each pass must be stable: equal keys maintain their relative order. A non-stable subroutine invalidates the algorithm. Counting sort is stable, making it the ideal subroutine.

Example: Sorting Integers

Input: [170, 45, 75, 90, 802, 24, 2, 66]
Pass 1 (units): [170, 90, 802, 2, 24, 45, 75, 66] (stable by units digit)
Pass 2 (tens): [802, 2, 24, 45, 66, 170, 75, 90]
Pass 3 (hundreds): [2, 24, 45, 66, 75, 90, 170, 802]
Result: sorted in 3 passes
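The passes above can be reproduced with Python's built-in stable sort standing in for the per-digit subroutine (an illustration of the principle, not the counting sort used below):

```python
# Each pass stably sorts by one digit, least significant first.
data = [170, 45, 75, 90, 802, 24, 2, 66]
passes = []
for digit_position in (1, 10, 100):
    # sorted() is stable, so ties keep the order from the previous pass
    data = sorted(data, key=lambda x: (x // digit_position) % 10)
    passes.append(list(data))

# passes[0] == [170, 90, 802, 2, 24, 45, 75, 66]
# passes[1] == [802, 2, 24, 45, 66, 170, 75, 90]
# passes[2] == [2, 24, 45, 66, 75, 90, 170, 802]
```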

Counting Sort Subroutine

Algorithm

def counting_sort_by_digit(array, digit_position):
    count = [0] * 10  # Count frequency of each digit 0-9
    for num in array:
        digit = (num // digit_position) % 10
        count[digit] += 1

    # Convert to cumulative counts
    for i in range(1, 10):
        count[i] += count[i-1]

    # Build result array (backward to maintain stability)
    output = [0] * len(array)
    for i in range(len(array) - 1, -1, -1):
        digit = (array[i] // digit_position) % 10
        count[digit] -= 1
        output[count[digit]] = array[i]

    return output

Time: O(n + d) per pass, where d = range of a digit (10 for decimal)

Stability

Iterating backward through the array maintains stability: elements with the same digit are placed into their bucket's slots from the back, so they end up in their original relative order (FIFO).

Least Significant Digit (LSD) Radix Sort

Algorithm

def radix_sort_lsd(array):
    if not array:          # guard: max() would fail on an empty array
        return array
    max_num = max(array)
    digit_position = 1

    while digit_position <= max_num:
        array = counting_sort_by_digit(array, digit_position)
        digit_position *= 10

    return array

Time: O(n*k) where k = number of digits

Characteristics

  • Processes least significant digit first (right to left)
  • Stable if stable subroutine (counting sort)
  • k iterations (number of digits in largest number)
  • Simple, commonly used

Most Significant Digit (MSD) Radix Sort

Algorithm

def radix_sort_msd(array, digit_position):
    if digit_position == 0 or len(array) <= 1:
        return array

    buckets = [[] for _ in range(10)]
    for num in array:
        digit = (num // digit_position) % 10
        buckets[digit].append(num)

    result = []
    for bucket in buckets:
        result.extend(radix_sort_msd(bucket, digit_position // 10))

    return result

Time: O(n*k), but often faster in practice: recursion stops as soon as a bucket holds at most one element, so lower-order digits need not be examined.
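The recursive function needs a driver to compute the initial digit_position (the highest power of 10 not exceeding the maximum key). A minimal sketch; the radix_sort wrapper name is ours:

```python
def radix_sort_msd(array, digit_position):
    if digit_position == 0 or len(array) <= 1:
        return array
    buckets = [[] for _ in range(10)]
    for num in array:
        buckets[(num // digit_position) % 10].append(num)
    result = []
    for bucket in buckets:
        result.extend(radix_sort_msd(bucket, digit_position // 10))
    return result

def radix_sort(array):
    if not array:
        return array
    p = 1                          # highest power of 10 <= max(array)
    while p * 10 <= max(array):
        p *= 10
    return radix_sort_msd(array, p)

# radix_sort([170, 45, 75, 90, 802, 24, 2, 66])
#   == [2, 24, 45, 66, 75, 90, 170, 802]
```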

Characteristics

  • Most significant digit first (left to right)
  • Naturally partitions into independent subproblems
  • Parallelizable: sort buckets in parallel
  • More cache-friendly than LSD (fewer passes)
  • Recursive structure

Comparison with LSD

LSD: fixed k passes regardless of data. MSD: stops recursing once a bucket holds a single element, so keys distinguished by their leading digits are never examined further. MSD enables parallelization; LSD is simpler.

Time and Space Complexity Analysis

Time Complexity

Radix sort: O(n*k) where n = number of elements, k = number of digits. For each digit, counting sort O(n + d) where d = digit range (10 for decimal). k iterations: O(k*(n + d))

If k and d constants: O(n). Linear time!

Compared to Comparison Sorts

Algorithm          Time                            Space
QuickSort          O(n log n) avg, O(n^2) worst    O(log n)
MergeSort          O(n log n) guaranteed           O(n)
Radix Sort (LSD)   O(n*k)                          O(n)
Radix Sort (MSD)   O(n*k)                          O(n)

Space

Radix sort: O(n) for output array + O(d) for counting. Overall O(n).

Stability and Preservation

Stability Definition

Stable sort: equal elements maintain relative order. Critical for radix sort correctness.

Stable Subroutines

Counting sort stable if iterate backward. MSD radix sort preserves order (bucket processing maintains sequence). LSD requires stable counting sort.

Practical Implications

Radix sort's stability makes it suitable for sorting records by multiple fields: sort by the secondary field first, then stably by the primary field, and records end up ordered by both. Example: to order records by zip code and by city within each zip, sort by city, then stably by zip code.
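This multi-field pattern can be illustrated with Python's stable built-in sort standing in for each radix pass (the record layout is hypothetical):

```python
# Records are (zip, city) pairs; sort secondary key first, primary key second.
records = [("90210", "Beverly Hills"), ("10001", "New York"),
           ("90210", "Alhambra"), ("10001", "Albany")]
records.sort(key=lambda r: r[1])   # pass 1: city (secondary field)
records.sort(key=lambda r: r[0])   # pass 2: zip (primary, stable keeps city order)

# records == [("10001", "Albany"), ("10001", "New York"),
#             ("90210", "Alhambra"), ("90210", "Beverly Hills")]
```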

Practical Implementation Considerations

Digit Extraction

For integers: (num // 10**i) % 10 extracts digit i. For strings: each character is a digit. For floats: handle integer and fractional parts separately.

Negative Numbers

Radix sort as stated handles non-negative integers. For negatives: split into two groups, sort the negatives by absolute value and reverse them, sort the positives normally, then concatenate negatives before positives.
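A sketch of the split-and-reverse approach; sort_nonneg stands in for any non-negative radix sort (Python's sorted is used here only for brevity):

```python
def sort_with_negatives(array, sort_nonneg=sorted):
    negatives = [-x for x in array if x < 0]   # sort magnitudes of negatives
    positives = [x for x in array if x >= 0]
    # Largest magnitude = most negative, so reverse and restore signs,
    # then append the sorted non-negatives.
    return [-x for x in reversed(sort_nonneg(negatives))] + sort_nonneg(positives)

# sort_with_negatives([3, -1, -40, 7, 0]) == [-40, -1, 0, 3, 7]
```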

Cache Behavior

LSD: k full passes over the array limit cache reuse. MSD: fewer effective passes, but recursion adds stack overhead. In practice, the winner depends on k and cache size.

Memory Allocation

Radix sort requires O(n) auxiliary space (output array). For very large arrays, memory bandwidth limited. In-place variants exist but less common.

Radix Sort vs. Quicksort in Practice

Theoretical Advantage

Radix sort O(n*k) vs. QuickSort O(n log n). When k < log n, radix sort does asymptotically less work: e.g., 32-bit integers sorted byte by byte in base 256 take k = 4 passes, while for n = 10^9, log n ≈ 30.
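The arithmetic behind that comparison, as a quick check (the base-256, 4-pass layout is our assumed radix, not the only choice):

```python
import math

n = 10**9
k = 4                              # 32-bit keys, one pass per byte in base 256
radix_work = k * n                 # O(n*k) total digit operations
comparison_work = n * math.log2(n) # O(n log n) comparisons, log2(1e9) ~ 30

# radix_work is roughly 7x smaller than comparison_work at this n
```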

Practical Performance

QuickSort usually faster due to: (1) better cache locality (in-place, fewer passes), (2) smaller constants, (3) implemented optimized in libraries. Radix sort disadvantages: multiple passes, larger space, less cache-friendly.

When to Use Radix Sort

  • Small k (few digits): the O(n*k) bound is realized in practice
  • Very large n: O(n) can beat O(n log n)
  • Stability required: radix sort naturally stable
  • Specialized hardware: GPU implementations favor radix sort

Applications and Variants

Bucket Sort

Generalization: divide range into buckets, sort buckets independently. Radix sort special case of bucket sort with k levels.

Parallel Sorting

MSD radix sort embarrassingly parallel: each bucket sorted independently. GPU implementations highly efficient (data-parallel pattern).

Sorting Strings

Radix sort by character position (like digits). O(n*m) where m = string length. Efficient for many short strings.
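A minimal LSD sketch for strings, assuming fixed-length ASCII keys (one bucket per byte value; Python lists used as stable buckets):

```python
def radix_sort_strings(strings, width):
    # Process character positions right to left, one stable pass each.
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(256)]    # one bucket per byte value
        for s in strings:
            buckets[ord(s[pos])].append(s)    # append preserves order: stable
        strings = [s for bucket in buckets for s in bucket]
    return strings

# radix_sort_strings(["dab", "cab", "add", "bad", "ace"], 3)
#   == ["ace", "add", "bad", "cab", "dab"]
```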

Suffix Arrays

Building suffix arrays (all suffixes of string sorted): radix sort used in DC3 algorithm, linear-time suffix array construction.

Counting Occurrences

Radix sort elements while counting: frequency analysis. Histogram of values computed during sort.
