Introduction
Radix sort non-comparative sorting algorithm: instead of comparing elements, sorts by processing digits (or characters). Achieves O(nk) time where k = number of digits. For fixed-size keys (k constant), becomes O(n). Linear time sorting!
Key insight: comparison lower bound O(n log n) doesn't apply to radix sort (non-comparative). Exploits structure of keys (composed of digits). Historical importance: punch-card sorting machines (1930s-1960s) used radix sort. Modern relevance: sorting integers in linear time.
Two main approaches: LSD (least significant digit first) processes right-to-left, MSD (most significant digit first) processes left-to-right. LSD stable if stable subroutine used. MSD can partition into independent subproblems (parallelizable).
"Radix sort proves that linear-time sorting possible if constraints relaxed. By not comparing elements, we achieve better asymptotic bound than comparison sorts." -- Sorting algorithm theory
Sorting Digit by Digit
Core Idea
Observation: if numbers sorted by digit d, stable sort by digit d+1 preserves digit d ordering. Recursively sort by all digits, result globally sorted.
Stability Requirement
Each pass must be stable: equal keys maintain relative order. Non-stable sort invalidates algorithm. Counting sort is stable, making it ideal subroutine.
Example: Sorting Integers
Input: [170, 45, 75, 90, 802, 24, 2, 66]
Pass 1 (units): [2, 24, 45, 66, 75, 90, 170, 802] (stable by digit)
Pass 2 (tens): [2, 24, 45, 66, 75, 90, 170, 802]
Pass 3 (hundreds): [2, 24, 45, 66, 75, 90, 170, 802]
Result: Sorted in 3 passes
Counting Sort Subroutine
Algorithm
def counting_sort_by_digit(array, digit_position):
count = [0] * 10 # Count frequency of each digit 0-9
for num in array:
digit = (num // digit_position) % 10
count[digit] += 1
# Convert to cumulative counts
for i in range(1, 10):
count[i] += count[i-1]
# Build result array (backward to maintain stability)
output = [0] * len(array)
for i in range(len(array) - 1, -1, -1):
digit = (array[i] // digit_position) % 10
count[digit] -= 1
output[count[digit]] = array[i]
return output
Time: O(n + k) where k = range of digit (10 for decimal)
Stability
Backward iteration through array maintains stability. Elements with same digit processed in original order (FIFO).
Least Significant Digit (LSD) Radix Sort
Algorithm
def radix_sort_lsd(array):
max_num = max(array)
digit_position = 1
while digit_position <= max_num:
array = counting_sort_by_digit(array, digit_position)
digit_position *= 10
return array
Time: O(n*k) where k = number of digits
Characteristics
- Processes least significant digit first (right to left)
- Stable if stable subroutine (counting sort)
- k iterations (number of digits in largest number)
- Simple, commonly used
Most Significant Digit (MSD) Radix Sort
Algorithm
def radix_sort_msd(array, digit_position):
if digit_position == 0 or len(array) <= 1:
return array
buckets = [[] for _ in range(10)]
for num in array:
digit = (num // digit_position) % 10
buckets[digit].append(num)
result = []
for bucket in buckets:
result.extend(radix_sort_msd(bucket, digit_position // 10))
return result
Time: O(n*k) but often faster (early termination)
Characteristics
- Most significant digit first (left to right)
- Naturally partitions into independent subproblems
- Parallelizable: sort buckets in parallel
- More cache-friendly than LSD (fewer passes)
- Recursive structure
Comparison with LSD
LSD: fixed k passes regardless of data. MSD: can terminate early if rightmost digits identical. MSD enables parallelization. LSD simpler.
Time and Space Complexity Analysis
Time Complexity
Radix sort: O(n*k) where n = number of elements, k = number of digits. For each digit, counting sort O(n + d) where d = digit range (10 for decimal). k iterations: O(k*(n + d))
If k and d constants: O(n). Linear time!
Compared to Comparison Sorts
| Algorithm | Time | Space |
|---|---|---|
| QuickSort | O(n log n) avg, O(n^2) worst | O(log n) |
| MergeSort | O(n log n) guaranteed | O(n) |
| Radix Sort (LSD) | O(n*k) | O(n) |
| Radix Sort (MSD) | O(n*k) | O(n) |
Space
Radix sort: O(n) for output array + O(d) for counting. Overall O(n).
Stability and Preservation
Stability Definition
Stable sort: equal elements maintain relative order. Critical for radix sort correctness.
Stable Subroutines
Counting sort stable if iterate backward. MSD radix sort preserves order (bucket processing maintains sequence). LSD requires stable counting sort.
Practical Implications
Radix sort suitable for sorting records by multiple fields (stable preserves previous sort order). Example: sort records by zip code, then by city within each zip.
Practical Implementation Considerations
Digit Extraction
For integers: (num // 10^i) % 10. For strings: each character a digit. For floats: separate integer/fractional parts.
Negative Numbers
Radix sort as stated handles non-negative integers. For negatives: separate into two groups (negatives sorted reverse, positives normal), concatenate.
Cache Behavior
LSD: multiple passes limit cache reuse (array accessed k times, k iterations). MSD: fewer passes but recursive, stack overhead. In practice, depends on k and cache size.
Memory Allocation
Radix sort requires O(n) auxiliary space (output array). For very large arrays, memory bandwidth limited. In-place variants exist but less common.
Radix Sort vs. Quicksort in Practice
Theoretical Advantage
Radix sort O(n*k) vs. QuickSort O(n log n). If k < log n (e.g., 32-bit integers: k=32, n=10^9, log n≈30), radix sort faster asymptotically.
Practical Performance
QuickSort usually faster due to: (1) better cache locality (in-place, fewer passes), (2) smaller constants, (3) implemented optimized in libraries. Radix sort disadvantages: multiple passes, larger space, less cache-friendly.
When to Use Radix Sort
- Small k (few digits): overwhelms theoretical advantage
- Very large n: O(n) can beat O(n log n)
- Stability required: radix sort naturally stable
- Specialized hardware: GPU implementations favor radix sort
Applications and Variants
Bucket Sort
Generalization: divide range into buckets, sort buckets independently. Radix sort special case of bucket sort with k levels.
Parallel Sorting
MSD radix sort embarrassingly parallel: each bucket sorted independently. GPU implementations highly efficient (data-parallel pattern).
Sorting Strings
Radix sort by character position (like digits). O(n*m) where m = string length. Efficient for many short strings.
Suffix Arrays
Building suffix arrays (all suffixes of string sorted): radix sort used in DC3 algorithm, linear-time suffix array construction.
Counting Occurrences
Radix sort elements while counting: frequency analysis. Histogram of values computed during sort.
References
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
- Knuth, D. E. "The Art of Computer Programming, Volume 3: Sorting and Searching." Addison-Wesley, 2nd edition, 1998.
- Sedgewick, R., and Wayne, K. "Algorithms, Part I." Addison-Wesley, 4th edition, 2011.
- Shoup, V. "A Computational Introduction to Number Theory and Algebra." Cambridge University Press, 2nd edition, 2008.
- Sanders, P., and Schulz, F. "Parallel Distributed Computing." Springer, 2nd edition, 2015.