Introduction
Binary search is one of the most fundamental algorithms in computer science. Given a sorted array and a target value, it determines whether the target exists in the array (and its position) in O(log n) time. The algorithm works by repeatedly dividing the search interval in half, discarding the half that cannot contain the target.
While binary search is often presented as a simple iterative or recursive procedure, it is at its core a divide and conquer algorithm. It divides the problem (searching an array of size n) into a single subproblem of half the size, with O(1) work at each level to determine which half to keep. This perspective connects it to a broader family of D&C algorithms and allows its complexity to be analyzed using the Master theorem.
Despite its apparent simplicity, binary search is notoriously difficult to implement correctly. Jon Bentley reported that only about 10% of professional programmers could write a correct binary search implementation when given two hours. The most common bugs involve off-by-one errors in boundary conditions and integer overflow in computing the midpoint.
Binary Search as Divide and Conquer
The three phases of divide and conquer in binary search are:
- Divide: Compute the midpoint of the current search range, splitting the array into two halves.
- Conquer: Compare the target with the middle element. Recursively search the left half if the target is smaller, or the right half if the target is larger.
- Combine: No combination step is needed. The answer is either the midpoint itself (if it matches the target) or the result from the recursive call.
Binary search is a degenerate case of divide and conquer where only one of the two subproblems needs to be solved. This makes it more efficient than algorithms like merge sort, which must solve both subproblems.
Algorithm Description
Recursive Pseudocode
procedure BinarySearch(arr, target, low, high): if low > high: return NOT_FOUND mid = low + (high - low) / 2 // avoids overflow if arr[mid] == target: return mid else if arr[mid] > target: return BinarySearch(arr, target, low, mid - 1) else: return BinarySearch(arr, target, mid + 1, high)Iterative Pseudocode
procedure BinarySearchIterative(arr, target): low = 0 high = length(arr) - 1 while low <= high: mid = low + (high - low) / 2 if arr[mid] == target: return mid else if arr[mid] > target: high = mid - 1 else: low = mid + 1 return NOT_FOUNDMaster Theorem Application
The recurrence relation for binary search is:
T(n) = T(n/2) + O(1)Applying the Master theorem with a = 1, b = 2, and f(n) = O(1):
- We compute log_b(a) = log_2(1) = 0
- f(n) = O(1) = O(n^0), which matches n^(log_b(a))
- This falls into Case 2 of the Master theorem
- Therefore T(n) = O(n^0 * log n) = O(log n)
| Parameter | Value | Meaning |
|---|---|---|
| a (subproblems) | 1 | Only one half is searched |
| b (size reduction) | 2 | Problem size halves each step |
| f(n) (work per level) | O(1) | One comparison per level |
| Result | O(log n) | Master theorem Case 2 |
Worked Example
Search for target = 23 in the sorted array [2, 5, 8, 12, 16, 23, 38, 56, 72, 91].
Step 1: low=0, high=9. mid = 0 + (9-0)/2 = 4. arr[4]=16. 16 < 23, so search right half: low=5.
Step 2: low=5, high=9. mid = 5 + (9-5)/2 = 7. arr[7]=56. 56 > 23, so search left half: high=6.
Step 3: low=5, high=6. mid = 5 + (6-5)/2 = 5. arr[5]=23. Match found at index 5.
The search completed in 3 comparisons for an array of 10 elements. In general, binary search requires at most ceil(log2(n+1)) comparisons.
For an array of 1 billion elements, binary search needs at most 30 comparisons. A linear search would need up to 1 billion comparisons -- a factor of over 33 million improvement.
Complexity Analysis
| Case | Time | Comparisons |
|---|---|---|
| Best case | O(1) | 1 (target is at midpoint) |
| Average case | O(log n) | ~log2(n) - 1 |
| Worst case | O(log n) | ceil(log2(n+1)) |
Space complexity: O(1) for the iterative version, O(log n) for the recursive version due to the call stack. In practice, the iterative version is preferred to avoid stack overhead.
Binary search is optimal among comparison-based search algorithms. The information-theoretic lower bound for searching a sorted array is ceil(log2(n+1)) comparisons, which binary search achieves.
Variants and Extensions
Lower Bound (First Occurrence)
Find the first index i such that arr[i] >= target. This variant is used for insertion points and is the basis of functions like C++'s lower_bound and Python's bisect_left.
Upper Bound (Last Occurrence)
Find the first index i such that arr[i] > target. The range [lower_bound, upper_bound) gives all positions containing the target value.
Binary Search on Answer
When the answer space is monotonic (if x is feasible then x-1 is also feasible), binary search can find the optimal answer by searching over the answer space rather than an explicit array. This technique is widely used in competitive programming.
Exponential Search
When the array is unbounded or extremely large, exponential search first finds a range by doubling the search boundary, then applies binary search within that range. This achieves O(log k) time where k is the position of the target.
Implementation
def binary_search(arr, target): low, high = 0, len(arr) - 1 while low <= high: mid = low + (high - low) // 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 return -1def lower_bound(arr, target): low, high = 0, len(arr) while low < high: mid = low + (high - low) // 2 if arr[mid] < target: low = mid + 1 else: high = mid return lowCommon Pitfalls
- Integer Overflow: Computing mid = (low + high) / 2 can overflow when low and high are large. The safe formula is mid = low + (high - low) / 2.
- Off-by-One Errors: The choice between low <= high vs low < high, and mid-1 vs mid, depends on whether the search is inclusive or exclusive. Mixing conventions leads to infinite loops or missed elements.
- Unsorted Input: Binary search requires the input to be sorted. Applying it to unsorted data produces incorrect results without any warning.
- Floating-Point Precision: When binary searching on real numbers, use a fixed number of iterations (e.g., 100) or an epsilon-based termination condition rather than exact equality.
- Duplicate Elements: The basic version may return any occurrence of the target among duplicates. Use lower_bound/upper_bound variants for deterministic behavior.
References
- Knuth, D. E. "The Art of Computer Programming, Volume 3: Sorting and Searching." Addison-Wesley, 1973.
- Bentley, J. L. "Programming Pearls." Addison-Wesley, 1986.
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
- Bloch, J. "Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken." Google Research Blog, 2006.
- Morin, P. "Open Data Structures." Section 6.2: BinarySearchTree, 2013.