Binary Search (Searching) | What's Your IQ

Introduction

Binary search is one of the most fundamental and efficient searching algorithms in computer science. It operates on sorted arrays by repeatedly dividing the search interval in half. At each step, the algorithm compares the target value with the middle element of the current interval. If the target matches the middle element, the search is complete. If the target is less than the middle element, the search continues in the lower half; if greater, it continues in the upper half. This halving process continues until the target is found or the interval is empty.

The key insight behind binary search is that each comparison eliminates half of the remaining candidates, resulting in a logarithmic time complexity of O(log n). This makes binary search dramatically faster than linear search for large datasets. For example, searching through one million sorted elements requires at most 20 comparisons with binary search, compared to up to one million comparisons with linear search.

"Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky, and many good programmers have done it wrong the first few times they tried." -- Jon Bentley, Programming Pearls

Binary search was first described by John Mauchly in 1946, but the first bug-free published implementation did not appear until 1962. Even in 2006, a subtle integer overflow bug was discovered in widely used implementations including the Java standard library. These facts underscore the importance of understanding the algorithm thoroughly despite its apparent simplicity.

How It Works

Prerequisites

Binary search requires the input array to be sorted in non-decreasing order. If the array is not sorted, the algorithm will produce incorrect results. The data structure must also support random access (constant-time indexing), which is why binary search works on arrays but not directly on linked lists.

Step-by-Step Procedure

Set two pointers: low = 0 and high = n - 1, where n is the array length.
While low is less than or equal to high:
1. Compute the middle index: mid = low + (high - low) / 2.
2. If A[mid] equals the target, return mid.
3. If A[mid] is less than the target, set low = mid + 1 (search the right half).
4. If A[mid] is greater than the target, set high = mid - 1 (search the left half).
If the loop ends without finding the target, return -1 (not found).

Why mid = low + (high - low) / 2?

The naive formula mid = (low + high) / 2 can cause integer overflow when low and high are large. The expression low + (high - low) / 2 produces the same result without overflow risk. This subtle bug went undetected in many production implementations for decades.

Algorithm and Pseudocode

Iterative Binary Search

procedure BinarySearch(A, n, target) Input: A = sorted array of n elements, target = value to find Output: index of target in A, or -1 if not found low := 0 high := n - 1 while low <= high do mid := low + (high - low) / 2 if A[mid] = target then return mid else if A[mid] < target then low := mid + 1 else high := mid - 1 end if end while return -1end procedure

Recursive Binary Search

procedure BinarySearchRecursive(A, low, high, target) Input: A = sorted array, low and high = current search bounds, target = value to find Output: index of target in A, or -1 if not found if low > high then return -1 end if mid := low + (high - low) / 2 if A[mid] = target then return mid else if A[mid] < target then return BinarySearchRecursive(A, mid + 1, high, target) else return BinarySearchRecursive(A, low, mid - 1, target) end ifend procedure

Iterative vs. Recursive

Both approaches have identical time complexity of O(log n). The iterative version uses O(1) auxiliary space, while the recursive version uses O(log n) stack space due to recursive calls. In practice, the iterative version is generally preferred for its lower space overhead and avoidance of potential stack overflow on very large arrays.

Complexity Analysis

Time Complexity

Case	Complexity	Explanation
Best Case	O(1)	Target is at the middle of the array on the first comparison.
Average Case	O(log n)	Each comparison halves the search space; approximately log2(n) comparisons needed.
Worst Case	O(log n)	Target is not present or at the boundary; the search space is halved until empty.

Space Complexity

Implementation	Space Complexity	Reason
Iterative	O(1)	Uses only a fixed number of variables (low, high, mid).
Recursive	O(log n)	Each recursive call adds a frame to the call stack; maximum depth is log2(n).

Number of Comparisons

The maximum number of comparisons binary search performs is floor(log2(n)) + 1. For practical reference:

n = 1,000: at most 10 comparisons
n = 1,000,000: at most 20 comparisons
n = 1,000,000,000: at most 30 comparisons

This logarithmic growth is what makes binary search so powerful for large datasets.

Worked Example

Search for target = 23 in the sorted array A = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91].

Step-by-Step Trace

Step	low	high	mid	A[mid]	Comparison	Action
1	0	9	4	16	16 < 23	Set low = 5
2	5	9	7	56	56 > 23	Set high = 6
3	5	6	5	23	23 = 23	Return 5

Result: The target value 23 is found at index 5 after only 3 comparisons. A linear search on the same array would have required 6 comparisons.

Unsuccessful Search Example

Search for target = 20 in the same array A = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91].

Step 1: low=0, high=9, mid=4, A[4]=16. 16 < 20, so low = 5.
Step 2: low=5, high=9, mid=7, A[7]=56. 56 > 20, so high = 6.
Step 3: low=5, high=6, mid=5, A[5]=23. 23 > 20, so high = 4.
Step 4: low=5, high=4. low > high, so return -1 (not found).

The search terminates after 3 comparisons, confirming that 20 is not in the array.

Lower Bound, Upper Bound, and Other Variants

Lower Bound (First Occurrence)

The lower bound variant finds the index of the first element that is greater than or equal to the target. This is particularly useful when the array contains duplicate values and you need the leftmost occurrence.

procedure LowerBound(A, n, target) low := 0 high := n while low < high do mid := low + (high - low) / 2 if A[mid] < target then low := mid + 1 else high := mid end if end while return lowend procedure

Upper Bound (First Element Greater Than Target)

The upper bound variant finds the index of the first element that is strictly greater than the target. Combined with lower bound, it can determine the range of equal elements.

procedure UpperBound(A, n, target) low := 0 high := n while low < high do mid := low + (high - low) / 2 if A[mid] <= target then low := mid + 1 else high := mid end if end while return lowend procedure

Counting Occurrences

The number of times a value appears in a sorted array can be computed as UpperBound(target) - LowerBound(target), both operations running in O(log n) time.

Binary Search on the Answer

A powerful technique in competitive programming is applying binary search not on an array, but on the answer space of an optimization problem. If a problem asks "what is the minimum value of X such that condition C holds?" and the condition is monotonic (once true, it stays true for all larger values), binary search can find the answer in O(log(range) * cost_of_checking_C).

"Binary search can be applied whenever you have a monotonic predicate -- a condition that is false for all values below some threshold and true for all values above it." -- Competitive Programmer's Handbook, Antti Laaksonen

Advantages and Disadvantages

Advantages

Extremely Fast: O(log n) time complexity means it scales beautifully to very large datasets. Searching a billion elements requires at most 30 comparisons.
Efficient Memory Use: The iterative version uses O(1) extra space.
Deterministic: Always produces the correct result and has predictable performance characteristics.
Versatile Variants: Lower bound, upper bound, and "search on the answer" variants extend binary search to a wide range of problems.
Well-Studied: Decades of research and optimization ensure that implementations in standard libraries are highly optimized.

Disadvantages

Requires Sorted Data: The array must be sorted, and maintaining sorted order during insertions costs O(n) per insertion.
Requires Random Access: Does not work on linked lists or other sequential-access data structures.
Implementation Pitfalls: Off-by-one errors and integer overflow bugs are common; the algorithm is notoriously tricky to implement correctly.
Not Optimal for Small Arrays: For very small arrays, linear search may be faster due to lower overhead and better branch prediction.
Poor Cache Behavior: The non-sequential access pattern of binary search can cause more cache misses than linear search for moderate-sized arrays.

Comparison with Other Search Algorithms

Algorithm	Average Case	Worst Case	Requires Sorted	Space
Linear Search	O(n)	O(n)	No	O(1)
Binary Search	O(log n)	O(log n)	Yes	O(1)
Jump Search	O(sqrt(n))	O(sqrt(n))	Yes	O(1)
Interpolation Search	O(log log n)	O(n)	Yes (uniform)	O(1)
Exponential Search	O(log n)	O(log n)	Yes	O(1)

Binary search strikes an excellent balance between simplicity and efficiency. It is faster than linear and jump search, and more reliable than interpolation search (which can degrade to O(n) on non-uniform data). For most practical scenarios involving sorted arrays, binary search is the default choice.

Applications

Database Index Lookups

Database systems use B-trees and B+ trees for indexing, which are generalizations of binary search to disk-based storage. The core principle of halving the search space is the same.

Standard Library Functions

Most programming languages provide binary search in their standard libraries: C's bsearch(), Java's Arrays.binarySearch(), Python's bisect module, and C++'s std::lower_bound and std::upper_bound.

Debugging with Git Bisect

Git's bisect command uses binary search to find the commit that introduced a bug, efficiently narrowing down thousands of commits to the culprit in O(log n) steps.

Numerical Methods

The bisection method for finding roots of continuous functions is a direct application of binary search in the continuous domain. It repeatedly halves an interval known to contain a root.

Competitive Programming

Binary search on the answer is a common technique for optimization problems. Examples include finding the minimum capacity to ship packages within D days, or the maximum minimum distance when placing objects.

Machine Learning

Hyperparameter tuning sometimes uses binary-search-like strategies when the objective function is known to be unimodal with respect to a parameter.

References

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms (3rd ed.). MIT Press, 2009. Section 2.3: Designing Algorithms.
Knuth, D. E. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison-Wesley, 1998. Section 6.2.1: Searching an Ordered Table.
Bentley, J. Programming Pearls (2nd ed.). Addison-Wesley, 2000. Column 4: Writing Correct Programs.
Bloch, J. "Extra, Extra -- Read All About It: Nearly All Binary Searches and Mergesorts are Broken." Google Research Blog, 2006.
Sedgewick, R. and Wayne, K. Algorithms (4th ed.). Addison-Wesley, 2011. Section 3.1: Symbol Tables.