Introduction

The closest pair of points problem is a fundamental question in computational geometry: given n points in a plane, find the pair of points with the smallest Euclidean distance between them. This problem has practical applications in collision detection, clustering, geographic information systems, and molecular modeling.

A brute-force approach comparing all pairs takes O(n^2) time. The divide and conquer algorithm, developed by Shamos and Hoey (1975) and later refined by several authors, solves the problem in O(n log n) time, which is optimal for comparison-based algorithms. The key insight is that after dividing the point set and solving each half, the merge step needs to examine only a narrow strip near the dividing line.

Brute-Force Approach

procedure ClosestPairBrute(points): minDist = infinity for i from 0 to n-2: for j from i+1 to n-1: d = distance(points[i], points[j]) if d < minDist: minDist = d closestPair = (points[i], points[j]) return closestPair

This examines all n*(n-1)/2 pairs and runs in O(n^2) time. For small point sets (n < 4), this is used as the base case in the divide and conquer algorithm.

Divide and Conquer Strategy

Algorithm Overview

  1. Presort: Sort all points by x-coordinate. Maintain a separate copy sorted by y-coordinate.
  2. Divide: Split the point set into two halves, L and R, using the median x-coordinate as the dividing line.
  3. Conquer: Recursively find the closest pair in L (distance dL) and the closest pair in R (distance dR). Let delta = min(dL, dR).
  4. Combine: Check if there exists a pair with one point in L and one in R that is closer than delta. This requires examining only points within a vertical strip of width 2*delta centered on the dividing line.

Pseudocode

procedure ClosestPair(pointsSortedByX, pointsSortedByY): n = length(pointsSortedByX) if n <= 3: return ClosestPairBrute(pointsSortedByX) mid = n / 2 midPoint = pointsSortedByX[mid] // Divide leftX = pointsSortedByX[0..mid-1] rightX = pointsSortedByX[mid..n-1] // Partition Y-sorted array leftY = points in pointsSortedByY that are in leftX rightY = points in pointsSortedByY that are in rightX // Conquer dL = ClosestPair(leftX, leftY) dR = ClosestPair(rightX, rightY) delta = min(dL, dR) // Combine: check strip strip = points in pointsSortedByY where |x - midPoint.x| < delta return min(delta, ClosestInStrip(strip, delta))

The Strip Optimization

The critical insight that makes the algorithm efficient is that only a limited number of points in the strip need to be compared with each other. Specifically, for each point in the strip, we need to compare it with at most 7 subsequent points when the strip points are sorted by y-coordinate.

procedure ClosestInStrip(strip, delta): // strip is sorted by y-coordinate minDist = delta for i from 0 to length(strip) - 1: j = i + 1 while j < length(strip) and (strip[j].y - strip[i].y) < minDist: d = distance(strip[i], strip[j]) if d < minDist: minDist = d j = j + 1 return minDist

Why at Most 7 Comparisons?

Consider a delta x 2*delta rectangle in the strip centered at the dividing line. This rectangle can be divided into 8 cells of size (delta/2) x (delta/2). Each cell can contain at most one point (since any two points in the same half must be at least delta apart). Therefore, at most 8 points can occupy this rectangle, meaning each point needs to be compared with at most 7 others.

RegionWidthHeightMax Points
Full strip2 * deltaunboundedup to n
Comparison window2 * deltadelta8
Each celldelta / 2delta / 21

The bound of 7 comparisons per point is conservative. In practice, the average number of comparisons per strip point is much smaller. Some analyses tighten the bound to 5 by using a more careful packing argument.

Worked Example

Consider 8 points: (2,3), (12,30), (40,50), (5,1), (12,10), (3,4), (7,8), (15,25).

Step 1: Sort by x: (2,3), (3,4), (5,1), (7,8), (12,10), (12,30), (15,25), (40,50).

Step 2: Divide at midpoint x=7. Left: {(2,3), (3,4), (5,1), (7,8)}. Right: {(12,10), (12,30), (15,25), (40,50)}.

Step 3: Left half: closest pair is (2,3)-(3,4) with distance sqrt(2) = 1.414. Right half: closest pair is (12,30)-(15,25) with distance sqrt(34) = 5.831. delta = 1.414.

Step 4: Strip with width 2*1.414 around x=7: points with 5.586 < x < 8.414 are (7,8). Only one point, so no cross-pair is closer.

Result: Closest pair is (2,3) and (3,4) with distance sqrt(2).

Complexity Analysis

The recurrence relation is:

T(n) = 2T(n/2) + O(n)

The O(n) term comes from partitioning the y-sorted array and scanning the strip. By the Master theorem (a=2, b=2, f(n)=O(n)), this gives T(n) = O(n log n).

ComponentTime
PresortingO(n log n)
Recursive solvingO(n log n)
Strip scanning (total)O(n) per level, O(n log n) total
OverallO(n log n)

Space complexity: O(n) for the sorted arrays and recursion stack of depth O(log n).

Implementation

import mathdef closest_pair(points): def dist(p1, p2): return math.sqrt((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2) def brute(pts): min_d = float('inf') pair = None for i in range(len(pts)): for j in range(i+1, len(pts)): d = dist(pts[i], pts[j]) if d < min_d: min_d = d pair = (pts[i], pts[j]) return min_d, pair def solve(px, py): n = len(px) if n <= 3: return brute(px) mid = n // 2 mid_x = px[mid][0] pyl = [p for p in py if p[0] <= mid_x] pyr = [p for p in py if p[0] > mid_x] dl, pairl = solve(px[:mid], pyl) dr, pairr = solve(px[mid:], pyr) if dl < dr: delta, best = dl, pairl else: delta, best = dr, pairr strip = [p for p in py if abs(p[0] - mid_x) < delta] for i in range(len(strip)): j = i + 1 while j < len(strip) and strip[j][1] - strip[i][1] < delta: d = dist(strip[i], strip[j]) if d < delta: delta = d best = (strip[i], strip[j]) j += 1 return delta, best px = sorted(points, key=lambda p: p[0]) py = sorted(points, key=lambda p: p[1]) return solve(px, py)

Correctness Proof

The algorithm's correctness rests on proving that the strip-scanning step does not miss any pair closer than delta. The argument proceeds as follows:

  1. Any pair closer than delta must have both points within distance delta of the dividing line (by the triangle inequality applied to the x-coordinates).
  2. Among the strip points sorted by y-coordinate, any pair with y-coordinates differing by more than delta cannot be closer than delta.
  3. Within the y-range of delta, the packing argument shows at most 7 points need to be checked after each point.

The correctness of the strip optimization is the most subtle part of the algorithm. It depends critically on the geometric constraint that all points in each half are at least delta apart from each other, limiting the density of points in the strip.

Extensions and Applications

  • Higher Dimensions: The algorithm generalizes to d dimensions with O(n log^(d-1) n) time, though randomized algorithms achieve O(n) expected time in any fixed dimension.
  • k-Closest Pairs: Finding the k closest pairs can be done by maintaining a priority queue during the strip scan.
  • Bichromatic Closest Pair: Given red and blue points, find the closest red-blue pair. The divide and conquer approach adapts with a modified strip scan.
  • Collision Detection: In simulations and video games, closest-pair algorithms detect when objects are about to collide.
  • Clustering: The closest pair distance is used as a linkage criterion in hierarchical clustering algorithms.

References

  • Shamos, M. I. and Hoey, D. "Closest-Point Problems." Proceedings of the 16th Annual IEEE Symposium on Foundations of Computer Science, 1975, pp. 151-162.
  • Preparata, F. P. and Shamos, M. I. "Computational Geometry: An Introduction." Springer-Verlag, 1985.
  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, Chapter 33, 2009.
  • Rabin, M. O. "Probabilistic Algorithms." In Algorithms and Complexity: New Directions and Recent Results, Academic Press, 1976.
  • Fortune, S. and Hopcroft, J. E. "A Note on Rabin's Nearest-Neighbor Algorithm." Information Processing Letters, vol. 8, 1979, pp. 20-23.