Introduction
Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores vertices in order of their distance from a starting vertex. It visits all vertices at distance 1 before visiting any vertex at distance 2, all vertices at distance 2 before any at distance 3, and so on. This level-by-level exploration pattern gives BFS its name and its most important property: it naturally finds shortest paths in unweighted graphs.
BFS was first invented by Konrad Zuse in 1945 (though not published at the time) and independently reinvented by Edward F. Moore in 1959 for finding the shortest path out of a maze. It was later developed further by C. Y. Lee into a wire routing algorithm. Today, BFS is one of the two foundational graph traversal algorithms (alongside Depth-First Search) and serves as a building block for many more complex graph algorithms.
"Breadth-first search is one of the simplest algorithms for searching a graph and the archetype for many important graph algorithms." -- Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms
The algorithm uses a queue data structure to maintain the frontier of vertices to be explored. This first-in, first-out ordering ensures that vertices closer to the source are always processed before vertices farther away, which is the fundamental mechanism behind BFS's shortest-path guarantee.
How It Works
Core Mechanism
BFS operates by maintaining three categories of vertices throughout the search:
- Undiscovered (white): Vertices that have not yet been encountered.
- Discovered but not fully explored (gray): Vertices in the queue whose neighbors have not all been examined.
- Fully explored (black): Vertices whose entire adjacency list has been examined.
Step-by-Step Procedure
- Choose a source vertex s. Mark it as discovered and enqueue it.
- While the queue is not empty:
- Dequeue a vertex v from the front of the queue.
- For each neighbor w of v:
- If w has not been discovered, mark it as discovered, record its distance as distance(v) + 1, record v as its predecessor, and enqueue w.
- Mark v as fully explored.
- When the queue is empty, all reachable vertices have been visited.
Level-Order Traversal of Trees
When applied to a tree (a connected acyclic graph), BFS performs a level-order traversal: it visits the root, then all children of the root, then all grandchildren, and so on. This is in contrast to DFS, which explores as deep as possible before backtracking.
Algorithm and Pseudocode
Standard BFS
procedure BFS(G, s) Input: G = graph (adjacency list), s = source vertex Output: distance[] and predecessor[] arrays for all reachable vertices for each vertex u in G do visited[u] := false distance[u] := infinity predecessor[u] := null end for visited[s] := true distance[s] := 0 Q := empty queue enqueue(Q, s) while Q is not empty do u := dequeue(Q) for each neighbor v of u do if visited[v] = false then visited[v] := true distance[v] := distance[u] + 1 predecessor[v] := u enqueue(Q, v) end if end for end whileend procedureBFS for Shortest Path Reconstruction
procedure ShortestPath(G, s, t) Input: G = graph, s = source, t = target Output: shortest path from s to t as a list of vertices Run BFS(G, s) if visited[t] = false then return "No path exists" end if path := empty list current := t while current != null do prepend current to path current := predecessor[current] end while return pathend procedureBFS for Connected Components
procedure FindConnectedComponents(G) Input: G = undirected graph Output: component label for each vertex componentId := 0 for each vertex u in G do if visited[u] = false then BFS(G, u) // marks all vertices in this component componentId := componentId + 1 end if end for return componentId // total number of componentsend procedureComplexity Analysis
Time Complexity
O(V + E), where V is the number of vertices and E is the number of edges. Every vertex is enqueued and dequeued exactly once (O(V) operations), and every edge is examined exactly once when processing the adjacency list of its endpoints (O(E) operations for directed graphs, or O(2E) for undirected graphs, which is still O(E)).
Space Complexity
O(V). The queue can hold at most V vertices (in the worst case, all vertices are at the same level). The visited array, distance array, and predecessor array each require O(V) space. The total auxiliary space is therefore O(V).
Complexity Summary
| Metric | Complexity | Notes |
|---|---|---|
| Time | O(V + E) | Each vertex and edge processed once. |
| Space (queue) | O(V) | Worst case: all vertices on the same level. |
| Space (total) | O(V) | Visited, distance, and predecessor arrays. |
Graph Representation Impact
| Representation | Time | Space |
|---|---|---|
| Adjacency List | O(V + E) | O(V + E) |
| Adjacency Matrix | O(V^2) | O(V^2) |
Using an adjacency matrix, BFS takes O(V^2) time because finding all neighbors of a vertex requires scanning an entire row of V entries. For sparse graphs (E much less than V^2), adjacency lists are strongly preferred.
Worked Example
Consider the following undirected graph with 7 vertices:
A --- B --- C | | D --- E F | GEdges: {A-B, A-D, B-C, B-E, C-F, D-E, E-G}. Perform BFS starting from vertex A.
Step-by-Step Trace
| Step | Dequeued | Neighbors Examined | Newly Enqueued | Queue After | Distance Assigned |
|---|---|---|---|---|---|
| 0 | -- | -- | A | [A] | A=0 |
| 1 | A | B, D | B, D | [B, D] | B=1, D=1 |
| 2 | B | A, C, E | C, E | [D, C, E] | C=2, E=2 |
| 3 | D | A, E | (none) | [C, E] | -- |
| 4 | C | B, F | F | [E, F] | F=3 |
| 5 | E | B, D, G | G | [F, G] | G=3 |
| 6 | F | C | (none) | [G] | -- |
| 7 | G | E | (none) | [] | -- |
BFS traversal order: A, B, D, C, E, F, G
Distances from A: A=0, B=1, D=1, C=2, E=2, F=3, G=3
Shortest path from A to G: A - B - E - G (or A - D - E - G), both of length 3.
Shortest Path in Unweighted Graphs
BFS is the optimal algorithm for finding shortest paths in unweighted graphs (or graphs where all edges have equal weight). The key theorem is:
"In an unweighted graph, BFS discovers every vertex reachable from the source s, and the distance computed for each vertex v is the minimum number of edges on any path from s to v." -- Cormen et al., Introduction to Algorithms
Why BFS Guarantees Shortest Paths
The guarantee follows from the FIFO property of the queue. Vertices are processed in non-decreasing order of their distance from the source. When a vertex v is first discovered via some edge (u, v), u has already been assigned its correct shortest distance. Since distance(v) = distance(u) + 1, and u was the first vertex at its distance level to reach v, the distance assigned to v is minimal.
Limitations
BFS finds shortest paths only when all edges have equal weight. For weighted graphs, Dijkstra's algorithm (for non-negative weights) or the Bellman-Ford algorithm (for graphs that may have negative weights) must be used instead. For graphs with weights of only 0 and 1, a 0-1 BFS using a deque can find shortest paths in O(V + E) time.
Advantages and Disadvantages
Advantages
- Shortest Path Guarantee: BFS finds the shortest path in unweighted graphs, a property that DFS does not have.
- Complete: BFS will find a solution if one exists, and it finds the shallowest solution first.
- Optimal for Level-Order Processing: When you need to process vertices level by level (e.g., social network distance, web crawler depth limits), BFS is the natural choice.
- No Recursion: Unlike DFS, BFS uses an explicit queue and does not risk stack overflow on deep graphs.
- Finds Connected Components: BFS can efficiently identify all connected components in an undirected graph.
Disadvantages
- High Memory Usage: BFS must store all vertices at the current level in the queue. For graphs with high branching factors, this can consume enormous memory. If every vertex has b neighbors and the target is at depth d, BFS may need to store O(b^d) vertices.
- Not Suitable for Weighted Graphs: BFS does not account for edge weights and will not find shortest paths in weighted graphs.
- Slower Than DFS for Some Problems: For problems like cycle detection, topological sorting, or finding any path (not necessarily shortest), DFS is often more natural and efficient.
- Entire Level Must Be Stored: Unlike DFS, which only needs to store a single path, BFS requires storing the entire frontier.
Comparison with Depth-First Search
| Property | BFS | DFS |
|---|---|---|
| Data Structure | Queue (FIFO) | Stack (LIFO) / Recursion |
| Traversal Order | Level by level | As deep as possible first |
| Shortest Path (unweighted) | Yes | No |
| Time Complexity | O(V + E) | O(V + E) |
| Space (worst case) | O(V) or O(b^d) | O(V) or O(bd) |
| Cycle Detection | Possible but less natural | Natural (back edges) |
| Topological Sort | Possible (Kahn's algorithm) | Natural (reverse post-order) |
In general, use BFS when you need shortest paths or level-order processing, and use DFS when you need to explore deep structures, detect cycles, or perform topological sorting.
Applications
Shortest Path in Unweighted Graphs
BFS is the standard algorithm for computing shortest distances from a single source in unweighted graphs. This applies to network routing, social network analysis (degrees of separation), and puzzle solving (fewest moves to reach a goal state).
Web Crawlers
Web crawlers use BFS to explore web pages level by level, starting from seed URLs. This ensures that pages closer to the seed (often more relevant or authoritative) are crawled before more distant pages.
Social Network Analysis
Finding the degrees of separation between two people (as in the "six degrees of Kevin Bacon" problem) is a direct application of BFS shortest path.
Peer-to-Peer Networks
In peer-to-peer networks like BitTorrent, BFS is used to find all reachable nodes from a given node, which is essential for resource discovery and network topology mapping.
Garbage Collection
Some garbage collectors use BFS-like traversal (Cheney's algorithm) to identify all reachable objects from root references, marking unreachable objects for collection.
Broadcasting in Networks
BFS models how information propagates through a network when each node forwards a message to all its neighbors. This is the basis for flooding-based broadcast protocols.
References
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms (3rd ed.). MIT Press, 2009. Section 22.2: Breadth-First Search.
- Sedgewick, R. and Wayne, K. Algorithms (4th ed.). Addison-Wesley, 2011. Section 4.1: Undirected Graphs.
- Moore, E. F. "The Shortest Path Through a Maze." Proceedings of the International Symposium on the Theory of Switching, 1959.
- Kleinberg, J. and Tardos, E. Algorithm Design. Pearson, 2005. Chapter 3: Graphs.
- Skiena, S. S. The Algorithm Design Manual (2nd ed.). Springer, 2008. Section 5.6: Breadth-First Search.