Depth First Search (Searching)

Introduction

Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores as far as possible along each branch before backtracking. Starting from a given source vertex, DFS follows a path from that vertex as deep as it can go, and only when it reaches a dead end (a vertex with no unvisited neighbors) does it retreat and try a different path. This strategy produces a traversal that is radically different from BFS's level-by-level approach.

DFS was first investigated in the 19th century by French mathematician Charles Pierre Tremaux as a strategy for solving mazes. In the modern context, it was formalized by John Hopcroft and Robert Tarjan in the early 1970s, who demonstrated its power for solving a wide variety of graph problems including finding connected components, detecting cycles, computing topological orderings, and identifying strongly connected components. Their work earned them the Turing Award in 1986.

"Depth-first search is a surprisingly versatile linear-time algorithm. It reveals a wealth of information about a graph." -- Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani, Algorithms

DFS can be implemented either recursively (using the system call stack) or iteratively (using an explicit stack data structure). Both approaches have the same time complexity of O(V + E), but they differ in space usage and in susceptibility to stack overflow on very deep graphs.

How It Works

Core Mechanism

DFS uses timestamps to record when each vertex is first discovered and when it is finished (all its descendants have been fully explored). These timestamps partition the algorithm's execution into intervals and are essential for classifying edges and solving problems like cycle detection and topological sorting.

Vertex Colors

White: The vertex has not been discovered yet.
Gray: The vertex has been discovered but not yet finished; it is currently on the recursion stack.
Black: The vertex and all its descendants have been fully explored.

Edge Classification

DFS classifies every edge in the graph into one of four types:

Tree edge: An edge leading to a white (undiscovered) vertex. These edges form the DFS tree (or forest).
Back edge: An edge from a vertex to one of its ancestors in the DFS tree (a gray vertex). Back edges indicate cycles.
Forward edge: An edge from a vertex to a descendant in the DFS tree that is not a tree edge.
Cross edge: An edge between vertices in different subtrees of the DFS forest.

In undirected graphs, only tree edges and back edges are possible. Forward and cross edges can only appear in directed graphs.

Algorithm and Pseudocode

Recursive DFS

procedure DFS(G) Input: G = graph (adjacency list) time := 0 for each vertex u in G do color[u] := WHITE predecessor[u] := null end for for each vertex u in G do if color[u] = WHITE then DFS-Visit(G, u) end if end forend procedureprocedure DFS-Visit(G, u) time := time + 1 discovery[u] := time color[u] := GRAY for each neighbor v of u do if color[v] = WHITE then predecessor[v] := u DFS-Visit(G, v) end if end for color[u] := BLACK time := time + 1 finish[u] := timeend procedure

Iterative DFS (Using Explicit Stack)

procedure DFS-Iterative(G, s) Input: G = graph, s = source vertex S := empty stack push(S, s) while S is not empty do u := pop(S) if visited[u] = false then visited[u] := true process(u) for each neighbor v of u do if visited[v] = false then push(S, v) end if end for end if end whileend procedure

Note: The iterative version processes vertices in a slightly different order than the recursive version because the stack reverses the order of neighbor exploration. To match the recursive order exactly, push neighbors in reverse order.

Complexity Analysis

Time Complexity

O(V + E). Every vertex is visited exactly once (when it transitions from white to gray), and every edge is examined exactly once (when exploring the adjacency list of its source vertex). This makes DFS a linear-time algorithm with respect to the size of the graph.

Space Complexity

Implementation	Space	Notes
Recursive	O(V)	Call stack depth can be V in the worst case (e.g., a path graph).
Iterative	O(V)	Explicit stack can hold at most V vertices.

Comparison with BFS

Metric	DFS	BFS
Time	O(V + E)	O(V + E)
Space (narrow deep graph)	O(V) -- poor	O(1) -- excellent
Space (wide shallow graph)	O(depth) -- excellent	O(V) -- poor

Worked Example

Consider the following directed graph with 6 vertices:

 A ---> B ---> C | ^ v | D ---> E ---> F

Edges: {A->B, A->D, B->C, D->E, E->F, F->C}. Perform DFS starting from vertex A.

Step-by-Step Trace (Recursive)

Action	Vertex	Discovery	Finish	Stack State (gray vertices)
Discover A	A	1	--	[A]
Discover B (via A->B)	B	2	--	[A, B]
Discover C (via B->C)	C	3	--	[A, B, C]
Finish C (no unvisited neighbors)	C	3	4	[A, B]
Finish B (no more unvisited neighbors)	B	2	5	[A]
Discover D (via A->D)	D	6	--	[A, D]
Discover E (via D->E)	E	7	--	[A, D, E]
Discover F (via E->F)	F	8	--	[A, D, E, F]
F->C: C is black (cross edge)	F	8	9	[A, D, E]
Finish E	E	7	10	[A, D]
Finish D	D	6	11	[A]
Finish A	A	1	12	[]

DFS traversal order (discovery): A, B, C, D, E, F

DFS finish order: C, B, F, E, D, A

Pre-Order, In-Order, and Post-Order Traversals

When DFS is applied to trees, three important traversal orders emerge based on when the vertex is processed relative to its children:

Pre-Order Traversal

Process the vertex before visiting any of its children. This corresponds to the order in which vertices are first discovered during DFS.

procedure PreOrder(node) if node = null then return process(node) PreOrder(node.left) PreOrder(node.right)end procedure

In-Order Traversal

Process the vertex between visiting the left and right children. This produces a sorted sequence when applied to a binary search tree.

procedure InOrder(node) if node = null then return InOrder(node.left) process(node) InOrder(node.right)end procedure

Post-Order Traversal

Process the vertex after visiting all of its children. This corresponds to the order in which vertices are finished during DFS and is used for tasks like expression evaluation and dependency resolution.

procedure PostOrder(node) if node = null then return PostOrder(node.left) PostOrder(node.right) process(node)end procedure

Example on a Binary Tree

For the tree with root 1, left child 2 (with children 4, 5), and right child 3 (with children 6, 7):

Pre-Order: 1, 2, 4, 5, 3, 6, 7
In-Order: 4, 2, 5, 1, 6, 3, 7
Post-Order: 4, 5, 2, 6, 7, 3, 1

Cycle Detection

In Directed Graphs

A directed graph contains a cycle if and only if DFS encounters a back edge -- an edge from a gray vertex to another gray vertex (an ancestor on the current DFS path). During DFS, if we explore an edge (u, v) and v is currently gray, then v is an ancestor of u, forming a cycle.

procedure HasCycle(G) for each vertex u in G do color[u] := WHITE end for for each vertex u in G do if color[u] = WHITE then if DFS-Cycle(G, u) then return true end if end if end for return falseend procedureprocedure DFS-Cycle(G, u) color[u] := GRAY for each neighbor v of u do if color[v] = GRAY then return true // Back edge found: cycle exists end if if color[v] = WHITE then if DFS-Cycle(G, v) then return true end if end if end for color[u] := BLACK return falseend procedure

In Undirected Graphs

For undirected graphs, a cycle exists if DFS encounters a visited vertex that is not the parent of the current vertex. The parent check is necessary because in an undirected graph, every edge (u, v) appears as both u->v and v->u; without the parent check, every edge would be falsely reported as a cycle.

Topological Sort

A topological ordering of a directed acyclic graph (DAG) is a linear ordering of its vertices such that for every directed edge (u, v), vertex u comes before vertex v. Topological sort is only possible on DAGs; if the graph contains a cycle, no topological ordering exists.

DFS-Based Topological Sort

The algorithm performs a complete DFS and outputs vertices in reverse order of their finish times. This works because if there is an edge (u, v), then v finishes before u, so u appears before v in the reversed finish order.

procedure TopologicalSort(G) Perform DFS(G) to compute finish times for all vertices Output vertices in decreasing order of finish timeend procedure

"A topological sort of a DAG can be viewed as an ordering of its vertices along a horizontal line so that all directed edges go from left to right." -- Cormen et al., Introduction to Algorithms

Using the finish times from the worked example above (where finish order was C=4, B=5, F=9, E=10, D=11, A=12), the topological sort is: A, D, E, F, B, C. This respects all edge directions in the original graph.

Advantages and Disadvantages

Advantages

Memory Efficient on Deep Graphs: DFS only needs to store the current path from root to the current vertex, using O(depth) space. For deep, narrow graphs this is much less than BFS's O(width) space.
Natural for Recursive Problems: Many graph problems (cycle detection, topological sort, strongly connected components, articulation points) are most naturally expressed using DFS.
Edge Classification: DFS timestamps enable classification of edges into tree, back, forward, and cross edges, which is essential for many graph analyses.
Simple Recursive Implementation: The recursive form is concise and directly mirrors the algorithm's logic.
Path Finding: DFS naturally finds a path between two vertices (though not necessarily the shortest).

Disadvantages

No Shortest Path Guarantee: DFS does not find shortest paths. The first path found may be arbitrarily longer than the shortest path.
Stack Overflow Risk: The recursive implementation can cause stack overflow on very deep graphs (e.g., a path graph with millions of vertices).
Not Complete for Infinite Graphs: DFS can get trapped in an infinite branch and never reach the target. BFS is complete in this sense.
Order Dependent: The traversal result depends on the order in which neighbors are explored, which can make the output non-deterministic if the adjacency list order is not fixed.

Applications

Cycle Detection

Detecting cycles in directed graphs using back edges is essential for deadlock detection in operating systems, dependency validation in build systems, and constraint checking in databases.

Topological Sorting

Used in task scheduling, build systems (Make, Maven), course prerequisite planning, and spreadsheet cell evaluation order.

Strongly Connected Components

Tarjan's algorithm and Kosaraju's algorithm both use DFS to find strongly connected components in directed graphs. These are used in compiler optimization, social network analysis, and 2-SAT solvers.

Maze Solving and Puzzle Exploration

DFS is a natural algorithm for maze solving: follow a path until you hit a dead end, then backtrack and try a different direction. This backtracking strategy applies to any search problem with a tree-like solution space.

Finding Articulation Points and Bridges

DFS can identify articulation points (vertices whose removal disconnects the graph) and bridges (edges whose removal disconnects the graph) in O(V + E) time, which is critical for network reliability analysis.

Game Trees and AI

Minimax search and alpha-beta pruning for game AI are DFS-based algorithms that explore the game tree depth-first to find optimal moves.

References

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms (3rd ed.). MIT Press, 2009. Section 22.3: Depth-First Search.
Tarjan, R. E. "Depth-First Search and Linear Graph Algorithms." SIAM Journal on Computing, 1(2):146-160, 1972.
Sedgewick, R. and Wayne, K. Algorithms (4th ed.). Addison-Wesley, 2011. Section 4.1: Undirected Graphs.
Dasgupta, S., Papadimitriou, C., and Vazirani, U. Algorithms. McGraw-Hill, 2006. Chapter 3: Decompositions of Graphs.
Skiena, S. S. The Algorithm Design Manual (2nd ed.). Springer, 2008. Section 5.8: Depth-First Search.