Minimum Spanning Tree

Introduction

A Minimum Spanning Tree (MST) of a connected, weighted, undirected graph is a spanning tree whose total edge weight is minimized. A spanning tree connects all vertices using exactly V-1 edges (where V is the number of vertices), forming a tree with no cycles. Among all possible spanning trees, the MST has the smallest possible sum of edge weights.

Two classic greedy algorithms solve the MST problem: Kruskal's algorithm (1956), which processes edges in order of increasing weight, and Prim's algorithm (1957), which grows the tree one vertex at a time from a starting vertex. Both algorithms rely on the cut property of MSTs, which guarantees that the greedy choice at each step is safe.

MST problems arise in network design (minimizing cable length), circuit design, clustering, and approximation algorithms for NP-hard problems like the Traveling Salesman Problem.

Definitions and Properties

Spanning tree: A subgraph that is a tree and includes every vertex of the graph.
MST weight: The sum of edge weights in the spanning tree.
Uniqueness: If all edge weights are distinct, the MST is unique. With duplicate weights, multiple MSTs may exist, but all have the same total weight.
Cycle property: For any cycle in the graph, the heaviest edge in the cycle is not in any MST (assuming distinct weights).
Cut property: For any cut of the graph, if the lightest edge crossing the cut is unique, it is in every MST. If several crossing edges tie for lightest, each of them is in some MST.

Property	Description
Number of edges	Exactly V - 1
Connected	All vertices reachable from any other
Acyclic	No cycles
Adding any edge	Creates exactly one cycle
Removing any edge	Disconnects the tree into two components
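
The first three properties in the table can be checked mechanically. The sketch below is illustrative only: the function name is_spanning_tree and the edge-list input format are assumptions, not part of any library. It relies on the fact that V - 1 edges plus connectivity already rule out cycles.

```python
def is_spanning_tree(vertices, tree_edges):
    """Return True if tree_edges (pairs of vertices) form a spanning tree."""
    vertices = set(vertices)
    # A spanning tree on V vertices has exactly V - 1 edges.
    if len(tree_edges) != len(vertices) - 1:
        return False
    # Build an adjacency list from the candidate edges.
    adj = {v: [] for v in vertices}
    for u, v in tree_edges:
        if u not in adj or v not in adj:
            return False  # edge mentions a vertex outside the graph
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from an arbitrary vertex to test connectivity.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Connected with V - 1 edges implies acyclic, so no separate cycle check.
    return seen == vertices
```

Calling is_spanning_tree("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]) should return True, while swapping the last edge for ("A", "C") leaves D unreachable and returns False.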

The Cut Property

A cut is a partition of the vertex set into two non-empty subsets S and V-S. An edge crosses the cut if it has one endpoint in S and the other in V-S. The cut property states:

For any cut (S, V-S) of a graph, if edge e is the unique minimum-weight edge crossing the cut, then e belongs to every MST of the graph.

Both Kruskal's and Prim's algorithms work by repeatedly applying the cut property. At each step, they identify a cut where the lightest crossing edge is safe to add. This edge is guaranteed to be part of some MST, so including it cannot lead to a suboptimal solution.

The cut property can be proved by contradiction: suppose e is the unique lightest edge crossing cut (S, V-S) but some MST T does not include e. Adding e to T creates exactly one cycle. Because e crosses from S to V-S, the cycle must cross back, so it contains another edge f that also crosses the cut. Removing f breaks the cycle without disconnecting anything, so T with f replaced by e is still a spanning tree. Since w(e) < w(f), this new tree has smaller total weight, contradicting T being an MST.
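
To make the notion of a crossing edge concrete, here is a minimal sketch that finds the lightest edge crossing a given cut. The function name lightest_crossing_edge, the (u, v, weight) tuple format, and the small graph are assumptions chosen for illustration.

```python
def lightest_crossing_edge(edges, side):
    """Return the minimum-weight edge with exactly one endpoint in `side`.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    side:  the vertex subset S defining the cut (S, V - S).
    """
    side = set(side)
    # An edge crosses the cut when exactly one endpoint lies in S.
    crossing = [(w, u, v) for u, v, w in edges if (u in side) != (v in side)]
    if not crossing:
        return None  # no edge crosses this cut
    w, u, v = min(crossing)
    return (u, v, w)


# A small hypothetical graph with distinct weights.
edges = [("a", "b", 4), ("a", "c", 1), ("b", "c", 3), ("b", "d", 2), ("c", "d", 5)]

print(lightest_crossing_edge(edges, {"a"}))       # ('a', 'c', 1)
print(lightest_crossing_edge(edges, {"a", "c"}))  # ('b', 'c', 3)
```

The MST of this small graph is {a-c, b-d, b-c} with weight 6, and both edges reported above belong to it, as the cut property predicts.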

Kruskal's Algorithm

Kruskal's algorithm sorts all edges by weight and greedily adds edges that do not form a cycle, using a Union-Find data structure to track connected components.

procedure Kruskal(graph):
    sort edges by weight ascending
    MST = empty set
    DSU = DisjointSetUnion(V)
    for each edge (u, v, w) in sorted order:
        if DSU.find(u) != DSU.find(v):
            MST.add((u, v, w))
            DSU.union(u, v)
            if |MST| == V - 1:
                break
    return MST

The algorithm processes edges globally in weight order. It adds an edge if and only if it connects two different components (i.e., does not create a cycle). This is the cut property applied to the cut separating the two components.
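
One way the procedure above might look in Python is sketched here. The class and function names, the (u, v, weight) edge format, and the choice of path halving with union by size are all assumptions made for this example; other Union-Find variants work equally well.

```python
class DisjointSetUnion:
    """Union-Find with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal(vertices, edges):
    """Return MST edges of a connected graph as (u, v, weight) tuples."""
    dsu = DisjointSetUnion(vertices)
    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # union() succeeds only when u and v are in different components.
        if dsu.union(u, v):
            mst.append((u, v, w))
            if len(mst) == len(dsu.parent) - 1:
                break  # V - 1 edges collected
    return mst


example_edges = [
    ("A", "B", 2), ("A", "D", 6), ("B", "C", 3), ("B", "D", 8),
    ("B", "E", 5), ("C", "E", 7), ("D", "E", 9),
]
print(kruskal("ABCDE", example_edges))
# [('A', 'B', 2), ('B', 'C', 3), ('B', 'E', 5), ('A', 'D', 6)]
```

If the input graph is not connected, this sketch returns a minimum spanning forest with fewer than V - 1 edges rather than raising an error.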

Prim's Algorithm

Prim's algorithm grows the MST from a single starting vertex, always adding the cheapest edge that connects a vertex in the tree to a vertex outside the tree.

procedure Prim(graph, start):
    MST = empty set
    inTree = {start}
    priorityQueue = edges from start, ordered by weight
    while |inTree| < V:
        (u, v, w) = priorityQueue.extractMin()
        if v not in inTree:
            MST.add((u, v, w))
            inTree.add(v)
            for each edge (v, x, w2) where x not in inTree:
                priorityQueue.insert((v, x, w2))
    return MST

At each step, the algorithm applies the cut property to the cut (inTree, V - inTree), adding the minimum-weight crossing edge.
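
The following is a minimal sketch of the procedure above using Python's heapq module as the priority queue. The function name prim and the (u, v, weight) edge format are assumptions for illustration. The sketch also assumes the graph is connected, that every vertex appears in at least one edge, and that vertex labels are mutually comparable so heap entries with equal weights can be ordered.

```python
import heapq
from collections import defaultdict


def prim(edges, start):
    """Return MST edges as (u, v, weight), grown outward from `start`."""
    # Adjacency list storing (weight, from, to) so the heap orders by weight.
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    mst = []
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: both endpoints are already in the tree
        in_tree.add(v)
        mst.append((u, v, w))
        # Offer every edge leaving the newly added vertex.
        for edge in adj[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return mst


example_edges = [
    ("A", "B", 2), ("A", "D", 6), ("B", "C", 3), ("B", "D", 8),
    ("B", "E", 5), ("C", "E", 7), ("D", "E", 9),
]
print(prim(example_edges, "A"))
# [('A', 'B', 2), ('B', 'C', 3), ('B', 'E', 5), ('A', 'D', 6)]
```

Stale heap entries are skipped on extraction instead of being removed eagerly, which keeps the code short at the cost of holding up to E entries in the heap.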

Worked Example

Consider a graph with 5 vertices and the following edges:

Edges: (A-B, 2), (A-D, 6), (B-C, 3), (B-D, 8), (B-E, 5), (C-E, 7), (D-E, 9)

Kruskal's (sorted edges): (A-B,2), (B-C,3), (B-E,5), (A-D,6), (C-E,7), (B-D,8), (D-E,9)

Add A-B (2): components {A,B}, {C}, {D}, {E}
Add B-C (3): components {A,B,C}, {D}, {E}
Add B-E (5): components {A,B,C,E}, {D}
Add A-D (6): components {A,B,C,D,E}. Done.

MST edges: {A-B, B-C, B-E, A-D}. Total weight: 2+3+5+6 = 16.

Prim's (starting at A):

Start at A. Edges: (A-B,2), (A-D,6). Add A-B (2).
Tree: {A,B}. Edges: (A-D,6), (B-C,3), (B-D,8), (B-E,5). Add B-C (3).
Tree: {A,B,C}. Edges: (A-D,6), (B-D,8), (B-E,5), (C-E,7). Add B-E (5).
Tree: {A,B,C,E}. Edges: (A-D,6), (B-D,8), (D-E,9). Add A-D (6).

Same MST, same total weight of 16.
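
For a graph this small, the result can be double-checked by brute force: enumerate every set of V - 1 edges, keep those that connect all vertices, and take the lightest. The sketch below does this with itertools.combinations. The helper names are hypothetical, and the approach is only practical for tiny graphs.

```python
from itertools import combinations

VERTICES = "ABCDE"
EDGES = [
    ("A", "B", 2), ("A", "D", 6), ("B", "C", 3), ("B", "D", 8),
    ("B", "E", 5), ("C", "E", 7), ("D", "E", 9),
]


def connects_all(vertices, subset):
    """True if the edges in `subset` join all vertices into one component."""
    comp = {v: v for v in vertices}

    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x

    for u, v, _ in subset:
        comp[find(u)] = find(v)
    return len({find(v) for v in vertices}) == 1


def all_spanning_trees(vertices, edges):
    """Every (V - 1)-edge subset that connects the graph is a spanning tree."""
    k = len(vertices) - 1
    return [t for t in combinations(edges, k) if connects_all(vertices, t)]


def total_weight(tree):
    return sum(w for _, _, w in tree)


trees = all_spanning_trees(VERTICES, EDGES)
best = min(trees, key=total_weight)
print(total_weight(best))                   # 16
print(sorted((u, v) for u, v, _ in best))
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('B', 'E')]
```

Because all edge weights in this example are distinct, exactly one spanning tree attains the minimum, which matches the uniqueness property stated earlier.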

Complexity Comparison

Algorithm	Time Complexity	Best For
Kruskal's (with Union-Find)	O(E log E)	Sparse graphs (E ~ V)
Prim's (binary heap)	O(E log V)	Sparse to moderately dense graphs
Prim's (Fibonacci heap)	O(E + V log V)	Very dense graphs
Prim's (adjacency matrix)	O(V^2)	Dense graphs, simple implementation

Since E <= V^2, we have log E <= 2 log V, so O(E log E) = O(E log V). For sparse graphs where E = O(V), Kruskal's runs in O(V log V). For dense graphs where E = O(V^2), Prim's with a Fibonacci heap runs in O(V^2 + V log V) = O(V^2), matching the adjacency matrix version.
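
The O(V^2) adjacency matrix variant replaces the heap with a linear scan for the closest vertex outside the tree. The sketch below is one possible form; the name prim_dense, the use of integer vertex indices, and the convention of math.inf for a missing edge are assumptions for this example.

```python
import math


def prim_dense(weights):
    """Prim's algorithm on an n x n symmetric weight matrix in O(n^2).

    weights[i][j] is the edge weight, or math.inf when there is no edge.
    Returns MST edges as (parent, child, weight) using vertex indices.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0  # start from vertex 0
    mst = []
    for _ in range(n):
        # Linear scan for the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        if parent[u] != -1:
            mst.append((parent[u], u, weights[parent[u]][u]))
        # Relax the edges leaving u.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return mst


INF = math.inf
# Vertices A..E from the worked example mapped to indices 0..4.
matrix = [
    [INF, 2, INF, 6, INF],
    [2, INF, 3, 8, 5],
    [INF, 3, INF, INF, 7],
    [6, 8, INF, INF, 9],
    [INF, 5, 7, 9, INF],
]
print(prim_dense(matrix))
# [(0, 1, 2), (1, 2, 3), (1, 4, 5), (0, 3, 6)]
```

Each of the V iterations does O(V) work for the scan and O(V) for the relaxation, giving O(V^2) overall regardless of how many edges exist.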

Correctness Proofs

Both algorithms are correct because they follow the generic MST strategy: maintain a set of edges A that is always a subset of some MST, and at each step add a "safe" edge -- one that keeps A a subset of an MST.

Kruskal's correctness: When edge (u,v) is added, u and v are in different components. Let C be the component containing u, and consider the cut (C, V-C). No lighter edge crosses this cut: every lighter edge has already been processed, and a lighter crossing edge would have joined two different components and been added, merging C with whatever lies on the other side. So (u,v) is a lightest edge crossing the cut, and by the cut property it is in some MST.

Prim's correctness: The vertices already in the tree define the cut (inTree, V-inTree). Extracted edges that lead to a vertex already in the tree are discarded, so the edge Prim's actually adds is a minimum-weight edge crossing this cut. By the cut property, it belongs to some MST.

The cut property is the unifying principle behind these greedy MST algorithms. Any algorithm that repeatedly adds a minimum-weight edge crossing a cut that none of the already-chosen edges cross will produce a correct MST.

Applications

Network Design: Laying cable, pipes, or roads to connect locations at minimum cost.
Cluster Analysis: Removing the k-1 heaviest edges from an MST produces k clusters (single-linkage clustering).
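Approximation Algorithms: The MST provides a 2-approximation for the metric TSP. Walking around the MST traverses each edge twice, and shortcutting past already-visited vertices (valid under the triangle inequality) yields a tour of weight at most twice the MST weight. The MST weight is at most the optimal tour weight, because removing one edge from an optimal tour leaves a spanning tree, so the resulting tour is at most twice the optimal.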
Image Segmentation: Treating pixels as graph vertices with edge weights based on color difference; MST-based methods partition the image into regions.
Taxonomy: Constructing phylogenetic trees to represent evolutionary relationships among species.
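
As an illustration of the cluster analysis application, the sketch below produces k clusters by running Kruskal's merging step and stopping once k components remain. When edge weights are distinct this gives the same partition as building the full MST and then removing its k-1 heaviest edges. The function name mst_clusters and the input format are assumptions for this example.

```python
def mst_clusters(vertices, edges, k):
    """Split vertices into k single-linkage clusters.

    edges: (u, v, weight) tuples of a connected undirected graph.
    Returns a list of vertex sets, ordered by their sorted members.
    """
    parent = {v: v for v in vertices}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(parent)
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        if components == k:
            break  # the remaining MST edges are the k - 1 heaviest ones
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1

    # Group vertices by the root of their component.
    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=lambda g: sorted(g))


example_edges = [
    ("A", "B", 2), ("A", "D", 6), ("B", "C", 3), ("B", "D", 8),
    ("B", "E", 5), ("C", "E", 7), ("D", "E", 9),
]
clusters = mst_clusters("ABCDE", example_edges, 2)
print([sorted(c) for c in clusters])  # [['A', 'B', 'C', 'E'], ['D']]
```

With k = 2 on the worked example graph, the heaviest MST edge A-D (6) is the one left out, so D ends up in a cluster of its own.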

References

Kruskal, J. B. "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem." Proceedings of the American Mathematical Society, vol. 7, 1956, pp. 48-50.
Prim, R. C. "Shortest Connection Networks and Some Generalizations." Bell System Technical Journal, vol. 36, 1957, pp. 1389-1401.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, Chapter 23, 2009.
Fredman, M. L. and Tarjan, R. E. "Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms." Journal of the ACM, vol. 34, 1987, pp. 596-615.
Chazelle, B. "A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity." Journal of the ACM, vol. 47, 2000, pp. 1028-1047.