Definition and Purpose
Disjoint Set Concept
Union Find, aka Disjoint Set Union (DSU), manages partition of a set into disjoint subsets. Supports dynamic connectivity queries efficiently.
Use Cases
Used in graph connectivity, network connectivity, image processing, clustering, Kruskal’s MST, and equivalence checking.
Core Principle
Maintains elements divided into non-overlapping sets. Queries determine if two elements belong to the same set; unions merge sets.
Basic Operations
Find
Determines the representative (root) of the set containing a given element. Identifies connectivity.
Union
Merges two disjoint sets into a single set. Updates internal structure to maintain correctness.
Initialization
Starts with each element in its own singleton set. Internal arrays or pointers track parent relationships.
Data Structure Implementation
Parent Array
Each element stores a pointer to its parent. Root elements point to themselves.
Rank/Size Array
Maintains tree height or size to optimize union operations. Prevents tall, inefficient trees.
Forest Representation
Collection of trees representing disjoint sets. Trees flatten over time with path compression.
| Component | Description |
|---|---|
| parent[] | Stores parent reference of each element |
| rank[] or size[] | Tracks tree rank or size for balanced union |
Union by Rank / Size
Concept
Union operation attaches the smaller tree under the root of the larger tree based on rank or size.
Benefits
Maintains shallow tree structure. Reduces worst-case find operation time.
Procedure
Compare ranks of roots. Attach lower rank root to higher rank. If equal, increment rank of new root.
function union(x, y): rootX = find(x) rootY = find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY else if rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1Path Compression
Concept
Flattens tree during find by making every node on the path point directly to root.
Effect on Performance
Drastically reduces find time on subsequent queries. Amortized nearly constant time.
Algorithm
function find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]Time Complexity Analysis
Without Optimizations
Union and find can degrade to O(n) per operation due to tall trees.
With Union by Rank and Path Compression
Amortized time per operation: O(α(n)) where α is inverse Ackermann function, practically constant.
Summary Table
| Operation | Time Complexity |
|---|---|
| Find (naive) | O(n) |
| Union (naive) | O(n) |
| Find (optimized) | O(α(n)) amortized |
| Union (optimized) | O(α(n)) amortized |
Applications
Graph Connectivity
Detects connected components, cycles, and builds MSTs (Kruskal’s algorithm).
Network Connectivity
Manages dynamic networks, connectivity queries, incremental updates.
Image Processing
Finds connected regions, labels components efficiently.
Clustering
Supports hierarchical clustering, equivalence relations, partitioning problems.
Advanced Variations
Union by Size
Alternative to rank: attaches smaller sized tree under larger. Similar performance.
Persistent Union Find
Supports historical queries with versions. Uses persistent data structures.
Rollback Union Find
Allows undoing union operations. Useful in offline dynamic connectivity.
Weighted Union Find
Stores weights or distances, supports queries on differences between elements.
Parallelization and Concurrency
Challenges
Union Find’s pointer updates cause race conditions. Path compression complicates parallelism.
Approaches
Lock-free algorithms, concurrent union by rank, batch processing of unions and finds.
Applications
Parallel graph algorithms, distributed systems, large-scale clustering.
Limitations and Alternatives
Limitations
Supports only connectivity queries, no direct support for edge deletions or complex queries.
Alternatives
Link-cut trees for dynamic connectivity with edge deletions, Euler tour trees, dynamic connectivity data structures.
Trade-offs
Union Find is simplest and fastest for incremental connectivity but limited in query flexibility.
Example Implementation
Initialization
n = number_of_elementsparent = [i for i in range(n)]rank = [0] * nFind with Path Compression
def find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]Union by Rank
def union(x, y): rootX = find(x) rootY = find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY elif rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1References
- Tarjan, R.E., "Efficiency of a Good But Not Linear Set Union Algorithm," Journal of the ACM, vol. 22, no. 2, 1975, pp. 215–225.
- Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., "Introduction to Algorithms," 3rd Edition, MIT Press, 2009, pp. 536–542.
- Gabow, H.N., Tarjan, R.E., "A Linear-Time Algorithm for a Special Case of Disjoint Set Union," Journal of Algorithms, vol. 9, no. 2, 1988, pp. 215–244.
- Tarjan, R.E., "Data Structures and Network Algorithms," SIAM, 1983, pp. 94–111.
- Blum, M., "Parallel Algorithms for Disjoint Set Union Problems," SIAM Journal on Computing, vol. 14, no. 2, 1985, pp. 290–310.