Definition and Overview
Concept
Disjoint set: data structure managing partitions of a set into disjoint subsets. Functionality: efficiently merge sets and identify representatives.
Purpose
Purpose: track connected components, equivalence classes, membership queries in dynamic set partitions.
History
Introduced: John Hopcroft and Robert Tarjan, 1973. Importance: foundational for graph algorithms, network connectivity, clustering.
Data Representation
Forest of Trees
Structure: collection of rooted trees. Each tree: represents one set. Root node: serves as set representative or leader.
Parent Array
Implementation: array storing parent of each element. Root nodes are self-parented or marked distinctly.
Rank and Size Arrays
Auxiliary arrays: maintain tree height (rank) or subtree size for balancing unions.
Core Operations
MakeSet
Operation: initialize singleton sets. Input: element x. Result: x is parent of itself.
Find
Operation: find representative of element's set. Use: identify if elements belong to same set.
Union
Operation: merge two disjoint sets. Input: representatives of two sets. Effect: one root becomes child of other.
Example
Sets: {1}, {2}, {3}. Union(1,2): {1,2}, Find(2) → representative of {1,2}.
MakeSet(x): parent[x] = x rank[x] = 0Find(x): while x != parent[x]: x = parent[x] return xUnion(x, y): rootX = Find(x) rootY = Find(y) if rootX == rootY: return parent[rootY] = rootXPath Compression Optimization
Mechanism
During Find, set all traversed nodes’ parents directly to root. Effect: flatten tree structure.
Benefit
Amortized speedup: Find operation approaches near-constant time.
Algorithm
Find(x): if parent[x] != x: parent[x] = Find(parent[x]) return parent[x]Union by Rank/Size
Union by Rank
Attach tree with smaller rank under root with larger rank. Rank: upper bound on tree height.
Union by Size
Attach smaller sized tree under larger sized tree. Maintains balanced trees, reduces height.
Effect
Combined with path compression: yields inverse Ackermann function complexity.
Union(x, y): rootX = Find(x) rootY = Find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY else if rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1Time Complexity Analysis
Without Optimizations
Worst-case: Find and Union can be O(n) due to skewed trees.
With Path Compression and Union by Rank
Amortized complexity: O(α(n)) per operation. α(n): inverse Ackermann function, practically constant ≤ 4.
Implications
Disjoint set operations are effectively constant time for all practical input sizes.
| Operation | Time Complexity |
|---|---|
| MakeSet | O(1) |
| Find (amortized) | O(α(n)) |
| Union (amortized) | O(α(n)) |
Applications
Graph Connectivity
Use: detect connected components, cycles in undirected graphs.
Kruskal’s Algorithm
Application: minimum spanning tree construction using edge sorting and union-find to avoid cycles.
Equivalence Class Partitioning
Use: clustering, image segmentation, network grouping.
Percolation and Network Reliability
Model connected clusters in probabilistic systems.
Other Uses
Type inference, unification in compilers, dynamic connectivity queries.
Implementation Details
Array-Based Storage
Parent, rank arrays indexed by element ids. Initialization: all elements self-parented.
Iterative vs Recursive Find
Recursive: simpler, uses call stack. Iterative: avoids stack overhead, preferred in large datasets.
Memory Considerations
Storage: O(n) for parent and rank arrays. Minimal overhead compared to graph data.
Thread Safety
Not inherently thread-safe. Requires synchronization when used in parallel algorithms.
Variants and Extensions
Persistent Disjoint Set
Supports historical queries, versioning without destruction of past states.
Dynamic Connectivity
Handles edge insertions and deletions with advanced data structures like link-cut trees.
Weighted Union-Find
Maintains additional data per set, e.g., sum, min, max of elements.
Partial Persistence
Allows rollback of union operations while preserving find queries consistency.
Comparison with Other Structures
Hash Sets
Hash sets: support membership, insertion, deletion but no efficient union or find representative.
Balanced Trees
Balanced trees: efficient ordered data management but poor at merging sets quickly.
Segment Trees
Segment trees: range queries, updates; not designed for disjoint set partitioning.
Disjoint Set Strength
Unique ability: near constant time union/find on dynamic partitions.
| Data Structure | Union | Find Representative | Use Case |
|---|---|---|---|
| Disjoint Set | O(α(n)) amortized | O(α(n)) amortized | Dynamic connectivity, MST |
| Hash Set | O(n) (union requires copying) | O(1) | Membership queries |
| Balanced Tree | O(log n) | O(log n) | Ordered data, range queries |
Limitations and Considerations
Non-Applicability to Overlapping Sets
Disjoint set: only partitions into non-overlapping subsets. Cannot model overlapping memberships.
Static Universe
Elements typically fixed upfront; dynamic addition possible but less efficient.
Not Suitable for Ordered Data
No inherent ordering or sorting capabilities within sets.
Parallelism Challenges
Concurrent union/find difficult without sophisticated synchronization.
References
- Hopcroft, J.E., Tarjan, R.E., "Efficient Algorithms for Graph Manipulation", Communications of the ACM, vol. 16, no. 6, 1973, pp. 372-378.
- Tarjan, R.E., "Efficiency of a Good But Not Linear Set Union Algorithm", Journal of the ACM, vol. 22, no. 2, 1975, pp. 215-225.
- Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., "Introduction to Algorithms", 3rd ed., MIT Press, 2009, pp. 561-572.
- Gabow, H.N., Tarjan, R.E., "A Linear-Time Algorithm for a Special Case of Disjoint Set Union", Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, pp. 246-251.
- Tarjan, R.E., "Data Structures and Network Algorithms", SIAM, 1983, pp. 94-114.