Definition and Overview

Concept

Disjoint set: data structure managing partitions of a set into disjoint subsets. Functionality: efficiently merge sets and identify representatives.

Purpose

Purpose: track connected components, equivalence classes, membership queries in dynamic set partitions.

History

Introduced: John Hopcroft and Robert Tarjan, 1973. Importance: foundational for graph algorithms, network connectivity, clustering.

Data Representation

Forest of Trees

Structure: collection of rooted trees. Each tree: represents one set. Root node: serves as set representative or leader.

Parent Array

Implementation: array storing parent of each element. Root nodes are self-parented or marked distinctly.

Rank and Size Arrays

Auxiliary arrays: maintain tree height (rank) or subtree size for balancing unions.

Core Operations

MakeSet

Operation: initialize singleton sets. Input: element x. Result: x is parent of itself.

Find

Operation: find representative of element's set. Use: identify if elements belong to same set.

Union

Operation: merge two disjoint sets. Input: representatives of two sets. Effect: one root becomes child of other.

Example

Sets: {1}, {2}, {3}. Union(1,2): {1,2}, Find(2) → representative of {1,2}.

MakeSet(x): parent[x] = x rank[x] = 0Find(x): while x != parent[x]: x = parent[x] return xUnion(x, y): rootX = Find(x) rootY = Find(y) if rootX == rootY: return parent[rootY] = rootX

Path Compression Optimization

Mechanism

During Find, set all traversed nodes’ parents directly to root. Effect: flatten tree structure.

Benefit

Amortized speedup: Find operation approaches near-constant time.

Algorithm

Find(x): if parent[x] != x: parent[x] = Find(parent[x]) return parent[x]

Union by Rank/Size

Union by Rank

Attach tree with smaller rank under root with larger rank. Rank: upper bound on tree height.

Union by Size

Attach smaller sized tree under larger sized tree. Maintains balanced trees, reduces height.

Effect

Combined with path compression: yields inverse Ackermann function complexity.

Union(x, y): rootX = Find(x) rootY = Find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY else if rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1

Time Complexity Analysis

Without Optimizations

Worst-case: Find and Union can be O(n) due to skewed trees.

With Path Compression and Union by Rank

Amortized complexity: O(α(n)) per operation. α(n): inverse Ackermann function, practically constant ≤ 4.

Implications

Disjoint set operations are effectively constant time for all practical input sizes.

OperationTime Complexity
MakeSetO(1)
Find (amortized)O(α(n))
Union (amortized)O(α(n))

Applications

Graph Connectivity

Use: detect connected components, cycles in undirected graphs.

Kruskal’s Algorithm

Application: minimum spanning tree construction using edge sorting and union-find to avoid cycles.

Equivalence Class Partitioning

Use: clustering, image segmentation, network grouping.

Percolation and Network Reliability

Model connected clusters in probabilistic systems.

Other Uses

Type inference, unification in compilers, dynamic connectivity queries.

Implementation Details

Array-Based Storage

Parent, rank arrays indexed by element ids. Initialization: all elements self-parented.

Iterative vs Recursive Find

Recursive: simpler, uses call stack. Iterative: avoids stack overhead, preferred in large datasets.

Memory Considerations

Storage: O(n) for parent and rank arrays. Minimal overhead compared to graph data.

Thread Safety

Not inherently thread-safe. Requires synchronization when used in parallel algorithms.

Variants and Extensions

Persistent Disjoint Set

Supports historical queries, versioning without destruction of past states.

Dynamic Connectivity

Handles edge insertions and deletions with advanced data structures like link-cut trees.

Weighted Union-Find

Maintains additional data per set, e.g., sum, min, max of elements.

Partial Persistence

Allows rollback of union operations while preserving find queries consistency.

Comparison with Other Structures

Hash Sets

Hash sets: support membership, insertion, deletion but no efficient union or find representative.

Balanced Trees

Balanced trees: efficient ordered data management but poor at merging sets quickly.

Segment Trees

Segment trees: range queries, updates; not designed for disjoint set partitioning.

Disjoint Set Strength

Unique ability: near constant time union/find on dynamic partitions.

Data StructureUnionFind RepresentativeUse Case
Disjoint SetO(α(n)) amortizedO(α(n)) amortizedDynamic connectivity, MST
Hash SetO(n) (union requires copying)O(1)Membership queries
Balanced TreeO(log n)O(log n)Ordered data, range queries

Limitations and Considerations

Non-Applicability to Overlapping Sets

Disjoint set: only partitions into non-overlapping subsets. Cannot model overlapping memberships.

Static Universe

Elements typically fixed upfront; dynamic addition possible but less efficient.

Not Suitable for Ordered Data

No inherent ordering or sorting capabilities within sets.

Parallelism Challenges

Concurrent union/find difficult without sophisticated synchronization.

References

  • Hopcroft, J.E., Tarjan, R.E., "Efficient Algorithms for Graph Manipulation", Communications of the ACM, vol. 16, no. 6, 1973, pp. 372-378.
  • Tarjan, R.E., "Efficiency of a Good But Not Linear Set Union Algorithm", Journal of the ACM, vol. 22, no. 2, 1975, pp. 215-225.
  • Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., "Introduction to Algorithms", 3rd ed., MIT Press, 2009, pp. 561-572.
  • Gabow, H.N., Tarjan, R.E., "A Linear-Time Algorithm for a Special Case of Disjoint Set Union", Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, pp. 246-251.
  • Tarjan, R.E., "Data Structures and Network Algorithms", SIAM, 1983, pp. 94-114.