Definition and Purpose

Disjoint Set Concept

Union Find, aka Disjoint Set Union (DSU), manages partition of a set into disjoint subsets. Supports dynamic connectivity queries efficiently.

Use Cases

Used in graph connectivity, network connectivity, image processing, clustering, Kruskal’s MST, and equivalence checking.

Core Principle

Maintains elements divided into non-overlapping sets. Queries determine if two elements belong to the same set; unions merge sets.

Basic Operations

Find

Determines the representative (root) of the set containing a given element. Identifies connectivity.

Union

Merges two disjoint sets into a single set. Updates internal structure to maintain correctness.

Initialization

Starts with each element in its own singleton set. Internal arrays or pointers track parent relationships.

Data Structure Implementation

Parent Array

Each element stores a pointer to its parent. Root elements point to themselves.

Rank/Size Array

Maintains tree height or size to optimize union operations. Prevents tall, inefficient trees.

Forest Representation

Collection of trees representing disjoint sets. Trees flatten over time with path compression.

ComponentDescription
parent[]Stores parent reference of each element
rank[] or size[]Tracks tree rank or size for balanced union

Union by Rank / Size

Concept

Union operation attaches the smaller tree under the root of the larger tree based on rank or size.

Benefits

Maintains shallow tree structure. Reduces worst-case find operation time.

Procedure

Compare ranks of roots. Attach lower rank root to higher rank. If equal, increment rank of new root.

function union(x, y): rootX = find(x) rootY = find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY else if rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1

Path Compression

Concept

Flattens tree during find by making every node on the path point directly to root.

Effect on Performance

Drastically reduces find time on subsequent queries. Amortized nearly constant time.

Algorithm

function find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]

Time Complexity Analysis

Without Optimizations

Union and find can degrade to O(n) per operation due to tall trees.

With Union by Rank and Path Compression

Amortized time per operation: O(α(n)) where α is inverse Ackermann function, practically constant.

Summary Table

OperationTime Complexity
Find (naive)O(n)
Union (naive)O(n)
Find (optimized)O(α(n)) amortized
Union (optimized)O(α(n)) amortized

Applications

Graph Connectivity

Detects connected components, cycles, and builds MSTs (Kruskal’s algorithm).

Network Connectivity

Manages dynamic networks, connectivity queries, incremental updates.

Image Processing

Finds connected regions, labels components efficiently.

Clustering

Supports hierarchical clustering, equivalence relations, partitioning problems.

Advanced Variations

Union by Size

Alternative to rank: attaches smaller sized tree under larger. Similar performance.

Persistent Union Find

Supports historical queries with versions. Uses persistent data structures.

Rollback Union Find

Allows undoing union operations. Useful in offline dynamic connectivity.

Weighted Union Find

Stores weights or distances, supports queries on differences between elements.

Parallelization and Concurrency

Challenges

Union Find’s pointer updates cause race conditions. Path compression complicates parallelism.

Approaches

Lock-free algorithms, concurrent union by rank, batch processing of unions and finds.

Applications

Parallel graph algorithms, distributed systems, large-scale clustering.

Limitations and Alternatives

Limitations

Supports only connectivity queries, no direct support for edge deletions or complex queries.

Alternatives

Link-cut trees for dynamic connectivity with edge deletions, Euler tour trees, dynamic connectivity data structures.

Trade-offs

Union Find is simplest and fastest for incremental connectivity but limited in query flexibility.

Example Implementation

Initialization

n = number_of_elementsparent = [i for i in range(n)]rank = [0] * n

Find with Path Compression

def find(x): if parent[x] != x: parent[x] = find(parent[x]) return parent[x]

Union by Rank

def union(x, y): rootX = find(x) rootY = find(y) if rootX == rootY: return if rank[rootX] < rank[rootY]: parent[rootX] = rootY elif rank[rootX] > rank[rootY]: parent[rootY] = rootX else: parent[rootY] = rootX rank[rootX] += 1

References

  • Tarjan, R.E., "Efficiency of a Good But Not Linear Set Union Algorithm," Journal of the ACM, vol. 22, no. 2, 1975, pp. 215–225.
  • Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., "Introduction to Algorithms," 3rd Edition, MIT Press, 2009, pp. 536–542.
  • Gabow, H.N., Tarjan, R.E., "A Linear-Time Algorithm for a Special Case of Disjoint Set Union," Journal of Algorithms, vol. 9, no. 2, 1988, pp. 215–244.
  • Tarjan, R.E., "Data Structures and Network Algorithms," SIAM, 1983, pp. 94–111.
  • Blum, M., "Parallel Algorithms for Disjoint Set Union Problems," SIAM Journal on Computing, vol. 14, no. 2, 1985, pp. 290–310.