Definition and Overview
Concept
Open addressing is a collision resolution technique in hash tables where all elements reside within the hash array itself. Collisions are resolved by probing alternative indices until an empty slot is found.
Purpose
Purpose: to handle hash collisions efficiently without auxiliary data structures like linked lists.
Basic Principle
Principle: upon collision, sequentially probe array locations according to a deterministic sequence until insertion or search succeeds.
Historical Context
Introduced in 1954 by Donald Knuth as an alternative to chaining; foundational in hash table theory and practice.
Terminology
Terms: probe sequence, load factor, clustering, primary and secondary clustering, deletion markers (tombstones).
"Open addressing leverages systematic probing to maintain hash table efficiency in constrained memory." -- Donald Knuth
Collision Resolution Strategies
Open Addressing vs Chaining
Open addressing stores all keys in the array; chaining uses linked lists outside array. Open addressing saves pointers; chaining handles high load gracefully.
Types of Open Addressing
Main types: linear probing, quadratic probing, double hashing. Each differs in probe sequence calculation.
Trade-offs
Trade-offs: memory efficiency vs performance under high load; clustering vs probe length; simplicity vs complexity of computation.
Handling Full Table
Full table: insertion fails or triggers table resizing and rehashing.
Key Considerations
Considerations: choice of hash functions, probe functions, load factor thresholds.
Probing Methods
Linear Probing
Sequence: h(k), h(k)+1, h(k)+2, ... (mod table size). Simple, cache-friendly, but susceptible to primary clustering.
Quadratic Probing
Sequence: h(k), h(k)+1^2, h(k)+2^2, h(k)+3^2, ... (mod table size). Reduces primary clustering, but secondary clustering remains.
Double Hashing
Sequence: h(k), h(k)+i*h2(k), i=1,2,... (mod table size). Minimizes clustering by using a second hash function.
Comparison Table
| Probing Method | Probe Sequence | Clustering Type | Pros | Cons |
|---|---|---|---|---|
| Linear Probing | h(k) + i | Primary Clustering | Simple, fast | Clustering degrades performance |
| Quadratic Probing | h(k) + i² | Secondary Clustering | Reduces primary clustering | May not probe all slots |
| Double Hashing | h(k) + i*h2(k) | Minimal Clustering | Best distribution | More complex, slower |
Mathematical Conditions
For double hashing, h2(k) must be relatively prime to table size to ensure full coverage.
Probe Sequence Formula
Index(i) = (h1(k) + i * h2(k)) mod TableSize, i = 0,1,2,...Insertion Process
Stepwise Procedure
1. Compute primary hash index.
2. Check slot availability.
3. If occupied, probe next according to method.
4. Insert key at first empty slot found.
Failure Conditions
Failure if entire table probed without empty slot: triggers rehash or insertion failure.
Handling Tombstones
Tombstones mark deleted slots; insertion may reuse tombstones to improve space utilization.
Algorithmic Complexity
Average-case: O(1) if load factor low. Worst-case: O(n) when table nearly full.
Insertion Pseudocode
function insert(key): for i in 0 to TableSize-1: idx = probe(key, i) if table[idx] is empty or tombstone: table[idx] = key return success return failure (table full)Searching Mechanism
Search Procedure
Compute hash, probe sequence until key found or empty slot encountered.
Termination Conditions
Stop if key found or empty slot reached (indicating absence).
Handling Tombstones
Continue probing through tombstones as key may be further in sequence.
Algorithmic Complexity
Average-case: O(1). Worst-case: O(n) depending on clustering and load factor.
Search Pseudocode
function search(key): for i in 0 to TableSize-1: idx = probe(key, i) if table[idx] is empty: return not found if table[idx] == key: return found at idx return not foundDeletion Techniques
Challenges
Simply removing key breaks probe chains, causing failed searches.
Tombstone Markers
Mark deleted slots as tombstones to preserve probe sequence integrity.
Reinsertion
Periodic rehashing recommended to clean tombstones and optimize table.
Deletion Algorithm
Locate key using search; mark slot as tombstone instead of empty.
Impact on Performance
Excess tombstones increase probe length, degrade search and insertion speed.
Performance Analysis
Time Complexity
Average insert/search/delete: O(1) with low load. Worst-case: O(n) under high load/clustering.
Factors Affecting Performance
Load factor, clustering, hash function quality, probing method.
Memory Overhead
Minimal overhead compared to chaining as no pointers used.
Cache Efficiency
Better cache performance due to contiguous memory access patterns.
Empirical Observations
Linear probing fastest for low load; double hashing scales better under high load.
Load Factor and Its Impact
Definition
Load factor α = n / TableSize, where n = number of entries.
Effect on Performance
Higher α increases average probe length, degrading performance.
Recommended Thresholds
Maintain α < 0.7 for linear probing; α < 0.9 for double hashing.
Resizing Strategy
Resize and rehash when α exceeds threshold to maintain efficiency.
Mathematical Expectation
Expected probes for successful search (linear probing):S(α) ≈ 0.5 * (1 + 1/(1 - α))Clustering Phenomena
Primary Clustering
Occurs in linear probing; contiguous occupied slots form clusters, increasing probe length.
Secondary Clustering
Occurs when keys with same initial hash probe same sequence; typical in quadratic probing.
Double Hashing and Clustering
Double hashing minimizes both primary and secondary clustering via independent probe sequences.
Impact on Efficiency
Clustering increases average probe count, leading to slowdown in operations.
Mitigation Techniques
Use double hashing, proper hash functions, and maintain low load factor.
Comparison with Other Methods
Open Addressing vs Chaining
Open addressing: better cache locality, space efficient. Chaining: simpler deletion, handles high load gracefully.
Open Addressing vs Perfect Hashing
Open addressing supports dynamic insertions/deletions; perfect hashing requires static sets.
Open Addressing vs Cuckoo Hashing
Cuckoo hashing offers worst-case O(1) lookup, but more complex insertions; open addressing simpler but probabilistic performance.
Table of Comparison
| Method | Lookup Complexity | Insertion Complexity | Deletion | Memory Usage |
|---|---|---|---|---|
| Open Addressing | Average O(1) | Average O(1) | Complex (tombstones) | Low (no pointers) |
| Chaining | Average O(1 + α) | Average O(1) | Simple | Higher (pointers) |
| Cuckoo Hashing | Worst-case O(1) | Amortized O(1) | Simple | Moderate |
Use Case Suitability
Open addressing preferred for memory-constrained, low load systems; chaining for high load, variable size sets.
Applications and Use Cases
General Purpose Hash Tables
Used in programming language runtimes, database indexing, caches.
Embedded Systems
Memory efficiency makes open addressing ideal for embedded hardware.
Compiler Symbol Tables
Fast lookup with low overhead critical in compilation.
Network Routers
Routing tables often use open addressing for rapid packet classification.
High-Performance Computing
Cache-friendly access patterns improve throughput in HPC applications.
Implementation Considerations
Choosing Table Size
Prefer prime table sizes to reduce clustering and ensure probe coverage.
Hash Functions
Primary and secondary hash functions must be independent and uniformly distributed.
Load Factor Monitoring
Implement dynamic resizing and rehashing when thresholds exceeded.
Managing Tombstones
Track tombstone count and trigger clean-up rehash to maintain efficiency.
Thread Safety
Concurrent access requires locking or lock-free designs to avoid race conditions.
References
- D.E. Knuth, "The Art of Computer Programming, Volume 3: Sorting and Searching," Addison-Wesley, 1998, pp. 510-545.
- R. Sedgewick, "Algorithms in C, Part 5: Graph Algorithms," Addison-Wesley, 2002, pp. 200-230.
- M. Mitzenmacher, E. Upfal, "Probability and Computing: Randomized Algorithms and Probabilistic Analysis," Cambridge University Press, 2005, pp. 120-150.
- P. Larson, "Dynamic Hash Tables," Communications of the ACM, vol. 31, no. 4, 1988, pp. 446-457.
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, "Introduction to Algorithms," 3rd ed., MIT Press, 2009, pp. 255-260.