Open Addressing | What's Your IQ

Definition and Overview

Concept

Open addressing is a collision resolution technique in hash tables where all elements reside within the hash array itself. Collisions are resolved by probing alternative indices until an empty slot is found.

Purpose

Purpose: to handle hash collisions efficiently without auxiliary data structures like linked lists.

Basic Principle

Principle: upon collision, sequentially probe array locations according to a deterministic sequence until insertion or search succeeds.

Historical Context

Introduced in 1954 by Donald Knuth as an alternative to chaining; foundational in hash table theory and practice.

Terminology

Terms: probe sequence, load factor, clustering, primary and secondary clustering, deletion markers (tombstones).

"Open addressing leverages systematic probing to maintain hash table efficiency in constrained memory." -- Donald Knuth

Collision Resolution Strategies

Open Addressing vs Chaining

Open addressing stores all keys in the array; chaining uses linked lists outside array. Open addressing saves pointers; chaining handles high load gracefully.

Types of Open Addressing

Main types: linear probing, quadratic probing, double hashing. Each differs in probe sequence calculation.

Trade-offs

Trade-offs: memory efficiency vs performance under high load; clustering vs probe length; simplicity vs complexity of computation.

Handling Full Table

Full table: insertion fails or triggers table resizing and rehashing.

Key Considerations

Considerations: choice of hash functions, probe functions, load factor thresholds.

Probing Methods

Linear Probing

Sequence: h(k), h(k)+1, h(k)+2, ... (mod table size). Simple, cache-friendly, but susceptible to primary clustering.

Quadratic Probing

Sequence: h(k), h(k)+1^2, h(k)+2^2, h(k)+3^2, ... (mod table size). Reduces primary clustering, but secondary clustering remains.

Double Hashing

Sequence: h(k), h(k)+i*h2(k), i=1,2,... (mod table size). Minimizes clustering by using a second hash function.

Comparison Table

Probing Method	Probe Sequence	Clustering Type	Pros	Cons
Linear Probing	h(k) + i	Primary Clustering	Simple, fast	Clustering degrades performance
Quadratic Probing	h(k) + i²	Secondary Clustering	Reduces primary clustering	May not probe all slots
Double Hashing	h(k) + i*h2(k)	Minimal Clustering	Best distribution	More complex, slower

Mathematical Conditions

For double hashing, h2(k) must be relatively prime to table size to ensure full coverage.

Probe Sequence Formula

Index(i) = (h1(k) + i * h2(k)) mod TableSize, i = 0,1,2,...

Insertion Process

Stepwise Procedure

1. Compute primary hash index.

2. Check slot availability.

3. If occupied, probe next according to method.

4. Insert key at first empty slot found.

Failure Conditions

Failure if entire table probed without empty slot: triggers rehash or insertion failure.

Handling Tombstones

Tombstones mark deleted slots; insertion may reuse tombstones to improve space utilization.

Algorithmic Complexity

Average-case: O(1) if load factor low. Worst-case: O(n) when table nearly full.

Insertion Pseudocode

function insert(key): for i in 0 to TableSize-1: idx = probe(key, i) if table[idx] is empty or tombstone: table[idx] = key return success return failure (table full)

Searching Mechanism

Search Procedure

Compute hash, probe sequence until key found or empty slot encountered.

Termination Conditions

Stop if key found or empty slot reached (indicating absence).

Handling Tombstones

Continue probing through tombstones as key may be further in sequence.

Algorithmic Complexity

Average-case: O(1). Worst-case: O(n) depending on clustering and load factor.

Search Pseudocode

function search(key): for i in 0 to TableSize-1: idx = probe(key, i) if table[idx] is empty: return not found if table[idx] == key: return found at idx return not found

Deletion Techniques

Challenges

Simply removing key breaks probe chains, causing failed searches.

Tombstone Markers

Mark deleted slots as tombstones to preserve probe sequence integrity.

Reinsertion

Periodic rehashing recommended to clean tombstones and optimize table.

Deletion Algorithm

Locate key using search; mark slot as tombstone instead of empty.

Impact on Performance

Excess tombstones increase probe length, degrade search and insertion speed.

Performance Analysis

Time Complexity

Average insert/search/delete: O(1) with low load. Worst-case: O(n) under high load/clustering.

Factors Affecting Performance

Load factor, clustering, hash function quality, probing method.

Memory Overhead

Minimal overhead compared to chaining as no pointers used.

Cache Efficiency

Better cache performance due to contiguous memory access patterns.

Empirical Observations

Linear probing fastest for low load; double hashing scales better under high load.

Load Factor and Its Impact

Definition

Load factor α = n / TableSize, where n = number of entries.

Effect on Performance

Higher α increases average probe length, degrading performance.

Recommended Thresholds

Maintain α < 0.7 for linear probing; α < 0.9 for double hashing.

Resizing Strategy

Resize and rehash when α exceeds threshold to maintain efficiency.

Mathematical Expectation

Expected probes for successful search (linear probing):S(α) ≈ 0.5 * (1 + 1/(1 - α))

Clustering Phenomena

Primary Clustering

Occurs in linear probing; contiguous occupied slots form clusters, increasing probe length.

Secondary Clustering

Occurs when keys with same initial hash probe same sequence; typical in quadratic probing.

Double Hashing and Clustering

Double hashing minimizes both primary and secondary clustering via independent probe sequences.

Impact on Efficiency

Clustering increases average probe count, leading to slowdown in operations.

Mitigation Techniques

Use double hashing, proper hash functions, and maintain low load factor.

Comparison with Other Methods

Open Addressing vs Chaining

Open addressing: better cache locality, space efficient. Chaining: simpler deletion, handles high load gracefully.

Open Addressing vs Perfect Hashing

Open addressing supports dynamic insertions/deletions; perfect hashing requires static sets.

Open Addressing vs Cuckoo Hashing

Cuckoo hashing offers worst-case O(1) lookup, but more complex insertions; open addressing simpler but probabilistic performance.

Table of Comparison

Method	Lookup Complexity	Insertion Complexity	Deletion	Memory Usage
Open Addressing	Average O(1)	Average O(1)	Complex (tombstones)	Low (no pointers)
Chaining	Average O(1 + α)	Average O(1)	Simple	Higher (pointers)
Cuckoo Hashing	Worst-case O(1)	Amortized O(1)	Simple	Moderate

Use Case Suitability

Open addressing preferred for memory-constrained, low load systems; chaining for high load, variable size sets.

Applications and Use Cases

General Purpose Hash Tables

Used in programming language runtimes, database indexing, caches.

Embedded Systems

Memory efficiency makes open addressing ideal for embedded hardware.

Compiler Symbol Tables

Fast lookup with low overhead critical in compilation.

Network Routers

Routing tables often use open addressing for rapid packet classification.

High-Performance Computing

Cache-friendly access patterns improve throughput in HPC applications.

Implementation Considerations

Choosing Table Size

Prefer prime table sizes to reduce clustering and ensure probe coverage.

Hash Functions

Primary and secondary hash functions must be independent and uniformly distributed.

Load Factor Monitoring

Implement dynamic resizing and rehashing when thresholds exceeded.

Managing Tombstones

Track tombstone count and trigger clean-up rehash to maintain efficiency.

Thread Safety

Concurrent access requires locking or lock-free designs to avoid race conditions.

References

D.E. Knuth, "The Art of Computer Programming, Volume 3: Sorting and Searching," Addison-Wesley, 1998, pp. 510-545.
R. Sedgewick, "Algorithms in C, Part 5: Graph Algorithms," Addison-Wesley, 2002, pp. 200-230.
M. Mitzenmacher, E. Upfal, "Probability and Computing: Randomized Algorithms and Probabilistic Analysis," Cambridge University Press, 2005, pp. 120-150.
P. Larson, "Dynamic Hash Tables," Communications of the ACM, vol. 31, no. 4, 1988, pp. 446-457.
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, "Introduction to Algorithms," 3rd ed., MIT Press, 2009, pp. 255-260.