Introduction
Collision resolution: critical aspect of hash table design. Hash functions map keys to indices. Collisions occur when multiple keys map to the same index. Efficient resolution ensures fast insertion, deletion, and search operations. Methods vary by data structure and application requirements.
"Handling collisions effectively is paramount to maintaining hash table performance and integrity." -- Thomas H. Cormen
Hash Collisions
Definition
Collision: two or more distinct keys hash to same slot. Event inevitable due to finite hash space versus infinite key space. Fundamental challenge in hashing algorithms.
Causes
Limited table size. Imperfect hash functions. Poor distribution of input keys. Load factor increase.
Consequences
Degraded performance: increased search time, clustering, potential overflow. Necessitates collision resolution strategies.
Collision Resolution Methods
Primary Techniques
Two major categories: open addressing and chaining. Open addressing stores all elements within hash table array. Chaining uses linked lists or other structures at each slot.
Trade-offs
Open addressing: better cache performance, fixed memory footprint. Chaining: simpler handling of variable load, easier deletion.
Selection Criteria
Load factor, expected data volume, memory constraints, operation types, average-case vs worst-case performance.
Open Addressing
Concept
All elements stored in hash table array. Collisions resolved by probing for next available slot. Probing sequence defined by collision resolution policy.
Probing Sequence
Determines order of slot checks after collision. Crucial for uniform distribution and avoiding clustering.
Load Factor Impact
Performance degrades rapidly as load factor approaches 1. Typical recommendation: keep below 0.7 to 0.8.
Chaining
Mechanism
Each slot holds pointer to linked list (or bucket). Colliding keys appended to list. Search involves traversing list at slot.
Advantages
Handles load factors >1 gracefully. Simple deletion. Flexible storage size.
Disadvantages
Extra memory for pointers. Potential cache inefficiency. Performance depends on linked list length.
| Chaining vs Open Addressing | Chaining | Open Addressing |
|---|---|---|
| Memory Usage | Extra pointers | Fixed array |
| Load Factor | >1 allowed | <1 recommended |
| Cache Performance | Poor | Better |
Linear Probing
Algorithm
Probe sequence: h(k), h(k)+1, h(k)+2,... mod table size. Checks next slot sequentially.
Advantages
Simple implementation. Good cache locality. Easy to compute next probe.
Disadvantages
Primary clustering: contiguous blocks form, increasing probe length. Performance degrades at high load.
// Insert with linear probingindex = hash(key) % table_sizewhile table[index] is occupied: index = (index + 1) % table_sizeinsert key at table[index]Quadratic Probing
Algorithm
Probe sequence: h(k), h(k)+1², h(k)+2²,... mod table size. Jumps increase quadratically.
Advantages
Reduces primary clustering. Better distribution than linear probing in many cases.
Disadvantages
Secondary clustering still possible. May not probe all slots if table size poorly chosen.
// Insert with quadratic probingindex = hash(key) % table_sizei = 0while table[(index + i*i) % table_size] is occupied: i = i + 1insert key at table[(index + i*i) % table_size]Double Hashing
Algorithm
Probe sequence: h(k), (h(k) + i*h2(k)) mod table size, i=1,2,... Uses second hash h2 for step size.
Advantages
Minimizes clustering. Excellent distribution if h2 chosen properly. Full probe coverage guaranteed if table size prime.
Disadvantages
More complex computation. Requires two hash functions. Slightly slower per operation.
// Insert with double hashingindex = hash1(key) % table_sizestep = hash2(key)i = 0while table[(index + i*step) % table_size] is occupied: i = i + 1insert key at table[(index + i*step) % table_size]Rehashing
Definition
Process of resizing hash table and recomputing hash values when load factor threshold exceeded.
Trigger Conditions
Load factor > 0.7 to 0.8 typical. Performance degradation or memory constraints.
Procedure
Create larger table (usually double size, prime number). Re-insert all existing keys using original hash function.
| Step | Description |
|---|---|
| 1 | Allocate new larger table |
| 2 | Recompute hash indices for each key |
| 3 | Insert keys into new table |
| 4 | Update table reference |
Performance Analysis
Load Factor Impact
Load factor α = n/m (n = keys, m = slots). Higher α increases collision probability, degrades average search time.
Expected Search Time
Chaining: O(1 + α). Open addressing: O(1/(1-α)) average under uniform hashing.
Clustering Effects
Primary clustering: contiguous blocks (linear probing). Secondary clustering: keys with same initial hash (quadratic).
Implementation Considerations
Hash Function Choice
Uniform distribution critical. Must minimize collisions. Examples: multiplicative, division, cryptographic hashes.
Table Size Selection
Prefer prime or power-of-two sizes depending on hash function. Influences probing coverage and clustering.
Deletion Handling
Chaining: straightforward removal from list. Open addressing: use special markers (tombstones) to maintain probe sequences.
Advanced Techniques
Cuckoo Hashing
Uses two hash functions and two tables. Collisions resolved by relocating existing keys. Guarantees constant worst-case lookup.
Hopscotch Hashing
Improves open addressing by maintaining neighborhoods of entries. Reduces probe counts, improves cache locality.
Perfect Hashing
Static set hashing with no collisions. Uses specialized functions and data structures. Optimal for fixed key sets.
References
- Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C. Introduction to Algorithms. MIT Press, 3rd Ed., 2009, pp. 241-254.
- Knuth, D.E. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 2nd Ed., 1998, pp. 527-571.
- Mitzenmacher, M., Upfal, E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005, pp. 172-189.
- Pagh, R., Rodler, F.F. Cuckoo Hashing. Journal of Algorithms, Vol. 51, No. 2, 2004, pp. 122-144.
- Dietzfelbinger, M. Universal Hashing and Its Applications. In: Workshop on Efficient and Experimental Algorithms, Springer, 1996, pp. 1-20.