Overview
Definition
Skip lists: probabilistic data structures supporting efficient search, insertion, and deletion in ordered lists. Invented by William Pugh in 1989 as an alternative to balanced trees. Use multiple forward pointer layers to "skip" over elements, achieving logarithmic average time complexity.
Purpose
Enable fast ordered operations on dynamic data sets. Maintain sorted order with minimal restructuring. Offer simple implementation compared to complex balanced trees.
Historical Context
Introduced to address complexity in balanced trees. Probabilistic balancing eliminates rigid restructuring. Popular in databases, memory indexing, and concurrent applications.
"Skip lists provide a simple and efficient alternative to balanced trees, combining probabilistic methods with linked lists." -- William Pugh
Structure
Basic Components
Nodes: store key-value pairs and multiple forward pointers. Levels: layered linked lists, where level 0 contains all elements, higher levels contain fewer nodes.
Levels and Layers
Each node assigned random level on insertion. Level i list links nodes with level ≥ i. Top levels have fewer nodes, enabling long jumps.
Head Node
Special sentinel node with maximum level. Acts as entry point for all searches and traversals.
Pointers and Links
Forward pointers per level point to next node on that level. No backward pointers in basic skip lists.
| Component | Description |
|---|---|
| Node | Stores key, value, array of forward pointers |
| Level | Index indicating node’s height in the list |
| Head | Sentinel node with maximum level |
| Forward Pointer | Reference to next node at given level |
Search Algorithm
Basic Idea
Traverse levels top-down, moving forward while next node key less than search key. Drop down one level if no further forward progress.
Step-by-Step Process
Initialize current node at head top level. While level ≥ 0: advance forward as long as next node key < key. If next node key equals search key, return node. Else drop level and repeat. If reach level -1, key not found.
Algorithmic Pseudocode
function search(key): current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] current = current.forward[0] if current != null and current.key == key: return current else: return nullInsertion
Overview
Insert new node with random level. Update forward pointers to include new node while preserving order.
Finding Insertion Points
Maintain update array to store last nodes before insertion point on each level during search.
Random Level Generation
Use geometric distribution with probability p (commonly 0.5) to determine node level. Max level capped.
Insertion Steps
Generate level. Create new node. For each level up to node level, update forward pointers: new node forwards to update[i].forward[i], update[i].forward[i] points to new node.
function insert(key, value): update[0..maxLevel] = head current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] update[level] = current newLevel = randomLevel() if newLevel > maxLevel: for i in maxLevel+1 to newLevel: update[i] = head maxLevel = newLevel newNode = createNode(key, value, newLevel) for i in 0 to newLevel: newNode.forward[i] = update[i].forward[i] update[i].forward[i] = newNodeDeletion
Overview
Remove node by adjusting forward pointers to bypass target node. Maintain structure integrity.
Locating Node
Use update array to track nodes before target on each level as in insertion/search.
Pointer Reassignment
For each level i where target node exists, set update[i].forward[i] to target.forward[i].
Level Adjustment
Reduce max level if highest levels become empty after deletion.
function delete(key): update[0..maxLevel] = null current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] update[level] = current current = current.forward[0] if current != null and current.key == key: for i in 0 to maxLevel: if update[i].forward[i] != current: break update[i].forward[i] = current.forward[i] while maxLevel > 0 and head.forward[maxLevel] == null: maxLevel = maxLevel - 1Complexity Analysis
Search Complexity
Average O(log n) due to layered structure and probabilistic jumps. Worst-case O(n) rare and improbable.
Insertion Complexity
Average O(log n): search for insertion point plus pointer updates. Random level limits height.
Deletion Complexity
Average O(log n): similar to insertion, updating pointers and adjusting levels.
Space Complexity
O(n) with additional pointers overhead: each node stores multiple forward pointers proportional to level.
| Operation | Average Time | Worst Time |
|---|---|---|
| Search | O(log n) | O(n) |
| Insertion | O(log n) | O(n) |
| Deletion | O(log n) | O(n) |
Probabilistic Balancing
Random Level Assignment
Node levels determined by repeated coin flips with probability p. Higher levels exponentially less likely.
Balancing Effect
Ensures average logarithmic height without deterministic rebalancing. Avoids costly rotations/restructuring.
Mathematical Model
Probability(node level ≥ k) = p^k, commonly p=0.5. Expected number of nodes at level k: n * p^k.
function randomLevel(): level = 0 while random() < p and level < MAX_LEVEL: level = level + 1 return levelParameter Tuning
Probability p affects height and pointer overhead. Typical p = 0.5 balances speed and space.
Comparison with Other Structures
Balanced Binary Search Trees (AVL, Red-Black)
Similar average complexities. Skip lists simpler to implement, probabilistic rather than strict balancing.
Arrays and Linked Lists
Skip lists outperform in dynamic ordered sets. Arrays support binary search but costly insertions/deletions. Linked lists slow for search.
Hash Tables
Skip lists maintain order; hash tables do not. Skip lists suitable for range queries and ordered traversal.
Trie Structures
Tries specialize in prefix search; skip lists general purpose ordered maps.
| Data Structure | Average Search | Insertion/Deletion | Ordering | Complexity |
|---|---|---|---|---|
| Skip List | O(log n) | O(log n) | Maintained | Probabilistic |
| AVL Tree | O(log n) | O(log n) | Maintained | Deterministic |
| Hash Table | O(1) | O(1) | Not Ordered | Deterministic |
| Linked List | O(n) | O(1) | Maintained | Deterministic |
Applications
Databases and Indexing
Used in in-memory indexing, range queries, and ordered key-value stores. Examples: Redis, LevelDB.
Network Routing
Efficient lookup in routing tables and distributed hash tables (DHTs) in peer-to-peer networks.
Concurrent Data Structures
Skip lists adapted for lock-free concurrent implementations due to simple pointer updates.
Memory Management
Managing free lists and allocations in dynamic memory allocators.
Algorithmic Foundations
Base for advanced data structures and randomized algorithms.
Advantages and Limitations
Advantages
Simple implementation. Fast average operations. Dynamic and flexible. Space-efficient relative to balanced trees. Easy concurrency adaptation.
Limitations
Worst-case linear time possible. Randomness introduces non-determinism. Higher space overhead than simple linked lists due to multiple pointers. Performance sensitive to parameter p.
Mitigation Strategies
Hybrid deterministic-probabilistic variants. Tuning probability and max level. Periodic rebalancing if required.
Implementation Details
Node Structure
Key, value, array of forward pointers sized by node level. Optional backward pointers for bidirectional traversal.
Memory Management
Dynamic allocation per node. Preallocation possible for high-performance needs.
Random Number Generator
Quality affects level distribution. Use uniform RNG with probability threshold.
Concurrency Considerations
Lock-based: coarse or fine-grained locks per level or node. Lock-free: atomic pointer updates with CAS operations.
Parameter Settings
Typical max level: 16 or 32. Probability p: 0.25 to 0.5. Adjust based on data size and memory constraints.
References
- Pugh, William. "Skip Lists: A Probabilistic Alternative to Balanced Trees." Communications of the ACM, vol. 33, no. 6, 1990, pp. 668-676.
- Fraser, Keir. "Practical Lock-Free Linked Lists." Proceedings of the 20th ACM Symposium on Principles of Distributed Computing, 2001, pp. 70-79.
- Morin, Pat. "Open Data Structures." 2008. Chapter on Skip Lists. Available online.
- Mehlhorn, Kurt, and Peter Sanders. "Algorithms and Data Structures: The Basic Toolbox." Springer, 2008.