Overview

Definition

Skip lists: probabilistic data structures supporting efficient search, insertion, and deletion in ordered lists. Invented by William Pugh in 1989 as an alternative to balanced trees. Use multiple forward pointer layers to "skip" over elements, achieving logarithmic average time complexity.

Purpose

Enable fast ordered operations on dynamic data sets. Maintain sorted order with minimal restructuring. Offer simple implementation compared to complex balanced trees.

Historical Context

Introduced to address complexity in balanced trees. Probabilistic balancing eliminates rigid restructuring. Popular in databases, memory indexing, and concurrent applications.

"Skip lists provide a simple and efficient alternative to balanced trees, combining probabilistic methods with linked lists." -- William Pugh

Structure

Basic Components

Nodes: store key-value pairs and multiple forward pointers. Levels: layered linked lists, where level 0 contains all elements, higher levels contain fewer nodes.

Levels and Layers

Each node assigned random level on insertion. Level i list links nodes with level ≥ i. Top levels have fewer nodes, enabling long jumps.

Head Node

Special sentinel node with maximum level. Acts as entry point for all searches and traversals.

Pointers and Links

Forward pointers per level point to next node on that level. No backward pointers in basic skip lists.

ComponentDescription
NodeStores key, value, array of forward pointers
LevelIndex indicating node’s height in the list
HeadSentinel node with maximum level
Forward PointerReference to next node at given level

Search Algorithm

Basic Idea

Traverse levels top-down, moving forward while next node key less than search key. Drop down one level if no further forward progress.

Step-by-Step Process

Initialize current node at head top level. While level ≥ 0: advance forward as long as next node key < key. If next node key equals search key, return node. Else drop level and repeat. If reach level -1, key not found.

Algorithmic Pseudocode

function search(key): current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] current = current.forward[0] if current != null and current.key == key: return current else: return null

Insertion

Overview

Insert new node with random level. Update forward pointers to include new node while preserving order.

Finding Insertion Points

Maintain update array to store last nodes before insertion point on each level during search.

Random Level Generation

Use geometric distribution with probability p (commonly 0.5) to determine node level. Max level capped.

Insertion Steps

Generate level. Create new node. For each level up to node level, update forward pointers: new node forwards to update[i].forward[i], update[i].forward[i] points to new node.

function insert(key, value): update[0..maxLevel] = head current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] update[level] = current newLevel = randomLevel() if newLevel > maxLevel: for i in maxLevel+1 to newLevel: update[i] = head maxLevel = newLevel newNode = createNode(key, value, newLevel) for i in 0 to newLevel: newNode.forward[i] = update[i].forward[i] update[i].forward[i] = newNode

Deletion

Overview

Remove node by adjusting forward pointers to bypass target node. Maintain structure integrity.

Locating Node

Use update array to track nodes before target on each level as in insertion/search.

Pointer Reassignment

For each level i where target node exists, set update[i].forward[i] to target.forward[i].

Level Adjustment

Reduce max level if highest levels become empty after deletion.

function delete(key): update[0..maxLevel] = null current = head for level from maxLevel down to 0: while current.forward[level] != null and current.forward[level].key < key: current = current.forward[level] update[level] = current current = current.forward[0] if current != null and current.key == key: for i in 0 to maxLevel: if update[i].forward[i] != current: break update[i].forward[i] = current.forward[i] while maxLevel > 0 and head.forward[maxLevel] == null: maxLevel = maxLevel - 1

Complexity Analysis

Search Complexity

Average O(log n) due to layered structure and probabilistic jumps. Worst-case O(n) rare and improbable.

Insertion Complexity

Average O(log n): search for insertion point plus pointer updates. Random level limits height.

Deletion Complexity

Average O(log n): similar to insertion, updating pointers and adjusting levels.

Space Complexity

O(n) with additional pointers overhead: each node stores multiple forward pointers proportional to level.

OperationAverage TimeWorst Time
SearchO(log n)O(n)
InsertionO(log n)O(n)
DeletionO(log n)O(n)

Probabilistic Balancing

Random Level Assignment

Node levels determined by repeated coin flips with probability p. Higher levels exponentially less likely.

Balancing Effect

Ensures average logarithmic height without deterministic rebalancing. Avoids costly rotations/restructuring.

Mathematical Model

Probability(node level ≥ k) = p^k, commonly p=0.5. Expected number of nodes at level k: n * p^k.

function randomLevel(): level = 0 while random() < p and level < MAX_LEVEL: level = level + 1 return level

Parameter Tuning

Probability p affects height and pointer overhead. Typical p = 0.5 balances speed and space.

Comparison with Other Structures

Balanced Binary Search Trees (AVL, Red-Black)

Similar average complexities. Skip lists simpler to implement, probabilistic rather than strict balancing.

Arrays and Linked Lists

Skip lists outperform in dynamic ordered sets. Arrays support binary search but costly insertions/deletions. Linked lists slow for search.

Hash Tables

Skip lists maintain order; hash tables do not. Skip lists suitable for range queries and ordered traversal.

Trie Structures

Tries specialize in prefix search; skip lists general purpose ordered maps.

Data StructureAverage SearchInsertion/DeletionOrderingComplexity
Skip ListO(log n)O(log n)MaintainedProbabilistic
AVL TreeO(log n)O(log n)MaintainedDeterministic
Hash TableO(1)O(1)Not OrderedDeterministic
Linked ListO(n)O(1)MaintainedDeterministic

Applications

Databases and Indexing

Used in in-memory indexing, range queries, and ordered key-value stores. Examples: Redis, LevelDB.

Network Routing

Efficient lookup in routing tables and distributed hash tables (DHTs) in peer-to-peer networks.

Concurrent Data Structures

Skip lists adapted for lock-free concurrent implementations due to simple pointer updates.

Memory Management

Managing free lists and allocations in dynamic memory allocators.

Algorithmic Foundations

Base for advanced data structures and randomized algorithms.

Advantages and Limitations

Advantages

Simple implementation. Fast average operations. Dynamic and flexible. Space-efficient relative to balanced trees. Easy concurrency adaptation.

Limitations

Worst-case linear time possible. Randomness introduces non-determinism. Higher space overhead than simple linked lists due to multiple pointers. Performance sensitive to parameter p.

Mitigation Strategies

Hybrid deterministic-probabilistic variants. Tuning probability and max level. Periodic rebalancing if required.

Implementation Details

Node Structure

Key, value, array of forward pointers sized by node level. Optional backward pointers for bidirectional traversal.

Memory Management

Dynamic allocation per node. Preallocation possible for high-performance needs.

Random Number Generator

Quality affects level distribution. Use uniform RNG with probability threshold.

Concurrency Considerations

Lock-based: coarse or fine-grained locks per level or node. Lock-free: atomic pointer updates with CAS operations.

Parameter Settings

Typical max level: 16 or 32. Probability p: 0.25 to 0.5. Adjust based on data size and memory constraints.

References

  • Pugh, William. "Skip Lists: A Probabilistic Alternative to Balanced Trees." Communications of the ACM, vol. 33, no. 6, 1990, pp. 668-676.
  • Fraser, Keir. "Practical Lock-Free Linked Lists." Proceedings of the 20th ACM Symposium on Principles of Distributed Computing, 2001, pp. 70-79.
  • Morin, Pat. "Open Data Structures." 2008. Chapter on Skip Lists. Available online.
  • Mehlhorn, Kurt, and Peter Sanders. "Algorithms and Data Structures: The Basic Toolbox." Springer, 2008.