Introduction
Hill climbing: simple local search algorithm for optimization. Start at arbitrary point in solution space. Repeatedly move toward neighboring solution that improves objective function (increases value or decreases cost). Continue until reach local optimum (no neighbor better). Fast, memory-efficient, but gets stuck in local optima.
Metaphor: standing on hillside in fog. Can't see far ahead. Always step uphill (improve value). Eventually reach hilltop (local maximum). But might be foothills, not Mount Everest (global maximum).
Core advantage: simple, often effective for well-formed problems. Disadvantage: no guarantee finds global optimum. Used when complete/optimal solutions unnecessary. Practical for large search spaces where exhaustive search infeasible.
"Hill climbing's simplicity and efficiency make it attractive for optimization. Accepting the local-optimality trade-off in exchange for speed enables solving problems that are intractable for complete methods." -- Search and optimization algorithms
Basic Concept and Intuition
Greedy Local Search
Greedy: make locally optimal choice (move to best neighbor). Local: only consider immediate neighbors, don't look ahead. Search: traverse solution space seeking good solutions. Greedy + local = hill climbing.
Objective Function
Defined function mapping solutions to values. Goal: maximize (or minimize) function. Hill climbing iteratively improves function value. Example: find weights minimizing neural network loss.
Neighborhood Definition
Neighborhood: set of solutions reachable from current via single move. Move definition problem-specific. Example: flip single bit in binary string. Swap adjacent cities in traveling salesman. Add/remove element from set.
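As a sketch, two of the move types above can be written as explicit neighbor generators (the function names here are illustrative):

```python
def bit_flip_neighbors(s):
    """All strings reachable by flipping one bit of a 0/1 string."""
    return [s[:i] + ('1' if s[i] == '0' else '0') + s[i + 1:]
            for i in range(len(s))]

def adjacent_swap_neighbors(tour):
    """All tours reachable by swapping two adjacent cities."""
    result = []
    for i in range(len(tour) - 1):
        t = list(tour)
        t[i], t[i + 1] = t[i + 1], t[i]
        result.append(t)
    return result

print(bit_flip_neighbors("010"))          # → ['110', '000', '011']
print(len(adjacent_swap_neighbors(list("ABCD"))))  # → 3
```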
Moves and Transitions
Move: atomic change from one solution to a neighbor. Transition: applying a move. Neighborhood size affects search: large neighborhoods offer more options per step but cost more to evaluate; small, restrictive neighborhoods are cheap per step but leave more local optima.
Local Optimum
Local optimum: solution no neighbor improves. Algorithm terminates here. May be global optimum (best possible), or just local (best in region).
Deterministic Nature
Pure hill climbing deterministic: given start state, always same path. Contrast randomized algorithms (randomized moves). Determinism enables reproducibility, debugging.
Algorithm and Implementation
Basic Algorithm
def hill_climbing(initial_state):
    current = initial_state
    while True:
        neighbors = get_neighbors(current)
        best_neighbor = max(neighbors, key=objective_function)
        if objective_function(best_neighbor) <= objective_function(current):
            return current  # Local optimum reached
        current = best_neighbor
Step-by-Step Process
1. Start with initial solution (often random). 2. Generate all neighbors. 3. Evaluate neighbors via objective function. 4. Select best neighbor. 5. If better than current, move to neighbor. 6. Repeat until no improvement possible.
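These six steps can be exercised end to end on a toy problem; this minimal sketch maximizes f(x) = -(x-3)^2 over integers, with neighbors x-1 and x+1 (problem and names are illustrative):

```python
def hill_climb(initial_state, get_neighbors, objective):
    current = initial_state
    while True:
        best = max(get_neighbors(current), key=objective)  # steps 2-4
        if objective(best) <= objective(current):          # step 6: no improvement
            return current
        current = best                                     # step 5: move uphill

f = lambda x: -(x - 3) ** 2          # single peak at x = 3
print(hill_climb(10, lambda x: [x - 1, x + 1], f))  # → 3
```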
Termination Condition
Terminate when no neighbor is better than current solution. No uphill move available. This point is a local optimum; if neighbors are merely equal, it lies on a plateau (flat region).
Implementation Efficiency
Neighbor generation: can be expensive (might have exponentially many neighbors). Evaluation: objective function may be slow. Optimization: cache evaluations, only generate promising neighbors.
Variant: Steepest Ascent
Steepest ascent: pick best neighbor (largest improvement). Contrast: first ascent picks first improving neighbor. Steepest slower (evaluate all) but often better quality.
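For contrast, a sketch of the first-ascent rule, which returns the first improving neighbor without scanning the rest (names are illustrative):

```python
def first_ascent_step(current, get_neighbors, objective):
    """Return the first improving neighbor, or None at a local optimum."""
    base = objective(current)
    for n in get_neighbors(current):
        if objective(n) > base:
            return n  # take it immediately; skip remaining neighbors
    return None

f = lambda x: -(x - 3) ** 2          # peak at x = 3
neighbors = lambda x: [x - 1, x + 1]
print(first_ascent_step(5, neighbors, f))  # → 4 (first improving neighbor)
print(first_ascent_step(3, neighbors, f))  # → None (local optimum)
```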
Example: TSP
Traveling Salesman Problem:
- Current tour: [A, B, C, D] distance=100
- Neighbor (swap B,C): [A, C, B, D] distance=95 (improved)
- Move to neighbor
- Repeat until no swaps improve
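The loop above can be sketched in code using adjacent-city swaps and Euclidean distances; the city coordinates are made up for illustration:

```python
import math

def tour_length(tour, coords):
    """Total length of a closed tour over a dict of city -> (x, y)."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def improve_tour(tour, coords):
    """Repeatedly apply the best adjacent swap until none shortens the tour."""
    while True:
        best, best_len = tour, tour_length(tour, coords)
        for i in range(len(tour) - 1):
            t = tour[:i] + [tour[i + 1], tour[i]] + tour[i + 2:]
            length = tour_length(t, coords)
            if length < best_len:
                best, best_len = t, length
        if best is tour:          # no swap improved: local optimum
            return tour
        tour = best

coords = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}  # unit square
print(improve_tour(["A", "C", "B", "D"], coords))  # → ['A', 'B', 'C', 'D']
```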
Search Space Characteristics
Landscape Structure
Search space visualized as landscape: solutions mapped to heights (objective values). Hilltops: local maxima. Valleys: local minima (for minimization). Ridges: elevated plateaus. Landscape shape determines algorithm performance.
Unimodal Landscapes
Single peak (for maximization). Hill climbing always reaches global optimum (no false peaks). Easy case. Example: concave functions (convex, for minimization), some well-designed problems.
Multimodal Landscapes
Multiple peaks. Many local optima. Hill climbing reaches one, often not global. Hard case. Most real optimization problems multimodal.
Rugged Landscapes
Landscape jagged: neighbors have highly varied values. Hard for hill climbing: many local optima, small neighborhoods may miss improvements. Requires escape mechanisms.
Smooth Landscapes
Objective values vary gradually. Neighbors have similar values. Hill climbing works well: can follow gradient toward optima. Smooth landscapes easier to optimize.
Landscape Analysis
Characterize problem: count local optima, measure ruggedness, analyze neighborhood structure. Predicts algorithm difficulty. Smooth, unimodal: easy. Rugged, multimodal: hard.
Local Maxima Problem
Problem Definition
Hill climbing terminates at a local optimum, not necessarily the global one. If the global optimum is far better than the local one, the quality sacrifice is large. In multimodal landscapes, hill climbing is likely to be trapped in a suboptimal local maximum.
Example: Function Maximization
f(x) = -x^2 + 10*x - 5 (unimodal: a single peak)
Global max at x=5, f(5)=20
Starting x=2:
- Neighbors: x=1.9, x=2.1
- f(2) = 11, f(2.1) = 11.59
- Move to x=2.1
- Continue until reach global max at x=5 ✓
Starting x=8:
- f(8) = 11
- Neighbor: f(7.9) = 11.59
- Move toward x=5 ✓
Because this function is unimodal, hill climbing succeeds from any start. Multimodal functions have several local maxima, and only starts within the global peak's basin of attraction succeed.
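The trap is easy to see numerically on a genuinely multimodal function such as g(x) = x·sin(x), which on [0, 10] has a local maximum near x ≈ 2.0 (g ≈ 1.8) and its global maximum near x ≈ 8.0 (g ≈ 7.9). A sketch; the step size 0.01 is an arbitrary choice:

```python
import math

def g(x):
    return x * math.sin(x)   # multimodal on [0, 10]

def climb(x, step=0.01):
    """Hill climbing on g with neighbors x - step and x + step."""
    while True:
        best = max((x - step, x + step), key=g)
        if g(best) <= g(x):
            return x
        x = best

print(round(climb(1.0), 2))  # stuck at the local max near x ≈ 2.0
print(round(climb(7.0), 2))  # reaches the higher peak near x ≈ 8.0
```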
Theoretical Analysis
Probability of stopping short of the global optimum depends on landscape structure. Density of local optima and distribution of objective values affect it. No general formula: problem-specific.
Practical Severity
Severity varies: some problems have few local optima (most nearly global), others many. Real-world TSP: local optima usually within a few percent of optimal. Neural network training: local optima less of a problem than once believed (poor local minima are rarer in high-dimensional landscapes than intuition suggests).
Detection and Escape
Detect: termination at non-improving point. Escape: restart from different initial state, or use modified algorithm (simulated annealing, genetic algorithms).
Plateaus and Ridges
Plateau Problem
Plateau: large region where objective value is constant. Hill climbing stalls: no improving neighbor. May take many steps to escape. Can be harder than a local maximum: the landscape offers no uphill direction to follow at all.
Ridges
Ridge: elevated region with thin path toward summit. Moves perpendicular to ridge don't improve. Only moves along ridge improve. Hill climbing struggles: neighbors off ridge worse.
Example: Visualization
Landscape profile (side view):

              ___  (peak)
             /   \
     _______/     \
    /              \___
  (plateau)

Plateau: flat region, can't move uphill
Ridge: narrow elevated path, hard to follow
Algorithmic Challenges
Plateaus: the algorithm wanders randomly (or stops prematurely). Ridges: progress requires coordinated moves (e.g., left and up together), but hill climbing only takes single moves toward improving neighbors.
Solutions
Sideways moves: allow moves maintaining objective value (traverse plateau). Tabu search: forbid revisiting recent solutions (escape plateaus). Simulated annealing: probabilistically accept worse moves (escape local optima and plateaus).
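Sideways moves amount to a small change in the basic loop: accept equal-valued neighbors, but cap how many consecutive sideways steps are allowed. A sketch; the default cap of 100 is an arbitrary choice:

```python
def hill_climb_sideways(state, get_neighbors, objective, max_sideways=100):
    """Hill climbing that may traverse plateaus via equal-valued moves."""
    sideways = 0
    while True:
        best = max(get_neighbors(state), key=objective)
        if objective(best) < objective(state):
            return state                  # strict local optimum
        if objective(best) == objective(state):
            sideways += 1                 # sideways move across the plateau
            if sideways > max_sideways:
                return state              # give up: plateau too large
        else:
            sideways = 0                  # real uphill progress resets the cap
        state = best

plateau = lambda x: min(x, 5)             # flat for all x >= 5
r = hill_climb_sideways(0, lambda x: [x - 1, x + 1], plateau, max_sideways=10)
print(plateau(r))  # → 5 (climbed the slope, then wandered the plateau)
```

Note the sketch can revisit plateau states; the tabu list described next is the standard fix for that.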
Algorithm Variants
Steepest Ascent vs. First Ascent
Steepest: evaluate all neighbors, pick best. Quality good, speed slow (evaluates all). First: pick first improving neighbor. Speed fast, quality varies. Trade-off between quality and efficiency.
Stochastic Hill Climbing
Probabilistically select improving neighbors. Higher improvement = higher probability. Enables exploration, not always greedy. Better escapes local optima than deterministic.
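One common formulation, sketched here: choose among improving neighbors with probability proportional to the size of the improvement.

```python
import random

def stochastic_step(current, get_neighbors, objective):
    """Pick an improving neighbor with probability proportional to its gain."""
    base = objective(current)
    improving = [n for n in get_neighbors(current) if objective(n) > base]
    if not improving:
        return None  # local optimum
    weights = [objective(n) - base for n in improving]
    return random.choices(improving, weights=weights)[0]

f = lambda x: -(x - 3) ** 2
print(stochastic_step(5, lambda x: [x - 1, x + 1], f))  # → 4 (only improving neighbor)
```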
Random Restarts
Run hill climbing multiple times from different starting points. Keep best solution. Effective for multimodal landscapes. Budget permitting, run many times.
Tabu Search
Maintain tabu list: recently visited solutions forbidden. Allows moving to worse solutions if not tabu. Escape local optima, plateaus. Short-term memory guides search.
Simulated Annealing
Probabilistically accept worse moves. Probability decreases over time (temperature cools). Explore initially, exploit later. Effective for hard landscapes.
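The acceptance rule is the standard Metropolis criterion: a worsening move of size delta is accepted with probability exp(-delta / T), which shrinks as the temperature T cools. A sketch:

```python
import math, random

def accept(delta, temperature):
    """Metropolis criterion: delta is the objective decrease caused by the move.
    Improvements (delta <= 0) are always accepted; a worsening move is accepted
    with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return random.random() < math.exp(-delta / temperature)

print(accept(-2.0, 1.0))   # → True (improving move, always taken)
```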
Variable Neighborhood Search
Change neighborhood definition dynamically. Start small neighborhood, expand if stuck. Systematic escape from local optima. More sophisticated than pure hill climbing.
Random Restart Strategy
Basic Idea
Run hill climbing from multiple random starting states. Each run finds local optimum. Return best found. Leverages: different starting points reach different local optima. Some local optima better than others.
Algorithm
def hill_climbing_with_restart(num_runs):
    best_solution = None
    best_value = float('-inf')
    for i in range(num_runs):
        initial = random_solution()
        local_opt = hill_climbing(initial)
        value = objective_function(local_opt)
        if value > best_value:
            best_solution = local_opt
            best_value = value
    return best_solution
Effectiveness Analysis
Probability of finding a good solution increases with restarts. Each independent restart gives another chance to land in a good basin, so the probability of missing every good local optimum shrinks geometrically. But diminishing returns: each additional restart is less likely to find a new optimum.
Restart Schedules
Fixed: predetermined number of restarts. Adaptive: monitor improvement, increase if stalling. Early stopping: stop when finding good-enough solution, or time limit exceeded.
Computational Cost
Total cost: k restarts × cost per run. If a single run finds the global optimum with probability 0.3, expect ~1/0.3 ≈ 3.3 runs to find it (geometric distribution).
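That estimate follows from the geometric distribution: with per-run success probability p, the expected number of runs is 1/p, and the chance of at least one success within k runs is 1 - (1-p)^k. A quick check:

```python
p = 0.30                           # chance a single run reaches the global optimum
print(round(1 / p, 1))             # expected runs until success → 3.3
print(round(1 - (1 - p) ** 5, 3))  # P(success within 5 runs) → 0.832
```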
Practical Application
Common in practice: simple, effective. Parameter: number of restarts. More restarts = better solution, more computation. Budget determines restarts.
Performance Analysis and Complexity
Time Complexity
Single run: iterations × (neighbors per state × evaluation cost). Neighbor generation and objective evaluation are often O(n) each for solution size n. Multiple restarts: multiply by restart count.
Space Complexity
O(n): store current solution and neighbor evaluations. Very memory-efficient. Contrast: genetic algorithms store populations, breadth-first search stores queue.
Solution Quality
No guarantee: quality depends on landscape, starting point, luck. Empirically: often finds acceptable solutions quickly. Catastrophic failures (stuck at a trivial local optimum) are rare unless the landscape is pathological.
Convergence Behavior
Converges to local optimum in finite steps (no cycling if strict improvement required). Convergence speed varies: smooth landscapes fast, rugged landscapes slow.
Success Metrics
Time to solution: how long find acceptable solution? Solution quality: how close to optimum? Robustness: variability across runs? Trade these metrics based on problem requirements.
Problem Dependency
Performance highly problem-dependent. Some problems: hill climbing optimal or near-optimal. Others: poor quality solutions. No universal best algorithm.
Comparison with Other Optimization Methods
Hill Climbing vs. Genetic Algorithms
| Aspect | Hill Climbing | Genetic Algorithm |
|---|---|---|
| Speed | Fast, deterministic | Slower, population-based |
| Memory | Minimal (single solution) | Large (population) |
| Local optima | Easily trapped | Better escape via diversity |
| Solution quality | Variable, often decent | Often good, with tuning |
Hill Climbing vs. Simulated Annealing
Hill climbing: deterministic, fast. Simulated annealing: probabilistic, slower but escapes local optima. Annealing better for hard landscapes, hill climbing for easy ones.
Hill Climbing vs. Gradient Descent
Gradient descent: for continuous, differentiable functions. Uses gradient information (more efficient). Hill climbing: general, works discrete/non-differentiable. No gradient needed.
Hill Climbing vs. Exhaustive Search
Exhaustive: guaranteed optimal, exponential time. Hill climbing: fast, uncertain quality. Trade off optimality for feasibility. Hill climbing practical for large spaces.
Applications in AI
Feature Selection
Select relevant features from dataset. Search space: subsets of features. Objective: maximize model accuracy. Hill climbing: iteratively add/remove features improving accuracy. Practical for thousands of features.
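A sketch of that add/remove search over feature subsets; the scoring function below is a toy stand-in for validation accuracy, and the feature names are made up:

```python
def feature_selection_hill_climb(all_features, score):
    """Hill climbing over feature subsets: each move adds or removes one feature.
    `score` maps a frozenset of features to a quality value."""
    current = frozenset()
    while True:
        neighbors = [current | {f} for f in all_features - current] + \
                    [current - {f} for f in current]
        best = max(neighbors, key=score)
        if score(best) <= score(current):
            return current
        current = best

# Toy score: reward the two "informative" features, penalize extras.
informative = {"age", "income"}
score = lambda s: len(s & informative) - 0.1 * len(s - informative)
print(sorted(feature_selection_hill_climb({"age", "income", "zip", "id"}, score)))
# → ['age', 'income']
```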
Robot Control
Optimize robot control parameters. Search space: parameter values. Objective: task performance (speed, accuracy, energy). Hill climbing: adjust parameters incrementally improving performance.
Traveling Salesman Problem
Find short tour visiting all cities. Search space: permutations of cities. Objective: minimize total distance. Hill climbing: local search with 2-opt, 3-opt moves (swap city orderings).
Neural Network Training
Gradient descent can be viewed as a continuous analog of hill climbing (descent on a loss surface). The steepest-descent direction plays the role of the best neighbor. Convergence to a local optimum is a common issue.
Configuration and Tuning
Configure software/hardware parameters. Search space: parameter combinations. Objective: performance (throughput, latency). Hill climbing: efficient for large configuration spaces.
Planning and Scheduling
Optimize schedules, plans. Search space: task orderings, resource allocations. Objective: minimize completion time, satisfy constraints. Hill climbing: efficient for practical scheduling.