Introduction
Simulated annealing: randomized optimization algorithm inspired by metallurgical annealing. Heat metal, slowly cool to low-energy crystalline structure (global minimum). Algorithm: maintain temperature, probabilistically accept worse solutions, cool over time. Escape local optima via noise, converge to global.
Core idea: avoid being trapped in local optima. Early high temperature: accept many worse moves (explore). Low temperature: mostly accept improving moves (exploit). Balance exploration and exploitation dynamically.
Advantages: escapes local optima better than hill climbing; asymptotically converges to the global optimum under a sufficiently slow (logarithmic) cooling schedule. Disadvantages: slower than hill climbing, practical convergence rate unknown, parameter settings critical.
"Simulated annealing harnesses randomness to transcend local optima. Inspired by nature's cooling processes, it enables finding high-quality solutions to intractable optimization problems." -- Metaheuristic optimization methods
Physical Annealing Metaphor
Metallurgical Annealing
Annealing: heat metal to high temperature, then slowly cool. Allows atoms to rearrange, escape local energy minima. Results in low-energy crystalline structure (global minimum). Fast cooling: freezes atoms in local minima (suboptimal). Slow cooling: approaches global minimum.
Thermal Energy
High temperature: atoms vibrate violently, move freely. Bonds break, rearrange. Low temperature: atoms settle, form stable bonds. Temperature controls mobility.
Boltzmann Distribution
Probability of state at temperature T: P(E) ∝ exp(-E/kT). High T: wide distribution (many states possible). Low T: narrow distribution (lowest-energy states probable). Physics governs transition probabilities.
Energy Landscape Analogy
Energy: objective function value. Low energy: good solution. High energy: bad solution. Physical system seeks global minimum. Optimization seeks the minimum of the objective (maximization problems are handled by negating the objective). Analogy guides algorithm design.
Cooling Schedule
Cooling: temperature decreases over time. Determines annealing quality. Fast cool: system doesn't reach global minimum. Slow cool: takes forever but approaches global minimum. Schedule crucial.
Basic Algorithm
Pseudocode
import math
import random

def simulated_annealing(initial_solution, objective, generate_random_neighbor,
                        initial_temperature, final_temperature, cooling_rate):
    current = initial_solution
    best = current  # track the best solution found, not just the current one
    temperature = initial_temperature
    while temperature > final_temperature:
        neighbor = generate_random_neighbor(current)
        delta_E = objective(neighbor) - objective(current)
        if delta_E < 0:  # Neighbor better: always accept
            current = neighbor
        elif random.random() < math.exp(-delta_E / temperature):
            current = neighbor  # Neighbor worse: accept probabilistically
        if objective(current) < objective(best):
            best = current
        temperature = cooling_rate * temperature  # geometric cool-down
    return best
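A minimal end-to-end run of this loop, minimizing f(x) = x² from a deliberately poor start. Step size, temperatures, and cooling rate are illustrative choices, not tuned values:

```python
import math
import random

random.seed(0)

def objective(x):
    return x * x

def neighbor(x):
    # Random move: perturb x by a uniform step
    return x + random.uniform(-1.0, 1.0)

current = best = 10.0  # deliberately poor starting point
T, T_final, r = 10.0, 1e-3, 0.99
while T > T_final:
    cand = neighbor(current)
    dE = objective(cand) - objective(current)
    if dE < 0 or random.random() < math.exp(-dE / T):
        current = cand
    if objective(current) < objective(best):
        best = current
    T *= r

print(best)  # a value near the global minimum at x = 0
```

Note that `best`, not the final `current`, is the answer: late random moves can leave `current` slightly worse than the best state visited.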
Step-by-Step Process
1. Start with initial solution, high temperature. 2. Generate random neighbor (random move). 3. Evaluate difference. 4. If improves, always move. 5. If worsens, move probabilistically (depends on temperature). 6. Cool temperature. 7. Repeat until temperature near zero.
Random Neighbor Generation
Neighbor: achievable via single move. Random: ensures exploration (not biased toward improvement). Examples: flip random bit, swap random cities, perturb parameter.
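Sketches of the three example moves. The representations are assumptions: bit string as a list of 0/1, tour as a list of city indices, parameter as a float:

```python
import random

def flip_random_bit(bits):
    """Neighbor of a bit string: flip one randomly chosen bit."""
    i = random.randrange(len(bits))
    out = list(bits)
    out[i] = 1 - out[i]
    return out

def swap_random_cities(tour):
    """Neighbor of a TSP tour: swap two randomly chosen cities."""
    i, j = random.sample(range(len(tour)), 2)
    out = list(tour)
    out[i], out[j] = out[j], out[i]
    return out

def perturb_parameter(x, scale=0.1):
    """Neighbor of a real-valued parameter: add small Gaussian noise."""
    return x + random.gauss(0.0, scale)
```

Each generator is unbiased: it does not look at the objective, so exploration is left entirely to the acceptance rule.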
Best Solution Tracking
Track best solution found during search (not necessarily current solution). Final answer: best ever found, not necessarily final state.
Randomness Role
Randomness: avoids deterministic hill-climbing. Accepts bad moves early (escape). Decreases acceptance over time. Enables escaping local optima while converging to good solutions.
Acceptance Probability
Metropolis Criterion
Accept move with probability: P(accept) = min(1, exp(-ΔE / T)). ΔE: change in cost. T: temperature. If ΔE < 0 (improves): always accept (P=1). If ΔE > 0 (worsens): accept with probability depending on magnitude and temperature.
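The criterion as a small helper, a direct transcription of the formula above:

```python
import math

def acceptance_probability(delta_E, temperature):
    """Metropolis criterion: P(accept) = min(1, exp(-delta_E / T))."""
    if delta_E < 0:
        return 1.0  # improving moves are always accepted
    return math.exp(-delta_E / temperature)
```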
High Temperature Behavior
T high: exp(-ΔE / T) ≈ 1, even bad moves accepted. Random walk: explore broadly, ignore cost. Escapes local optima easily. Inefficient: wastes moves on bad solutions.
Low Temperature Behavior
T low: exp(-ΔE / T) ≈ 0 for large ΔE. Only improving moves accepted. Hill-climbing-like. Efficient: focuses on good region. Risk: traps in local optimum.
Transition Example
ΔE = 10 (move worsens by 10)
T = 100: P = exp(-10/100) ≈ 0.90 (likely accept)
T = 10: P = exp(-10/10) ≈ 0.37 (sometimes accept)
T = 1: P = exp(-10/1) ≈ 0.00005 (rarely accept)
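These values follow directly from the Metropolis formula and can be checked with `math.exp`:

```python
import math

dE = 10.0  # move worsens the objective by 10
for T in (100.0, 10.0, 1.0):
    p = math.exp(-dE / T)
    print(f"T = {T:>5}: P = {p:.5f}")
# T = 100.0: P = 0.90484
# T =  10.0: P = 0.36788
# T =   1.0: P = 0.00005
```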
Parameter Sensitivity
Large ΔE: hard to accept (worse move = lower probability). Small ΔE: easier to accept. Temperature scales both effects. T=0: deterministic hill climbing, no randomness. T=∞: pure random walk.
Temperature Schedule
Schedule Role
Temperature schedule: function T(t) determining temperature at time t. Controls acceptance probability over algorithm lifetime. Too fast: local optimum. Too slow: convergence time huge. Balance crucial.
Linear Schedule
T(t) = T_0 - α*t. Decreases linearly with time. Simple, easy to understand. Risk: decreases too fast (gets stuck), or too slow (inefficient).
Geometric Schedule
T(t) = T_0 * r^t. Decreases exponentially (multiply by r < 1 each iteration). Often more effective: fast initial cooling, slow late cooling. Example: r = 0.95 reduces temperature 5% per iteration.
Logarithmic Schedule
T(t) = T_0 / log(1 + t), for t ≥ 1 (variants differ in constants). Theoretically optimal for convergence to the global optimum. Proven convergence: with sufficiently slow logarithmic cooling, SA approaches the global minimum with probability 1. But extremely slow in practice (logarithmic decrement).
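The three schedules sketched as functions. T_0, α, and r are illustrative constants; the logarithmic form uses log(1 + t) and is evaluated for t ≥ 1:

```python
import math

T0 = 100.0  # illustrative initial temperature

def linear(t, alpha=1.0):
    # T decreases by alpha per step, clamped at zero
    return max(T0 - alpha * t, 0.0)

def geometric(t, r=0.95):
    # T is multiplied by r each step
    return T0 * r ** t

def logarithmic(t):
    # Theoretically optimal but decays very slowly; defined for t >= 1
    return T0 / math.log(1 + t)
```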
Adaptive Schedules
Dynamic: adjust temperature based on acceptance rate. Increase if acceptance dropping (getting stuck). Decrease if too many accepts (exploring poorly). Self-tuning, responsive.
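A hypothetical self-tuning rule of this kind; the thresholds and adjustment factors here are illustrative assumptions, not standard values:

```python
def adapt_temperature(T, acceptance_rate,
                      low=0.2, high=0.8, up=1.05, down=0.95):
    """Nudge T based on the acceptance rate over a recent window."""
    if acceptance_rate < low:
        return T * up    # getting stuck: reheat slightly
    if acceptance_rate > high:
        return T * down  # accepting almost everything: cool faster
    return T
```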
Practical Schedules
Often geometric with fixed cooling rate. Example: T = 0.95 * T, repeat. Balances theory and practicality. Fine-tune α, r for problem.
Cooling Rate and Strategies
Cooling Rate Definition
Cooling rate: how fast temperature decreases. Rate = r in geometric schedule (or α in linear). Critical parameter: affects convergence speed, solution quality.
Fast Cooling
r smaller (e.g., 0.90). Quickly approaches hill-climbing. Pro: fast convergence, focuses on good regions. Con: more easily trapped in local optima, lower quality.
Slow Cooling
r close to 1 (e.g., 0.99). More time exploring. Pro: finds better solutions, less likely trapped. Con: slow convergence, many evaluations.
Temperature Thresholds
Stop when T below threshold (e.g., 0.001). Or fixed iterations. Determine computation budget. Time/quality trade-off.
Annealing Phases
Different cooling phases: slow cooling early (thorough exploration), faster cooling late (quick local refinement). Multi-phase schedules adapt the rate to the search stage.
Reheating
Occasional reheating: increase T temporarily. Re-explores with fresh perspective. Escape "stuck" states. Advanced strategy.
Parameter Tuning and Sensitivity
Key Parameters
Initial temperature T_0: affects early acceptance. Too low: misses exploration. Too high: wasteful. Cooling rate (r or α): affects convergence. Final temperature: when to stop.
Initial Temperature Selection
Too low: hill-climbing behavior from start, no escape. Too high: random walk, no convergence. Rule of thumb: accept ~80% of initial moves. Estimate from sample moves.
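One common way to implement the rule of thumb: sample some random moves, average the worsening ΔE values, and solve exp(-ΔE_avg / T_0) = p for T_0. The helper name and default target are illustrative:

```python
import math

def estimate_initial_temperature(sample_deltas, target_acceptance=0.8):
    """
    Choose T_0 so a typical worsening move of average size dE_avg
    is accepted with probability ~target_acceptance:
        exp(-dE_avg / T_0) = p  =>  T_0 = -dE_avg / ln(p)
    sample_deltas: delta-E values observed on random sample moves.
    """
    uphill = [d for d in sample_deltas if d > 0]
    dE_avg = sum(uphill) / len(uphill)
    return -dE_avg / math.log(target_acceptance)
```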
Cooling Rate Selection
Empirical: test values (0.90-0.99). Slower (0.90) better quality, slower. Faster (0.99) quicker, lower quality. Problem-dependent. Start conservative, refine.
Sensitivity Analysis
Small parameter changes: large effect on convergence. Nonlinear sensitivity. No universal good values. Require problem-specific tuning.
Empirical Tuning
Run multiple instances with different parameters. Plot convergence: temperature vs. objective. Choose parameters balancing speed/quality for goals.
Self-Adaptive Methods
Automatic parameter adjustment: metaheuristics tune SA parameters. Or run multiple SAs in parallel, share good parameter settings. Reduce manual tuning burden.
Convergence Properties and Theory
Theoretical Guarantees
With logarithmic cooling schedule: SA converges to global optimum with probability 1 (as time → ∞). Asymptotic guarantee: eventual convergence, but the time required is unbounded in practice.
Convergence Proof Sketch
Markov chain: random walk in solution space. Logarithmic cooling: stationary distribution concentrates on global optima. As T → 0, probability of global optimum → 1.
Practical Convergence
Theoretical convergence slow (logarithmic). Practical: geometric cooling faster but no guarantee. Trade-off theory for speed.
Convergence Speed
Depends on: problem structure (landscape ruggedness), cooling rate, initial temperature. Empirical: measure iterations to good solution. Varies widely.
No-Free-Lunch Theorem
No universal best algorithm. SA excellent some problems, poor others. Landscape structure matters. Problem-specific analysis needed.
Comparison with Random Search
SA beats random search: biased toward good regions. Beats hill climbing: escapes local optima. Complementary strengths.
Improvements and Variants
Parallel Simulated Annealing
Run multiple SA instances in parallel, different starting points. Periodically exchange best solutions. Combines multiple search paths. Better coverage, faster effective convergence.
Adaptive Simulated Annealing
ASA: adjust temperature based on acceptance rate. If stuck (low acceptance), increase temperature. If exploring (high acceptance), decrease. Self-tuning, responsive.
Hybrid Approaches
Combine SA with local search: SA finds promising region, local search refines. SA with genetic algorithm: mutate population via SA, crossover. Synergistic benefits.
Problem-Specific Modifications
Adjust neighbor generation: biased toward promising neighbors. Custom acceptance probabilities: domain knowledge. Encode problem structure into algorithm.
Multi-Objective Simulated Annealing
Multiple objectives: maintain Pareto frontier (non-dominated solutions). SA balances objectives probabilistically. Find diverse, high-quality trade-offs.
Quantum Annealing
Recent: quantum devices perform annealing. Tunneling through energy barriers (unlike classical). Potential speedup. Practical quantum computers limited, experimental.
Comparison with Other Algorithms
SA vs. Hill Climbing
| Aspect | Hill Climbing | Simulated Annealing |
|---|---|---|
| Speed | Fast | Slower |
| Local optima | Trapped easily | Can escape |
| Solution quality | Often suboptimal | Often better |
| Parameters | Few | Many (tuning required) |
SA vs. Genetic Algorithms
SA single-point search (current solution). GA population-based. SA simpler, lighter memory. GA more parallel, diverse exploration. SA faster, GA often better quality (with tuning).
SA vs. Tabu Search
SA probabilistic acceptance. Tabu deterministic with forbidden list. SA simpler, less memory. Tabu more sophisticated, often better. Complementary approaches.
Real-World Applications
Traveling Salesman Problem
Find short tour visiting all cities. SA very competitive. Temperature set so that, initially, tours ~10% longer are still accepted; cooling continues until only improving moves are taken. Good tours found quickly.
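A sketch of SA on a small random TSP instance: 15 hypothetical cities in the unit square, a simple city-swap neighbor, and an illustrative schedule (not a tuned solver):

```python
import math
import random

random.seed(42)

# Hypothetical instance: 15 random cities in the unit square.
cities = [(random.random(), random.random()) for _ in range(15)]

def tour_length(tour):
    # Total length of the closed tour (returns to the start)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def swap_neighbor(tour):
    # Neighbor: swap two randomly chosen cities
    i, j = random.sample(range(len(tour)), 2)
    out = list(tour)
    out[i], out[j] = out[j], out[i]
    return out

current = list(range(len(cities)))
best = list(current)
T, r = 1.0, 0.999
while T > 1e-4:
    cand = swap_neighbor(current)
    dE = tour_length(cand) - tour_length(current)
    if dE < 0 or random.random() < math.exp(-dE / T):
        current = cand
    if tour_length(current) < tour_length(best):
        best = list(current)
    T *= r

print(tour_length(best))  # shorter than the initial ordering
```

Caching `tour_length(current)` instead of recomputing it each iteration, and using 2-opt moves instead of swaps, are standard refinements.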
Circuit Design
Chip design: place circuit components to minimize area and maximize speed. Objective: minimize total wire length, avoid congestion. SA solves large instances efficiently.
Scheduling Problems
Job scheduling, staff scheduling, airline crew scheduling. Constraints: skills, availability, fairness. SA finds feasible, near-optimal schedules.
Protein Folding
Computational biology: find minimum-energy protein configuration. SA explores vast conformation space, finds low-energy structures approximating folded conformations.
Image Processing
Image restoration, segmentation. Objective: pixel assignment minimizing energy. SA iteratively improves pixel assignments, converges to coherent images.
Derivative-Free Optimization
Objectives without gradients (simulation-based, black-box). SA requires only function evaluations, no derivatives. Perfect for complex, non-differentiable objectives.
References
- Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. "Optimization by Simulated Annealing." Science, vol. 220, no. 4598, 1983, pp. 671-680.
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics, vol. 21, 1953, pp. 1087-1092.
- Aarts, E., and Korst, J. "Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing." John Wiley & Sons, 1989.
- Michalewicz, Z., and Fogel, D. B. "How to Solve It: Modern Heuristics." Springer, 2nd edition, 2004.
- Ingber, L. "Simulated Annealing: Practice Versus Theory." Mathematical and Computer Modelling, vol. 18, no. 11, 1993, pp. 29-57.