Introduction
Simulated annealing: randomized optimization algorithm inspired by metallurgical annealing. Heat metal, slowly cool to low-energy crystalline structure (global minimum). Algorithm: maintain temperature, probabilistically accept worse solutions, cool over time. Escape local optima via noise, converge to global.
Core idea: avoid being trapped in local optima. Early high temperature: accept many worse moves (explore). Low temperature: mostly accept improving moves (exploit). Balance exploration and exploitation dynamically.
Advantages: escapes local optima better than hill climbing; asymptotically converges to the global optimum under a sufficiently slow (logarithmic) cooling schedule. Disadvantages: slower than hill climbing, practical convergence rate unknown, parameter settings critical.
"Simulated annealing harnesses randomness to transcend local optima. Inspired by nature's cooling processes, it enables finding high-quality solutions to intractable optimization problems." -- Metaheuristic optimization methods
Physical Annealing Metaphor
Metallurgical Annealing
Annealing: heat metal to high temperature, then slowly cool. Allows atoms to rearrange, escape local energy minima. Results in low-energy crystalline structure (global minimum). Fast cooling: freezes atoms in local minima (suboptimal). Slow cooling: approaches global minimum.
Thermal Energy
High temperature: atoms vibrate violently, move freely. Bonds break, rearrange. Low temperature: atoms settle, form stable bonds. Temperature controls mobility.
Boltzmann Distribution
Probability of state at temperature T: P(E) ∝ exp(-E/kT). High T: wide distribution (many states possible). Low T: narrow distribution (lowest-energy states probable). Physics governs transition probabilities.
Energy Landscape Analogy
Energy: objective function value. Low energy: good solution. High energy: bad solution. Physical system seeks global minimum. Optimization seeks the minimum of the objective (maximization problems are handled by negating the objective). Analogy guides algorithm design.
Cooling Schedule
Cooling: temperature decreases over time. Determines annealing quality. Fast cool: system doesn't reach global minimum. Slow cool: takes forever but approaches global minimum. Schedule crucial.
Basic Algorithm
Pseudocode
import math
import random

def simulated_annealing(initial_solution, objective, generate_random_neighbor,
                        initial_temperature, final_temperature, cooling_rate):
    current = initial_solution
    best = current  # track the best solution found, not just the current one
    temperature = initial_temperature
    while temperature > final_temperature:
        neighbor = generate_random_neighbor(current)
        delta_E = objective(neighbor) - objective(current)
        if delta_E < 0:  # Neighbor better: always accept
            current = neighbor
        elif random.random() < math.exp(-delta_E / temperature):
            current = neighbor  # Neighbor worse: accept probabilistically
        if objective(current) < objective(best):
            best = current
        temperature = cooling_rate * temperature  # geometric cool-down
    return best
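A minimal end-to-end run of this loop, minimizing f(x) = x² from a deliberately poor start. Step size, temperatures, and cooling rate are illustrative choices, not tuned values:

```python
import math
import random

random.seed(0)

def objective(x):
    return x * x

def neighbor(x):
    # Random move: perturb x by a uniform step
    return x + random.uniform(-1.0, 1.0)

current = best = 10.0  # deliberately poor starting point
T, T_final, r = 10.0, 1e-3, 0.99
while T > T_final:
    cand = neighbor(current)
    dE = objective(cand) - objective(current)
    if dE < 0 or random.random() < math.exp(-dE / T):
        current = cand
    if objective(current) < objective(best):
        best = current
    T *= r

print(best)  # a value near the global minimum at x = 0
```

Note that `best`, not the final `current`, is the answer: late random moves can leave `current` slightly worse than the best state visited.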
Step-by-Step Process
1. Start with initial solution, high temperature. 2. Generate random neighbor (random move). 3. Evaluate difference. 4. If improves, always move. 5. If worsens, move probabilistically (depends on temperature). 6. Cool temperature. 7. Repeat until temperature near zero.
Random Neighbor Generation
Neighbor: achievable via single move. Random: ensures exploration (not biased toward improvement). Examples: flip random bit, swap random cities, perturb parameter.
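Sketches of the three example moves. The representations are assumptions: bit string as a list of 0/1, tour as a list of city indices, parameter as a float:

```python
import random

def flip_random_bit(bits):
    """Neighbor of a bit string: flip one randomly chosen bit."""
    i = random.randrange(len(bits))
    out = list(bits)
    out[i] = 1 - out[i]
    return out

def swap_random_cities(tour):
    """Neighbor of a TSP tour: swap two randomly chosen cities."""
    i, j = random.sample(range(len(tour)), 2)
    out = list(tour)
    out[i], out[j] = out[j], out[i]
    return out

def perturb_parameter(x, scale=0.1):
    """Neighbor of a real-valued parameter: add small Gaussian noise."""
    return x + random.gauss(0.0, scale)
```

Each generator is unbiased: it does not look at the objective, so exploration is left entirely to the acceptance rule.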
Best Solution Tracking
Track best solution found during search (not necessarily current solution). Final answer: best ever found, not necessarily final state.
Randomness Role
Randomness: avoids deterministic hill-climbing. Accepts bad moves early (escape). Decreases acceptance over time. Enables escaping local optima while converging to good solutions.
Acceptance Probability
Metropolis Criterion
Accept move with probability: P(accept) = min(1, exp(-ΔE / T)). ΔE: change in cost. T: temperature. If ΔE < 0 (improves): always accept (P=1). If ΔE > 0 (worsens): accept with probability depending on magnitude and temperature.
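The criterion as a small helper, a direct transcription of the formula above:

```python
import math

def acceptance_probability(delta_E, temperature):
    """Metropolis criterion: P(accept) = min(1, exp(-delta_E / T))."""
    if delta_E < 0:
        return 1.0  # improving moves are always accepted
    return math.exp(-delta_E / temperature)
```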
High Temperature Behavior
T high: exp(-ΔE / T) ≈ 1, even bad moves accepted. Random walk: explore broadly, ignore cost. Escapes local optima easily. Inefficient: wastes moves on bad solutions.
Low Temperature Behavior
T low: exp(-ΔE / T) ≈ 0 for large ΔE. Only improving moves accepted. Hill-climbing-like. Efficient: focuses on good region. Risk: traps in local optimum.
Transition Example
ΔE = 10 (move worsens by 10)
T = 100: P = exp(-10/100) ≈ 0.90 (likely accept)
T = 10: P = exp(-10/10) ≈ 0.37 (sometimes accept)
T = 1: P = exp(-10/1) ≈ 0.00005 (rarely accept)
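These values follow directly from the Metropolis formula and can be checked with `math.exp`:

```python
import math

dE = 10.0  # move worsens the objective by 10
for T in (100.0, 10.0, 1.0):
    p = math.exp(-dE / T)
    print(f"T = {T:>5}: P = {p:.5f}")
# T = 100.0: P = 0.90484
# T =  10.0: P = 0.36788
# T =   1.0: P = 0.00005
```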
Parameter Sensitivity
Large ΔE: hard to accept (worse move = lower probability). Small ΔE: easier to accept. Temperature scales both effects. T=0: deterministic hill climbing, no randomness. T=∞: pure random walk.
Temperature Schedule
Schedule Role
Temperature schedule: function T(t) determining temperature at time t. Controls acceptance probability over algorithm lifetime. Too fast: local optimum. Too slow: convergence time huge. Balance crucial.
Linear Schedule
T(t) = T_0 - α*t. Decreases linearly with time. Simple, easy to understand. Risk: decreases too fast (gets stuck), or too slow (inefficient).
Geometric Schedule
T(t) = T_0 * r^t. Decreases exponentially (multiply by r < 1 each iteration). Often more effective: fast initial cooling, slow late cooling. Example: r = 0.95 reduces temperature 5% per iteration.
Logarithmic Schedule
T(t) = T_0 / log(1 + t), for t ≥ 1 (variants differ in constants). Theoretically optimal for convergence to the global optimum. Proven convergence: with sufficiently slow logarithmic cooling, SA approaches the global minimum with probability 1. But extremely slow in practice (logarithmic decrement).
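The three schedules sketched as functions. T_0, α, and r are illustrative constants; the logarithmic form uses log(1 + t) and is evaluated for t ≥ 1:

```python
import math

T0 = 100.0  # illustrative initial temperature

def linear(t, alpha=1.0):
    # T decreases by alpha per step, clamped at zero
    return max(T0 - alpha * t, 0.0)

def geometric(t, r=0.95):
    # T is multiplied by r each step
    return T0 * r ** t

def logarithmic(t):
    # Theoretically optimal but decays very slowly; defined for t >= 1
    return T0 / math.log(1 + t)
```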
Adaptive Schedules
Dynamic: adjust temperature based on acceptance rate. Increase if acceptance dropping (getting stuck). Decrease if too many accepts (exploring poorly). Self-tuning, responsive.
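A hypothetical self-tuning rule of this kind; the thresholds and adjustment factors here are illustrative assumptions, not standard values:

```python
def adapt_temperature(T, acceptance_rate,
                      low=0.2, high=0.8, up=1.05, down=0.95):
    """Nudge T based on the acceptance rate over a recent window."""
    if acceptance_rate < low:
        return T * up    # getting stuck: reheat slightly
    if acceptance_rate > high:
        return T * down  # accepting almost everything: cool faster
    return T
```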
Practical Schedules
Often geometric with fixed cooling rate. Example: T = 0.95 * T, repeat. Balances theory and practicality. Fine-tune α, r for problem.
Cooling Rate and Strategies
Cooling Rate Definition
Cooling rate: how fast temperature decreases. Rate = r in geometric schedule (or α in linear). Critical parameter: affects convergence speed, solution quality.
Fast Cooling
r smaller (e.g., 0.90). Quickly approaches hill-climbing. Pro: fast convergence, focuses on good regions. Con: more easily trapped in local optima, lower quality.
Slow Cooling
r close to 1 (e.g., 0.99). More time exploring. Pro: finds better solutions, less likely trapped. Con: slow convergence, many evaluations.
Temperature Thresholds
Stop when T below threshold (e.g., 0.001). Or fixed iterations. Determine computation budget. Time/quality trade-off.
Annealing Phases
Different cooling phases: slow cooling early (thorough exploration), faster cooling late (quick local refinement). Multi-phase schedules adapt the rate to the search stage.
Reheating
Occasional reheating: increase T temporarily. Re-explores with fresh perspective. Escape "stuck" states. Advanced strategy.
Parameter Tuning and Sensitivity
Key Parameters
Initial temperature T_0: affects early acceptance. Too low: misses exploration. Too high: wasteful. Cooling rate (r or α): affects convergence. Final temperature: when to stop.
Initial Temperature Selection
Too low: hill-climbing behavior from start, no escape. Too high: random walk, no convergence. Rule of thumb: accept ~80% of initial moves. Estimate from sample moves.
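One common way to implement the rule of thumb: sample some random moves, average the worsening ΔE values, and solve exp(-ΔE_avg / T_0) = p for T_0. The helper name and default target are illustrative:

```python
import math

def estimate_initial_temperature(sample_deltas, target_acceptance=0.8):
    """
    Choose T_0 so a typical worsening move of average size dE_avg
    is accepted with probability ~target_acceptance:
        exp(-dE_avg / T_0) = p  =>  T_0 = -dE_avg / ln(p)
    sample_deltas: delta-E values observed on random sample moves.
    """
    uphill = [d for d in sample_deltas if d > 0]
    dE_avg = sum(uphill) / len(uphill)
    return -dE_avg / math.log(target_acceptance)
```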
Cooling Rate Selection
Empirical: test values (0.90-0.99). Slower (0.90) better quality, slower. Faster (0.99) quicker, lower quality. Problem-dependent. Start conservative, refine.
Sensitivity Analysis
Small parameter changes: large effect on convergence. Nonlinear sensitivity. No universal good values. Require problem-specific tuning.
Empirical Tuning
Run multiple instances with different parameters. Plot convergence: temperature vs. objective. Choose parameters balancing speed/quality for goals.
Self-Adaptive Methods
Automatic parameter adjustment: metaheuristics tune SA parameters. Or run multiple SAs in parallel, share good parameter settings. Reduce manual tuning burden.
Convergence Properties and Theory
Theoretical Guarantees
With logarithmic cooling schedule: SA converges to global optimum with probability 1 (as time → ∞). Asymptotic guarantee: eventual convergence, but the time required is unbounded in practice.
Convergence Proof Sketch
Markov chain: random walk in solution space. Logarithmic cooling: stationary distribution concentrates on global optima. As T → 0, probability of global optimum → 1.
Practical Convergence
Theoretical convergence slow (logarithmic). Practical: geometric cooling faster but no guarantee. Trade-off theory for speed.
Convergence Speed
Depends on: problem structure (landscape ruggedness), cooling rate, initial temperature. Empirical: measure iterations to good solution. Varies widely.
No-Free-Lunch Theorem
No universal best algorithm. SA excellent some problems, poor others. Landscape structure matters. Problem-specific analysis needed.
Comparison with Random Search
SA beats random search: biased toward good regions. Beats hill climbing: escapes local optima. Complementary strengths.
Improvements and Variants
Parallel Simulated Annealing
Run multiple SA instances in parallel, different starting points. Periodically exchange best solutions. Combines multiple search paths. Better coverage, faster effective convergence.
Adaptive Simulated Annealing
ASA: adjust temperature based on acceptance rate. If stuck (low acceptance), increase temperature. If exploring (high acceptance), decrease. Self-tuning, responsive.
Hybrid Approaches
Combine SA with local search: SA finds promising region, local search refines. SA with genetic algorithm: mutate population via SA, crossover. Synergistic benefits.
Problem-Specific Modifications
Adjust neighbor generation: biased toward promising neighbors. Custom acceptance probabilities: domain knowledge. Encode problem structure into algorithm.
Multi-Objective Simulated Annealing
Multiple objectives: maintain Pareto frontier (non-dominated solutions). SA balances objectives probabilistically. Find diverse, high-quality trade-offs.
Quantum Annealing
Recent: quantum devices perform annealing. Tunneling through energy barriers (unlike classical). Potential speedup. Practical quantum computers limited, experimental.
Comparison with Other Algorithms
SA vs. Hill Climbing
| Aspect | Hill Climbing | Simulated Annealing |
|---|---|---|
| Speed | Fast | Slower |
| Local optima | Trapped easily | Can escape |
| Solution quality | Often suboptimal | Often better |
| Parameters | Few | Many (tuning required) |
SA vs. Genetic Algorithms
SA single-point search (current solution). GA population-based. SA simpler, lighter memory. GA more parallel, diverse exploration. SA faster, GA often better quality (with tuning).
SA vs. Tabu Search
SA probabilistic acceptance. Tabu deterministic with forbidden list. SA simpler, less memory. Tabu more sophisticated, often better. Complementary approaches.
Real-World Applications
Traveling Salesman Problem
Find short tour visiting all cities. SA very competitive. Temperature set so that, initially, tours ~10% longer are still accepted; cooling continues until only improving moves are taken. Good tours found quickly.
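A sketch of SA on a small random TSP instance: 15 hypothetical cities in the unit square, a simple city-swap neighbor, and an illustrative schedule (not a tuned solver):

```python
import math
import random

random.seed(42)

# Hypothetical instance: 15 random cities in the unit square.
cities = [(random.random(), random.random()) for _ in range(15)]

def tour_length(tour):
    # Total length of the closed tour (returns to the start)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def swap_neighbor(tour):
    # Neighbor: swap two randomly chosen cities
    i, j = random.sample(range(len(tour)), 2)
    out = list(tour)
    out[i], out[j] = out[j], out[i]
    return out

current = list(range(len(cities)))
best = list(current)
T, r = 1.0, 0.999
while T > 1e-4:
    cand = swap_neighbor(current)
    dE = tour_length(cand) - tour_length(current)
    if dE < 0 or random.random() < math.exp(-dE / T):
        current = cand
    if tour_length(current) < tour_length(best):
        best = list(current)
    T *= r

print(tour_length(best))  # shorter than the initial ordering
```

Caching `tour_length(current)` instead of recomputing it each iteration, and using 2-opt moves instead of swaps, are standard refinements.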
Circuit Design
Chip design: place circuit components to minimize area and maximize speed. Objective: minimize total wire length, avoid congestion. SA solves large instances efficiently.
Scheduling Problems
Job scheduling, staff scheduling, airline crew scheduling. Constraints: skills, availability, fairness. SA finds feasible, near-optimal schedules.
Protein Folding
Computational biology: find minimum-energy protein configuration. SA explores vast conformation space, finds low-energy structures approximating folded conformations.
Image Processing
Image restoration, segmentation. Objective: pixel assignment minimizing energy. SA iteratively improves pixel assignments, converges to coherent images.
Derivative-Free Optimization
Objectives without gradients (simulation-based, black-box). SA requires only function evaluations, no derivatives. Perfect for complex, non-differentiable objectives.
References
- Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. "Optimization by Simulated Annealing." Science, vol. 220, no. 4598, 1983, pp. 671-680.
- Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics, vol. 21, 1953, pp. 1087-1092.
- Aarts, E., and Korst, J. "Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing." John Wiley & Sons, 1989.
- Michalewicz, Z., and Fogel, D. B. "How to Solve It: Modern Heuristics." Springer, 2nd edition, 2004.
- Ingber, L. "Simulated Annealing: Practice Versus Theory." Mathematical and Computer Modelling, vol. 18, no. 11, 1993, pp. 29-57.