Introduction

The Subset Sum problem is a fundamental problem in computer science and combinatorial optimization. Given a set of integers and a target value, the task is to determine whether any subset of the given integers sums exactly to the target. This problem is one of Karp's 21 NP-complete problems and serves as the basis for many cryptographic systems.

Despite its simple formulation, the Subset Sum problem captures the essential difficulty of many real-world optimization challenges. It appears in resource allocation, financial planning, bin packing, and scheduling. The problem also forms the foundation of several public-key cryptosystems, including the Merkle-Hellman knapsack cryptosystem.

Two main algorithmic approaches exist: backtracking with pruning (suitable for moderate-sized inputs with effective constraints) and dynamic programming (efficient when the target sum is not too large). Both approaches have their own advantages depending on the specific instance characteristics.

Problem Definition

Given a set S = {s1, s2, ..., sn} of n positive integers and a target value T, determine whether there exists a subset S' of S such that the elements of S' sum to exactly T.

Decision vs. Search Versions

VersionOutputExample
DecisionYes or NoS={3,7,1,8,4}, T=11 -> Yes
SearchThe subset itselfS={3,7,1,8,4}, T=11 -> {3,8} or {7,4}
CountingNumber of subsetsS={3,7,1,8,4}, T=11 -> 2
OptimizationClosest achievable sumS={3,7,1,8,4}, T=10 -> {3,7} sums to 10

The Subset Sum problem is a special case of the 0/1 Knapsack problem where each item's weight equals its value. It is also equivalent to the Partition problem when T equals half the total sum of all elements.

Backtracking Solution

The backtracking approach uses the inclusion/exclusion principle. At each step, we decide whether to include the current element in the subset or exclude it. This creates a binary decision tree with 2^n leaves.

Pseudocode

procedure SubsetSum(set, n, index, currentSum, target): if currentSum == target: return true if index == n: return false // Include set[index] if SubsetSum(set, n, index + 1, currentSum + set[index], target): return true // Exclude set[index] if SubsetSum(set, n, index + 1, currentSum, target): return true return false

This algorithm explores two branches at each element: one where the element is included and one where it is excluded. Without pruning, it examines all 2^n subsets in the worst case.

Pruning Techniques

Effective pruning can dramatically reduce the search space. The most important pruning strategies are:

Sum Exceeds Target (Feasibility Pruning)

If the current partial sum already exceeds the target (assuming all elements are positive), no further inclusions can lead to a valid solution. The branch is cut immediately.

Remaining Sum Insufficient (Optimality Pruning)

If the current partial sum plus the sum of all remaining elements is less than the target, no solution exists in this branch. This requires precomputing suffix sums.

Sorting Optimization

Sorting the set in descending order before starting the search tends to trigger pruning earlier, since large elements quickly push the partial sum above the target or quickly exhaust the remaining budget.

procedure SubsetSumPruned(set, n, index, currentSum, target, remainingSum): if currentSum == target: return true if index == n: return false // Pruning: current sum exceeds target if currentSum + set[index] > target: return false // Pruning: remaining elements cannot reach target if currentSum + remainingSum < target: return false // Include set[index] if SubsetSumPruned(set, n, index + 1, currentSum + set[index], target, remainingSum - set[index]): return true // Exclude set[index] if SubsetSumPruned(set, n, index + 1, currentSum, target, remainingSum - set[index]): return true return false

Worked Example

Find a subset of S = {3, 7, 1, 8, 4} that sums to T = 11.

Sorted descending: {8, 7, 4, 3, 1}, total = 23

Step 1: Index 0, element 8. Include: sum = 8, remaining = 15. Proceed.

Step 2: Index 1, element 7. Include: sum = 15 > 11. Pruned. Exclude: sum = 8, remaining = 8.

Step 3: Index 2, element 4. Include: sum = 12 > 11. Pruned. Exclude: sum = 8, remaining = 4.

Step 4: Index 3, element 3. Include: sum = 11 = target. Solution found: {8, 3}.

With pruning, we explored only 4 nodes instead of the full 2^5 = 32 nodes. The feasibility pruning at steps 2 and 3 cut two large subtrees entirely.

In practice, well-implemented pruning reduces the average search space from 2^n to a much smaller tree, though the worst case remains exponential. The effectiveness of pruning depends heavily on the distribution of element values relative to the target.

Complexity Analysis

ApproachTimeSpaceWhen to Use
Backtracking (no pruning)O(2^n)O(n)Small n, all subsets needed
Backtracking (with pruning)O(2^n) worst caseO(n)Moderate n with tight target
Dynamic programmingO(n * T)O(T)Large n, small target T
Meet in the middleO(2^(n/2) * n)O(2^(n/2))n up to ~40, large T

Dynamic Programming Approach

When the target sum T is not too large, a pseudo-polynomial dynamic programming solution is efficient. We construct a boolean table dp where dp[j] indicates whether a subset summing to j exists using elements considered so far.

procedure SubsetSumDP(set, n, target): dp = boolean array of size target + 1, all false dp[0] = true for i from 0 to n-1: for j from target down to set[i]: if dp[j - set[i]]: dp[j] = true return dp[target]

This approach runs in O(n * T) time and O(T) space. It is called pseudo-polynomial because the running time depends on the numeric value of T, not just the input size. For problems where T is exponential in n, this approach offers no advantage over backtracking.

NP-Completeness

The Subset Sum problem was one of the 21 problems shown to be NP-complete by Richard Karp in 1972. A problem is NP-complete if it is in NP (solutions can be verified in polynomial time) and every problem in NP can be reduced to it in polynomial time.

The NP-completeness of Subset Sum means that no polynomial-time algorithm is known for solving all instances. However, this does not mean every instance is hard. Many practical instances can be solved efficiently by dynamic programming or pruned backtracking. The hard instances tend to have element values that are exponential in n and targets near half the total sum.

The Subset Sum problem is weakly NP-complete, meaning it admits a pseudo-polynomial time algorithm (DP). In contrast, strongly NP-complete problems like 3-SAT remain hard even when all numbers in the input are polynomially bounded.

Variants and Applications

  • Partition Problem: Determine whether a set can be divided into two subsets with equal sums. This is equivalent to Subset Sum with T = totalSum / 2.
  • Multi-way Partition: Divide a set into k subsets with equal sums. This generalization is strongly NP-complete for k >= 3.
  • Subset Sum with Negative Numbers: When elements can be negative, the problem requires a shifted DP table or the meet-in-the-middle approach.
  • Approximate Subset Sum: Find a subset whose sum is within a factor (1-epsilon) of the target. Ibarra and Kim (1975) gave an FPTAS for this variant.
  • Cryptography: The Merkle-Hellman cryptosystem is based on the difficulty of Subset Sum. Although this specific system was broken by Shamir, lattice-based variants remain an active area of post-quantum cryptography research.

References

  • Karp, R. M. "Reducibility Among Combinatorial Problems." Complexity of Computer Computations, Plenum Press, 1972, pp. 85-103.
  • Garey, M. R. and Johnson, D. S. "Computers and Intractability: A Guide to the Theory of NP-Completeness." W. H. Freeman, 1979.
  • Horowitz, E. and Sahni, S. "Computing Partitions with Applications to the Knapsack Problem." Journal of the ACM, vol. 21, 1974, pp. 277-292.
  • Ibarra, O. H. and Kim, C. E. "Fast Approximation Algorithms for the Knapsack and Sum of Subset Problems." Journal of the ACM, vol. 22, 1975, pp. 463-468.
  • Merkle, R. C. and Hellman, M. E. "Hiding Information and Signatures in Trapdoor Knapsacks." IEEE Transactions on Information Theory, vol. 24, 1978, pp. 525-530.