Introduction

The Activity Selection problem is the canonical example of a greedy algorithm. Given a set of activities, each with a start time and finish time, the goal is to select the maximum number of mutually compatible activities -- that is, activities that do not overlap in time. This problem is the foundation for many scheduling algorithms used in operating systems, resource management, and operations research.

The greedy strategy for this problem is remarkably simple: always select the activity that finishes earliest among those compatible with the already-selected activities. This locally optimal choice leads to a globally optimal solution, a property that distinguishes greedy problems from those requiring dynamic programming or exhaustive search.

Problem Definition

Given n activities, each described by a start time s_i and finish time f_i (where s_i < f_i), select the largest subset S of activities such that no two activities in S overlap. Two activities i and j are compatible if f_i <= s_j or f_j <= s_i (one finishes before the other starts).
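The compatibility condition translates directly into code. A minimal sketch in Python (the function name and tuple representation are illustrative, not part of the formal definition):

```python
def compatible(a, b):
    """Return True if activities a and b, given as (start, finish)
    tuples, do not overlap: one finishes before the other starts."""
    return a[1] <= b[0] or b[1] <= a[0]

# (1, 4) and (4, 7) touch at t = 4 but do not overlap, so they are compatible.
print(compatible((1, 4), (4, 7)))  # True
print(compatible((1, 4), (3, 5)))  # False
```

Note that the condition uses <=, so an activity starting exactly when another finishes counts as compatible.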

Example Input

Activity   Start Time   Finish Time
A1         1            4
A2         3            5
A3         0            6
A4         5            7
A5         3            9
A6         5            9
A7         6            10
A8         8            11
A9         8            12
A10        12           14

Greedy Approach

The algorithm sorts activities by finish time and greedily selects compatible activities:

Pseudocode

procedure ActivitySelection(activities):
    sort activities by finish time ascending
    selected = [activities[0]]
    lastFinish = activities[0].finish
    for i from 1 to n-1:
        if activities[i].start >= lastFinish:
            selected.add(activities[i])
            lastFinish = activities[i].finish
    return selected

Why Sort by Finish Time?

Selecting the activity that finishes earliest leaves the maximum remaining time for subsequent activities. Other greedy strategies (sort by start time, sort by duration, sort by number of conflicts) do not always produce optimal solutions.
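A small experiment makes the failure of an alternative strategy concrete. The three activities below are a standard style of counterexample (chosen here for illustration): sorting by duration picks the short middle activity, which blocks both of its neighbors, while sorting by finish time fits two activities.

```python
def greedy(activities, key):
    # Generic greedy: sort by the given key, then take each activity
    # that starts at or after the finish of the last one taken.
    chosen, last_finish = [], float("-inf")
    for s, f in sorted(activities, key=key):
        if s >= last_finish:
            chosen.append((s, f))
            last_finish = f
    return chosen

acts = [(1, 5), (4, 7), (6, 10)]
by_finish = greedy(acts, key=lambda a: a[1])           # earliest finish first
by_duration = greedy(acts, key=lambda a: a[1] - a[0])  # shortest first
print(by_finish)    # [(1, 5), (6, 10)]
print(by_duration)  # [(4, 7)]
```

Shortest-first selects (4, 7), which overlaps both other activities, leaving a selection of size 1 where size 2 is achievable.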

The activity selection problem demonstrates the greedy choice property: an optimal solution exists that includes the activity with the earliest finish time. This can be proved by an exchange argument, showing that any optimal solution not including this activity can be modified to include it without reducing the total count.

Proof of Optimality

The greedy algorithm is optimal, meaning it always finds a maximum-size set of compatible activities. The proof uses an exchange argument:

  1. Let G = {g1, g2, ..., gk} be the greedy solution (sorted by finish time).
  2. Let O = {o1, o2, ..., om} be any optimal solution (sorted by finish time), with m >= k.
  3. Claim: for each i <= k, f(gi) <= f(oi). Proof by induction: g1 has the earliest finish time among all activities, so f(g1) <= f(o1). For i > 1, we have f(gi-1) <= f(oi-1) <= s(oi), so oi is compatible with the first i-1 greedy choices. The greedy algorithm could therefore have chosen oi; since it instead chose gi, the earliest-finishing compatible activity, f(gi) <= f(oi).
  4. If m > k, then o(k+1) starts at or after f(ok), and f(ok) >= f(gk) by the claim, so o(k+1) is compatible with the entire greedy solution. But then the greedy algorithm would have selected it (or some activity finishing no later), contradicting that it stopped after k activities.
  5. Therefore m = k, and the greedy solution is optimal.
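The optimality claim can also be checked empirically on small inputs by comparing the greedy answer against an exhaustive search over all subsets. A sketch (the brute force is exponential and intended only for validation; function names are illustrative):

```python
from itertools import combinations

def greedy_count(acts):
    # Earliest-finish-first greedy; returns the size of its selection.
    count, last = 0, float("-inf")
    for s, f in sorted(acts, key=lambda a: a[1]):
        if s >= last:
            count, last = count + 1, f
    return count

def brute_count(acts):
    # Size of the largest pairwise-compatible subset, by exhaustive search.
    def ok(sub):
        return all(a[1] <= b[0] or b[1] <= a[0]
                   for a, b in combinations(sub, 2))
    return max(len(sub) for r in range(len(acts) + 1)
               for sub in combinations(acts, r) if ok(sub))

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9),
        (5, 9), (6, 10), (8, 11), (8, 12), (12, 14)]
print(greedy_count(acts), brute_count(acts))  # 4 4
```

On the example activities from this article, both approaches agree on a maximum of 4.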

Worked Example

Using the activities from the table above, sorted by finish time: A1(1,4), A2(3,5), A3(0,6), A4(5,7), A5(3,9), A6(5,9), A7(6,10), A8(8,11), A9(8,12), A10(12,14).

Step 1: Select A1 (finish=4). lastFinish = 4.

Step 2: A2 starts at 3 < 4. Skip.

Step 3: A3 starts at 0 < 4. Skip.

Step 4: A4 starts at 5 >= 4. Select A4 (finish=7). lastFinish = 7.

Step 5: A5 starts at 3 < 7. Skip.

Step 6: A6 starts at 5 < 7. Skip.

Step 7: A7 starts at 6 < 7. Skip.

Step 8: A8 starts at 8 >= 7. Select A8 (finish=11). lastFinish = 11.

Step 9: A9 starts at 8 < 11. Skip.

Step 10: A10 starts at 12 >= 11. Select A10 (finish=14).

Result: {A1, A4, A8, A10} -- 4 activities selected. This is the maximum possible.

Complexity Analysis

Operation                 Time
Sorting by finish time    O(n log n)
Greedy selection scan     O(n)
Total                     O(n log n)
Space                     O(1) extra (beyond input)

If the activities are already sorted by finish time (as is common in some applications), the algorithm runs in O(n) time.

Implementation

def activity_selection(activities):
    # activities is a list of (start, finish) tuples
    activities.sort(key=lambda x: x[1])
    selected = [activities[0]]
    last_finish = activities[0][1]
    for i in range(1, len(activities)):
        if activities[i][0] >= last_finish:
            selected.append(activities[i])
            last_finish = activities[i][1]
    return selected

# Example
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9),
              (5, 9), (6, 10), (8, 11), (8, 12), (12, 14)]
result = activity_selection(activities)
print("Selected:", result)
# Output: Selected: [(1, 4), (5, 7), (8, 11), (12, 14)]

Weighted Activity Selection

In the weighted variant, each activity has a profit (weight), and the goal is to maximize total profit rather than the number of activities. The greedy approach no longer works for this variant; dynamic programming is required.

procedure WeightedActivitySelection(activities):
    sort activities by finish time
    dp[0] = activities[0].weight
    for i from 1 to n-1:
        // Option 1: exclude activity i
        exclude = dp[i-1]
        // Option 2: include activity i
        j = latest activity whose finish time is <= activities[i].start
        include = activities[i].weight + (dp[j] if j >= 0 else 0)
        dp[i] = max(exclude, include)
    return dp[n-1]

The weighted variant runs in O(n log n) time using binary search to find the latest compatible activity.
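The DP above translates directly into Python, with bisect on the sorted finish times supplying the O(log n) lookup of the latest compatible activity. A sketch (the weights attached to the sample activities are hypothetical, added for illustration):

```python
import bisect

def weighted_activity_selection(activities):
    """activities: list of (start, finish, weight) tuples.
    Returns the maximum total weight of compatible activities."""
    acts = sorted(activities, key=lambda a: a[1])
    finishes = [f for _, f, _ in acts]
    n = len(acts)
    dp = [0] * n
    for i, (s, f, w) in enumerate(acts):
        # Index of the latest activity (before i) finishing at or before s.
        j = bisect.bisect_right(finishes, s, 0, i) - 1
        include = w + (dp[j] if j >= 0 else 0)
        exclude = dp[i - 1] if i > 0 else 0
        dp[i] = max(include, exclude)
    return dp[-1] if n else 0

# Hypothetical weights on a few of the example activities.
acts = [(1, 4, 3), (3, 5, 5), (0, 6, 8), (5, 7, 2), (8, 11, 4)]
print(weighted_activity_selection(acts))  # 12, from (0, 6, 8) + (8, 11, 4)
```

Note that the low-weight activity A1 is skipped here even though the unweighted greedy would select it, which is exactly why the greedy strategy fails in the weighted setting.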

The unweighted activity selection problem is one of the few scheduling problems where a greedy algorithm is provably optimal. Most generalizations (weighted, multiple resources, preemptive) require dynamic programming or more sophisticated techniques.

Applications

  • Job Scheduling: Scheduling jobs on a single machine to maximize the number of completed tasks.
  • Room Assignment: Assigning meetings to conference rooms to minimize the number of rooms needed (interval graph coloring).
  • Television Broadcasting: Selecting the maximum number of non-overlapping programs to broadcast on a single channel.
  • Resource Allocation: Assigning shared resources (machines, vehicles, classrooms) to requests that arrive with time windows.
  • Interval Scheduling Maximization: The activity selection problem is the core of interval scheduling theory, with extensions to multiple machines, weighted intervals, and online variants.
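The room-assignment application above (interval partitioning) has its own simple greedy: process meetings in order of start time and reuse a room whenever its last meeting has already finished, tracking open rooms in a min-heap of finish times. A sketch, assuming (start, finish) tuples as before:

```python
import heapq

def min_rooms(meetings):
    # meetings: list of (start, finish) tuples; returns rooms needed.
    rooms = []  # min-heap of finish times, one entry per open room
    for s, f in sorted(meetings):
        if rooms and rooms[0] <= s:
            heapq.heapreplace(rooms, f)  # earliest-freed room is available; reuse it
        else:
            heapq.heappush(rooms, f)     # all rooms busy; open a new one
    return len(rooms)

print(min_rooms([(1, 4), (3, 5), (0, 6), (5, 7)]))  # 3
```

In the example, three meetings are simultaneously active shortly after t = 3, so three rooms are necessary, and the greedy uses exactly that many.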
