Introduction
Matrix multiplication is one of the most fundamental operations in numerical computing, with applications in graphics, scientific simulation, machine learning, and signal processing. The naive algorithm for multiplying two n x n matrices requires O(n^3) scalar multiplications. In 1969, Volker Strassen shocked the mathematical community by showing that this could be improved to O(n^2.807) using a clever divide and conquer approach.
Strassen's algorithm works by dividing each n x n matrix into four (n/2) x (n/2) submatrices and computing the product using only 7 recursive multiplications instead of the 8 required by the straightforward block decomposition. Saving one of the eight multiplications at every level of recursion compounds across all O(log n) levels, yielding an asymptotically faster algorithm.
Standard Matrix Multiplication
The standard algorithm for computing C = A * B where A and B are n x n matrices:
for i from 1 to n:
    for j from 1 to n:
        C[i][j] = 0
        for k from 1 to n:
            C[i][j] = C[i][j] + A[i][k] * B[k][j]

This requires n^3 multiplications and n^3 - n^2 additions, giving O(n^3) time. Using a 2x2 block decomposition:
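The triple loop above can be written as a minimal Python sketch (using 0-based indexing, unlike the 1-based pseudocode; the function name matmul is this sketch's own):

```python
def matmul(A, B):
    """Standard O(n^3) multiplication of two n x n matrices (nested lists)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # C[i][j] is the dot product of row i of A and column j of B.
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```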
| C11 C12 |   | A11 A12 |   | B11 B12 |
| C21 C22 | = | A21 A22 | * | B21 B22 |

C11 = A11*B11 + A12*B21
C12 = A11*B12 + A12*B22
C21 = A21*B11 + A22*B21
C22 = A21*B12 + A22*B22

This block decomposition uses 8 multiplications of (n/2) x (n/2) matrices, giving the recurrence T(n) = 8T(n/2) + O(n^2), which solves to T(n) = O(n^3) -- no improvement.
Strassen's Key Insight
Strassen discovered that the four products C11, C12, C21, C22 can be computed using only 7 multiplications (at the cost of more additions). He defined seven intermediate products:
M1 = (A11 + A22) * (B11 + B22)
M2 = (A21 + A22) * B11
M3 = A11 * (B12 - B22)
M4 = A22 * (B21 - B11)
M5 = (A11 + A12) * B22
M6 = (A21 - A11) * (B11 + B12)
M7 = (A12 - A22) * (B21 + B22)

Then the result blocks are:
C11 = M1 + M4 - M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 - M2 + M3 + M6

The reduction from 8 to 7 multiplications may seem minor, but it changes the exponent of the recurrence from log2(8) = 3 to log2(7) ≈ 2.807, a significant asymptotic improvement that compounds at every level of recursion.
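As a sanity check, the seven products and the four combinations can be evaluated for the 2x2 case, where each block is a single number (the function name strassen_2x2 is this sketch's own):

```python
def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    """Strassen's seven products and four combinations for scalar blocks."""
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Combine into the four entries of C = A * B.
    return (m1 + m4 - m5 + m7,  # c11
            m3 + m5,            # c12
            m2 + m4,            # c21
            m1 - m2 + m3 + m6)  # c22
```

Expanding each combination symbolically recovers exactly a11*b11 + a12*b21 and the other three standard inner products, which is why the identities hold for matrix blocks as well.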
The Algorithm
procedure Strassen(A, B, n):
    if n == 1:
        return A[1][1] * B[1][1]
    // Split A and B into quadrants
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    // Compute 7 products recursively
    M1 = Strassen(A11 + A22, B11 + B22, n/2)
    M2 = Strassen(A21 + A22, B11, n/2)
    M3 = Strassen(A11, B12 - B22, n/2)
    M4 = Strassen(A22, B21 - B11, n/2)
    M5 = Strassen(A11 + A12, B22, n/2)
    M6 = Strassen(A21 - A11, B11 + B12, n/2)
    M7 = Strassen(A12 - A22, B21 + B22, n/2)
    // Combine results
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return combine(C11, C12, C21, C22)

| Operation | Standard | Strassen |
|---|---|---|
| Multiplications per level | 8 | 7 |
| Additions per level | 4 | 18 |
| Recurrence | T(n) = 8T(n/2) + O(n^2) | T(n) = 7T(n/2) + O(n^2) |
| Result | O(n^3) | O(n^2.807) |
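The pseudocode can be turned into a runnable Python sketch, assuming n is a power of 2; the helper names add, sub, split, and combine are this sketch's own, not library calls:

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split an n x n matrix into four (n/2) x (n/2) quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def combine(C11, C12, C21, C22):
    """Reassemble the four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Combine: C11 = M1+M4-M5+M7, C12 = M3+M5, C21 = M2+M4, C22 = M1-M2+M3+M6.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return combine(C11, C12, C21, C22)
```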
Worked Example
Multiply two 2x2 matrices using Strassen's formulas:
A = | 1 3 |    B = | 5 7 |
    | 2 4 |        | 6 8 |

A11=1, A12=3, A21=2, A22=4
B11=5, B12=7, B21=6, B22=8

M1 = (1+4) * (5+8) = 5 * 13 = 65
M2 = (2+4) * 5 = 6 * 5 = 30
M3 = 1 * (7-8) = 1 * (-1) = -1
M4 = 4 * (6-5) = 4 * 1 = 4
M5 = (1+3) * 8 = 4 * 8 = 32
M6 = (2-1) * (5+7) = 1 * 12 = 12
M7 = (3-4) * (6+8) = (-1) * 14 = -14

C11 = 65 + 4 - 32 + (-14) = 23
C12 = -1 + 32 = 31
C21 = 30 + 4 = 34
C22 = 65 - 30 + (-1) + 12 = 46

C = | 23 31 |
    | 34 46 |

Verification: 1*5 + 3*6 = 23, 1*7 + 3*8 = 31, 2*5 + 4*6 = 34, 2*7 + 4*8 = 46. Correct.
Strassen used 7 multiplications vs. the standard 8, plus 18 additions/subtractions vs. 4.
Complexity Analysis
The recurrence is T(n) = 7T(n/2) + O(n^2). By the Master theorem:
- a = 7, b = 2, f(n) = O(n^2)
- log_b(a) = log_2(7) = 2.807...
- Since f(n) = O(n^2) = O(n^(log_2(7) - epsilon)) for epsilon = 0.807, this is Case 1
- T(n) = O(n^(log_2(7))) = O(n^2.807)
| n | Standard (n^3) | Strassen (n^2.807) | Speedup |
|---|---|---|---|
| 64 | 262,144 | 117,649 | 2.2x |
| 1024 | ~1.07 billion | ~282 million | 3.8x |
| 4096 | ~69 billion | ~13.8 billion | 5.0x |
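The multiplication counts in the table follow directly from the two recurrences, assuming n is a power of 2 and counting only scalar multiplications (the function names below are this sketch's own):

```python
def standard_mults(n):
    """Multiplications in the standard block algorithm: 8 per level -> n^3."""
    return 1 if n == 1 else 8 * standard_mults(n // 2)

def strassen_mults(n):
    """Multiplications in Strassen's algorithm: 7 per level -> n^(log2 7)."""
    return 1 if n == 1 else 7 * strassen_mults(n // 2)
```

For n = 2^k these evaluate to 8^k = n^3 and 7^k = n^(log2 7); the speedup column is their ratio (8/7)^k, e.g. (8/7)^12 ≈ 5.0 for n = 4096.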
Numerical Stability
Strassen's algorithm is less numerically stable than the standard algorithm because it performs more additions and subtractions, which can introduce cancellation errors in floating-point arithmetic. The additional intermediate computations mean that rounding errors accumulate faster.
For applications requiring high numerical accuracy (e.g., scientific computing), the standard O(n^3) algorithm or algorithms specifically designed for numerical stability may be preferred despite the higher computational cost.
Higham (1990) showed that the numerical error in Strassen's algorithm grows as O(n^(log2(12))) = O(n^3.585), compared to O(n^2) for the standard algorithm. This makes Strassen's algorithm unsuitable for ill-conditioned matrices without additional error-handling measures.
Practical Considerations
- Crossover Point: Due to the larger constant factor (18 additions vs. 4), Strassen's algorithm is slower than the standard method for small matrices. In practice, implementations switch to standard multiplication for matrices smaller than about 32x32 to 128x128.
- Non-Power-of-Two Sizes: The algorithm assumes n is a power of 2. For other sizes, matrices must be padded with zeros, which wastes computation. Some implementations handle odd-sized splits directly.
- Memory Usage: The additional temporary matrices at each recursion level increase memory requirements compared to the standard algorithm.
- Cache Performance: The recursive nature of Strassen's algorithm can lead to poor cache utilization unless carefully implemented with techniques like loop tiling.
- Parallelism: The 7 independent recursive multiplications can be executed in parallel, making the algorithm well-suited for multi-core processors.
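A common way to act on the crossover point above is a hybrid: recurse with Strassen's formulas, but switch to a standard method below a threshold. A sketch using NumPy follows, where the leaf size 64 is illustrative rather than tuned, n is assumed to be a power of 2, and the base case delegates to NumPy's built-in matrix product:

```python
import numpy as np

LEAF = 64  # illustrative crossover threshold, not a tuned value

def hybrid_strassen(A, B):
    n = A.shape[0]
    if n <= LEAF:
        return A @ B  # fall back to the standard product for small blocks
    h = n // 2
    # Quadrant views (no copies) of A and B.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven products.
    M1 = hybrid_strassen(A11 + A22, B11 + B22)
    M2 = hybrid_strassen(A21 + A22, B11)
    M3 = hybrid_strassen(A11, B12 - B22)
    M4 = hybrid_strassen(A22, B21 - B11)
    M5 = hybrid_strassen(A11 + A12, B22)
    M6 = hybrid_strassen(A21 - A11, B11 + B12)
    M7 = hybrid_strassen(A12 - A22, B21 + B22)
    # Assemble C from the four block combinations.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Because the seven recursive calls are independent, this structure also exposes the parallelism noted above: each Mi could be dispatched to a separate worker before the combine step.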
Beyond Strassen
Strassen's result opened the field of algebraic complexity theory. Researchers have continued to lower the exponent of matrix multiplication:
| Year | Authors | Exponent |
|---|---|---|
| 1969 | Strassen | 2.807 |
| 1979 | Pan | 2.796 |
| 1990 | Coppersmith-Winograd | 2.376 |
| 2014 | Le Gall | 2.3728639 |
| 2024 | Williams et al. | 2.371552 |
It is conjectured that the optimal exponent is 2 (meaning matrix multiplication can be done in essentially O(n^2) time), but this remains unproven. None of the algorithms beyond Strassen are practical for real-world use due to enormous constant factors.
References
- Strassen, V. "Gaussian Elimination is Not Optimal." Numerische Mathematik, vol. 13, 1969, pp. 354-356.
- Coppersmith, D. and Winograd, S. "Matrix Multiplication via Arithmetic Progressions." Journal of Symbolic Computation, vol. 9, 1990, pp. 251-280.
- Higham, N. J. "Exploiting Fast Matrix Multiplication Within the Level 3 BLAS." ACM Transactions on Mathematical Software, vol. 16, 1990, pp. 352-368.
- Le Gall, F. "Powers of Tensors and Fast Matrix Multiplication." Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, 2014.
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, Chapter 4, 2009.