In linear algebra, the Strassen algorithm, named for Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm and is useful in practice for large matrices, but it would be slower than the fastest known algorithms for extremely large matrices.

The Strassen algorithm works over any ring, such as ordinary plus/multiply arithmetic, but not over every semiring, such as the min-plus semiring or Boolean algebra, where only the naive algorithm (what is known as combinatorial matrix multiplication) still works.





Strassen’s Algorithm

The most mind-blowing fact about this algorithm is that someone attempted it at all.

Let me show you what I mean.

First, we need to know about matrix multiplication. Let's write down two 3-by-3 matrices:

Let's call the left matrix $A$ and the right $B$. We define matrix multiplication between these two as a calculation that results in another matrix with the same number of rows as $A$ and the same number of columns as $B$. The definition dictates we calculate each element of the result as:

$$C_{ij} = \sum_{k} A_{ik} B_{kj}$$
Let's call the resulting matrix $C$. So if you'd like to calculate the element of $C$ that's in the second row and first column, you first select out the second row from $A$ and the first column of $B$. Then you multiply each element of these together and sum them up. This is similarly defined for all other elements of $C$. The result is:

To understand why Strassen's Algorithm is interesting, we need to count how many simple add/multiply calculations are involved in producing $C$. Well, that's not hard. To produce that number 48, we had 3 multiplications and 2 additions, for a total of 5 simple calcs. Since there are 9 elements in $C$, we have a total of $9 \times 5 = 45$ calculations.
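The row-times-column definition translates directly into code. Here's a minimal sketch in Python (function and variable names are my own):

```python
def matmul(A, B):
    """Naive matrix multiplication: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    # Each output element is the dot product of a row of A with a column of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]
```

The three nested loops (two comprehensions plus the inner sum) are exactly where the cost we're about to count comes from.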

To make progress, we need to get more general. Let's say $A$ and $B$ are $n$-by-$n$ matrices[1]. Then the formula for the total number of calculations is:

$$T(n) = n^2 \left( n + (n - 1) \right) = 2n^3 - n^2$$
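As a sanity check, the formula can be evaluated directly; for $n = 3$ it recovers the 45 calculations counted above (function name is my own):

```python
def op_count(n):
    """Total simple calculations for naive n-by-n matrix multiplication.

    Each of the n^2 output elements needs n multiplications
    and n - 1 additions.
    """
    return n * n * (n + (n - 1))
```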

Now, when some mathy people look at this, they want to drop the specifics and consider only the dominant behavior as $n$ gets large. In other words, they care about the generic speed with which this function grows. For this case, they would say $T(n)$ grows in proportion to $n^3$. More technically, they say:

$T(n)$ is in the set $O(n^3)$

which means you could choose a constant $c$ such that $T(n) < c \cdot n^3$ for all $n$ beyond a certain point. Just think of this as a rigorous way of pointing out the highest-exponent term in $T(n)$.





So let's think about that statement: $T(n)$ grows in proportion to $n^3$. It has to, right? The result, $C$, has $n^2$ elements, and we have to calculate each in turn. Each one takes roughly $n$ calculations. Therefore, this must amount to something on the order of $n^3$ calculations... right?

No.

Strassen's algorithm was the first strike to chip away at the $n^3$ shell, showing us that we could get the cost of matrix multiplication to grow at a rate less than $n^3$.

That is absolutely wild. How the hell are we avoiding the work of matrix multiplication that seems baked into its definition?

I have no clue, and no one suspected it was worth an attempt until Volker Strassen came along. His algorithm showed it could be done with a cost near $O(n^{2.8})$. This sparked a flurry of research, and we've since made significant progress driving that exponent down.
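The trick is that when you split each matrix into four quadrants, the obvious recursion needs eight quadrant multiplications, but Strassen found seven cleverly combined products that suffice, and seven instead of eight, applied recursively, is what pulls the exponent below 3 (to $\log_2 7 \approx 2.8$). Here's a minimal sketch in Python, assuming $n$ is a power of two (helper names are my own):

```python
def strassen(A, B):
    """Strassen's algorithm for n-by-n matrices, n a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def quad(M):  # split M into four h-by-h quadrants
        return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

    def add(X, Y): return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]
    def sub(X, Y): return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

    a, b, c, d = quad(A)   # A = [[a, b], [c, d]]
    e, f, g, i = quad(B)   # B = [[e, f], [g, i]]

    # Strassen's seven products (instead of the obvious eight)
    m1 = strassen(add(a, d), add(e, i))
    m2 = strassen(add(c, d), e)
    m3 = strassen(a, sub(f, i))
    m4 = strassen(d, sub(g, e))
    m5 = strassen(add(a, b), i)
    m6 = strassen(sub(c, a), add(e, f))
    m7 = strassen(sub(b, d), add(g, i))

    # Recombine the products into the four quadrants of C
    top_left = add(sub(add(m1, m4), m5), m7)
    top_right = add(m3, m5)
    bot_left = add(m2, m4)
    bot_right = add(sub(add(m1, m3), m2), m6)

    return ([tl + tr for tl, tr in zip(top_left, top_right)] +
            [bl + br for bl, br in zip(bot_left, bot_right)])
```

Note that the extra additions and subtractions cost only $O(n^2)$ per level, which is cheap compared with the multiplications saved; that trade is the whole game.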

I’ll say it again, that is absolutely wild.

Footnotes