How fast can we multiply two n × n matrices?

A fundamental problem in theoretical computer science is to determine the time complexity of matrix multiplication, one of the most basic operations in linear algebra. Matrix multiplication plays an important role in physics, engineering, computer science, and many other fields, and it appears as a subroutine in countless computational problems.

What is the fastest algorithm for matrix multiplication? The question typically translates to determining the exponent of matrix multiplication: the smallest real number \( \omega \) such that the product of two n × n matrices over a field F can be computed using roughly \( n^{\omega} \) operations over F. A great deal of work has been invested in making matrix multiplication algorithms efficient over the years, but all that is known for certain is \(2 \leq \omega \leq 3 \).

Problem: Matrix Multiplication

Input: Two n × n matrices, A and B.

Output: The n × n matrix C where C[i][j] is the dot product of the i-th row of A and the j-th column of B.

Naive Approach (Iterative)

The elementary algorithm for matrix multiplication can be implemented as three tightly nested loops:
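The original listing is not reproduced here; a minimal Python sketch of the triple loop might look like this (the function name `matmul_naive` is my own):

```python
def matmul_naive(A, B):
    """Multiply two n x n matrices with the classic triply nested loop."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):          # row of A
        for j in range(n):      # column of B
            for k in range(n):  # accumulate the dot product
                C[i][j] += A[i][k] * B[k][j]
    return C
```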

By analyzing the time complexity of this algorithm, we get the number of multiplications M(n,n,n) given by the following summation:

\begin{equation*}M(n,n,n) = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}1\end{equation*}

Sums are evaluated from the innermost (rightmost) outward. The sum of n ones is n, so

\begin{equation*}M(n,n,n) = \sum_{i=1}^{n}\sum_{j=1}^{n}n\end{equation*}

The sum of n copies of n is just as simple, \(n \times n = n^{2} \), so

\begin{equation*}M(n,n,n) = \sum_{i=1}^{n}n^{2}\end{equation*}

Finally, the sum of n copies of \(n^{2} \) is \(n \times n^{2}= n^{3} \).

Thus the running time of this square matrix multiplication algorithm is \(O(n^{3}) \). The algorithm looks so natural and obvious that it is hard to imagine a better way to multiply matrices. However, more efficient algorithms do exist.

Divide-And-Conquer Approach

It turns out that matrix multiplication is easy to break into subproblems, because it can be performed blockwise. To see what this means, carve A into four n/2 × n/2 blocks, and do the same for the square matrices B and C:

\begin{equation*}A=\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix},\qquad B=\begin{pmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix},\qquad C=\begin{pmatrix}C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix}\end{equation*}

Then their product can be expressed in terms of these blocks, exactly as if the blocks were single elements:

\begin{equation*}\begin{pmatrix}C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix}=\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}\begin{pmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{pmatrix}\end{equation*}

It is the same as

\begin{equation*}\begin{aligned}C_{11}&=A_{11}B_{11}+A_{12}B_{21}, & C_{12}&=A_{11}B_{12}+A_{12}B_{22},\\ C_{21}&=A_{21}B_{11}+A_{22}B_{21}, & C_{22}&=A_{21}B_{12}+A_{22}B_{22}.\end{aligned}\end{equation*}

From this, we have obtained 8 smaller matrix multiplications and 4 additions. Armed with these equations, we can devise a simple, recursive algorithm:
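A recursive sketch of this scheme in Python, assuming for simplicity that n is a power of 2 (helper names `split` and `add` are my own):

```python
def split(M):
    """Split an n x n matrix (n even) into four n/2 x n/2 blocks."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matmul_dc(A, B):
    """Divide-and-conquer multiplication; assumes n is a power of 2."""
    n = len(A)
    if n == 1:  # base case: a single scalar multiplication
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 8 recursive multiplications and 4 block additions
    C11 = add(matmul_dc(A11, B11), matmul_dc(A12, B21))
    C12 = add(matmul_dc(A11, B12), matmul_dc(A12, B22))
    C21 = add(matmul_dc(A21, B11), matmul_dc(A22, B21))
    C22 = add(matmul_dc(A21, B12), matmul_dc(A22, B22))
    # Stitch the four quadrants back together
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```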

Let’s investigate this recursive version of the matrix multiplication algorithm. In the base case, when the number of rows equals 1, the algorithm performs just one scalar multiplication, so \(T(1)=\Theta(1)\).

In the recursive case, the total time is the sum of the partitioning time (dividing each matrix into 4 blocks), the time for the 8 recursive calls, and the time to add the matrices resulting from those calls:

\begin{equation*}T(n)=\Theta(1)+8T(\frac{n}{2})+\Theta(n^{2}) = 8T(\frac{n}{2})+\Theta(n^{2})\end{equation*}

Using the Master Theorem we derive

\begin{equation*}T(n)=\Theta(n^{\log_{2}{8}})=\Theta(n^3)\end{equation*}

Surprise! Where is the improvement? Until the late 1960s, it was indeed believed that computing the product C of two n × n matrices requires an essentially cubic number of operations, as the fastest known algorithm was the naive one, which runs in \(O(n^{3}) \) time.

Strassen’s Algorithm

In 1969, Volker Strassen came up with an algorithm whose asymptotic bound beat cubic. Using just a bit of algebra, he reduced the number of recursive matrix multiplications from 8 to 7, with the following factoring scheme: he wrote each block \(C_{ij} \) in terms of block matrices \(P_{k} \), where each \(P_{k} \) can be computed from products and sums of sub-blocks of A and B.

That is,

\begin{equation*}\begin{aligned}P_{1}&=A_{11}(B_{12}-B_{22}),\\ P_{2}&=(A_{11}+A_{12})B_{22},\\ P_{3}&=(A_{21}+A_{22})B_{11},\\ P_{4}&=A_{22}(B_{21}-B_{11}),\\ P_{5}&=(A_{11}+A_{22})(B_{11}+B_{22}),\\ P_{6}&=(A_{12}-A_{22})(B_{21}+B_{22}),\\ P_{7}&=(A_{11}-A_{21})(B_{11}+B_{12}).\end{aligned}\end{equation*}

It is important to notice that each of these factors can be evaluated using exactly one matrix multiplication. By adding and subtracting the submatrices \(P_{k} \), it can be verified that:

\begin{equation*}\begin{aligned}C_{11}&=P_{5}+P_{4}-P_{2}+P_{6},\\ C_{12}&=P_{1}+P_{2},\\ C_{21}&=P_{3}+P_{4},\\ C_{22}&=P_{5}+P_{1}-P_{3}-P_{7}.\end{aligned}\end{equation*}

Oh, God! The decomposition into submatrices is so tricky and intricate that I wonder how Strassen was ever able to discover it!
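As a sanity check, the identities can be verified numerically in the scalar case (1 × 1 blocks). The sketch below uses one common formulation of Strassen's seven products; the function name is my own:

```python
def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    """Compute the 2x2 product via Strassen's seven multiplications."""
    p1 = a11 * (b12 - b22)
    p2 = (a11 + a12) * b22
    p3 = (a21 + a22) * b11
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a22) * (b11 + b22)
    p6 = (a12 - a22) * (b21 + b22)
    p7 = (a11 - a21) * (b11 + b12)
    # Recombine the seven products into the four entries of C
    return (p5 + p4 - p2 + p6,  # c11 = a11*b11 + a12*b21
            p1 + p2,            # c12 = a11*b12 + a12*b22
            p3 + p4,            # c21 = a21*b11 + a22*b21
            p5 + p1 - p3 - p7)  # c22 = a21*b12 + a22*b22
```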

Strassen traded one matrix multiplication for a constant number of matrix additions, and thus obtained a lower asymptotic running time. With 7 recursive calls and combining cost \( \Theta(n^{2}) \), the recurrence for Strassen’s algorithm becomes:

\begin{equation*}T(n) = 7T(\frac{n}{2})+\Theta(n^{2})\end{equation*}

Again the Master Theorem gives us

\begin{equation*}T(n)=\Theta(n^{\log_{2}{7}})=O(n^{2.81})\end{equation*}

Evolution of ω Towards 2

Strassen’s amazing discovery spawned a long line of research that gradually reduced the matrix multiplication exponent ω over time. Yet for almost a decade after his discovery, no further progress was made.

In 1978, Pan found explicit algorithms that further reduced ω, using a technique called trilinear aggregation. With it, Pan showed that two 70 × 70 matrices can be multiplied in 143,640 operations, giving ω ≤ \( \log_{70} 143640 \) < 2.80.

In 1979, Bini et al. showed that the number of operations required to multiply matrices could be reduced further by considering bilinear approximate algorithms. Using this method (which is related to trilinear aggregation), they obtained ω ≤ 2.78.

In 1981, Schönhage developed a sophisticated theory of the bilinear complexity of rectangular matrix multiplication, showing that approximate bilinear algorithms are even more powerful. With his asymptotic sum inequality, he proved ω < 2.55, a huge improvement over the previous bound of Bini et al.

In the following years, Romani, Pan, Coppersmith and Winograd made further improvements and achieved ω < 2.50.

In 1986 Strassen introduced his laser method which yields ω ≤ 2.48.

In 1989, Coppersmith and Winograd used this method to great effect, setting the “world record” ω < 2.38. The Coppersmith–Winograd algorithm remained the world champion for over 20 years, until it was finally improved by Stothers in 2010. Independently, Vassilevska Williams obtained a further improvement in 2012, and Le Gall refined their methods in 2014 to obtain the current record, ω < 2.3728639.

Really incredible stuff!

Nevertheless, due to the large constant factors in the running time, all algorithms with better asymptotic running time than Strassen’s algorithm are impractical.

In practice, Strassen’s algorithm outperforms the simple cubic algorithm once the dimension n grows beyond roughly 100. This is not an exact “crossover point”, since its value is highly system dependent. D’Alberto and Nicolau have written a very efficient adaptive matrix multiplication code that switches from Strassen’s algorithm to the cubic one at the optimal point.

Thus, the naive algorithm will likely be your best bet unless your matrices are very large.

So, what do you think?

Will we ever achieve the hypothesized \( O(n^{2}) \)?

Resources:

Introduction to Algorithms, 3rd Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein (The MIT Press)

The Algorithm Design Manual, 2nd Edition (2008), by Steven S. Skiena

CME 323: Distributed Algorithms and Optimization, Spring 2016, taught by Reza Zadeh (Matroid and Stanford)

Algebraic Complexity Theory, by Peter Bürgisser, Michael Clausen, and Mohammad Amin Shokrollahi

On the Complexity of Matrix Multiplication, by Andrew James Stothers



If you like my post, please share it with your friends, and also read my last post about how to generate “random” numbers with computers.

My next post is about Matrix Chain Multiplication.
