Who would have thought that such a small article on a Matrix Optimization and its impact on execution time would generate so many comments in such a short time? Thanks for all of them! The original point I was trying to make in that article was to always check whether an optimization actually improves matters before you employ it. But I guess most of you found the question of why it was going wrong more interesting to discuss. And since I don’t want to disappoint my readers (that’s you :P), I decided to put some of your suggestions to the test and see what actually caused the slowdown on the Intel compiler.

The first thing that came to my mind when I saw the code was the additional branch and how it may be causing problems. As many readers have suggested, one can avoid it like this:

[c]
for (i = 0; i < N; i++) {
    A[i][0] = 2 * i + 1;
    for (j = 1; j < N; j++) {
        A[i][j] = A[i][j - 1] + 3;
    }
}
[/c]

Who would have thought: this gives a small but noticeable improvement. But the program is still about 20 times slower than the original one on the Intel compiler (all these numbers are just rough measurements, of course). So let's see if we can improve performance by using a temporary variable:

[c]
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        if (j == 0) {
            temp = 2 * i + 1;
            A[i][j] = temp;
        } else {
            temp += 3;
            A[i][j] = temp;
        }
    }
}
[/c]

The if-branch is still left in there to stick to one change per program. This program is about as fast as the original one. Wow. Who would have thought that today's compilers still have issues with register allocation. If my students had presented me with this program first, I would also have had issues with it, because they had obfuscated a perfectly fine loop nest for an optimization that I thought every compiler did today. I guess I was wrong.

P.S.: I am looking for a solution to the code-disappearing-in-comments problem that bit many of you, but have not found anything yet. If you have an idea how to solve this and allow people to post code (including