Over the past (too many) days I've been preparing my presentation for the ACCU 2009 conference. At one point I wanted to show how loop tiling increases locality of reference and therefore cache hits. Surprisingly, I could not find a demo on the web, so I built one from scratch. Here are two applets demonstrating memory accesses during a matrix raise to the power of two operation.

Normal Loop

const int N = 12 ; double a [ N ][ N ], r [ N ][ N ]; for ( int i = 0 ; i < N ; i ++ ) for ( int j = 0 ; j < N ; j ++ ) { r [ i ][ j ] = 0 ; for ( int k = 0 ; k < N ; k ++ ) r [ i ][ j ] += a [ i ][ k ] * a [ k ][ j ]; }

This browser does not have a Java Plug-in.

Get the latest Java Plug-in here.

Loop with Tiling

const int N = 12 ; const int TILE = 4 ; double a [ N ][ N ], r [ N ][ N ]; for ( int i = 0 ; i < N ; i += TILE ) for ( int j = 0 ; j < N ; j += TILE ) for ( int k = 0 ; k < N ; k += TILE ) for ( int ii = i ; ii < i + TILE ; ii ++ ) for ( int jj = j ; jj < j + TILE ; jj ++ ) { r [ ii ][ jj ] = 0 ; for ( int kk = k ; kk < k + TILE ; kk ++ ) r [ ii ][ jj ] += a [ ii ][ kk ] * a [ kk ][ jj ]; }

This browser does not have a Java Plug-in.

Get the latest Java Plug-in here.

I elected to write the code in Processing; you can download it from this link. Adopting Processing proved to be a good choice; getting the application up and running was easy, although converting the multiplication loops to work inside Processing's frame loop was a bit tricky. Even better, by adding a few more lines of code I converted the result into a movie. Here are the magic incantations.

import processing.video.MovieMaker ; // [...] MovieMaker mm = new MovieMaker ( this , width , height , "tiledemo.mov" , 30 , MovieMaker . ANIMATION , MovieMaker . LOSSLESS ); // [...] mm . addFrame (); // [...] mm . finish ();

Converting the movie into a format I could embed into Powerpoint proved to be more difficult. After (again, too) many experiments I found that converting the movie using the VLC Media Player "Convert/Save" function with ASF encapsulation and the WMV2 codec at 2048bps gave me perfect results.

Note: the original version of this blog entry contained a matrix by a vector multiplication. The first comments refer to the original version.