Results

Which of these memory layouts is fastest?

The last time I asked this question, my answer was that it's complicated: it seemed to depend on the data size, the cache size, the cache line width, and the relative cache speed. Since then I've been working with Paul Khuong, and we've concluded that the answer is not really that complicated: the Eytzinger layout offers the best all-around performance over a wide range of array lengths. To understand why, you can read the paper.

Spoiler: It has to do with hiding latency by overusing bandwidth. (Hacker News user derf_9 has a nice summary.)
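
For readers who haven't met the Eytzinger layout, here is a minimal sketch of the idea. It is an illustration, not the code we benchmarked. The layout stores a sorted array in the breadth-first order of a complete binary search tree, so the children of the element at index i sit at indices 2i+1 and 2i+2. The function names `eytzinger` and `search` are made up for this example.

```python
def eytzinger(sorted_values):
    """Return sorted_values rearranged into Eytzinger (breadth-first) order."""
    n = len(sorted_values)
    out = [None] * n
    it = iter(sorted_values)

    def fill(i):
        # In-order walk of the implicit tree: left subtree, node, right subtree.
        # Handing out the sorted values in this order makes the tree a valid
        # binary search tree. Recursion depth is only about log2(n).
        if i < n:
            fill(2 * i + 1)
            out[i] = next(it)
            fill(2 * i + 2)

    fill(0)
    return out


def search(layout, x):
    """Return the index in layout of the smallest value >= x, or None."""
    n = len(layout)
    i = 0
    best = None
    while i < n:
        if layout[i] >= x:
            best = i           # candidate answer; look for a smaller one on the left
            i = 2 * i + 1
        else:
            i = 2 * i + 2      # everything here and to the left is too small
    return best


layout = eytzinger([1, 2, 3, 4, 5, 6, 7])
print(layout)                     # [4, 2, 6, 1, 3, 5, 7]
print(layout[search(layout, 5)])  # 5
```

Because the next index is always computed from i by simple arithmetic, the positions a search may visit a few levels down are known before the comparisons that choose among them have finished. In a compiled language that makes it possible to request those cache lines early, which Python cannot express, so the sketch shows only the layout and the search.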

Most of the data we've collected so far supports our hypothesis, but it's always nice to have more data. If you have a Linux machine and would like to contribute to this effort, then you can either try to reproduce our results or you can send us more data. Instructions for both are below.