If you need to store objects in a regular grid, then the canonical choice is an Array of Structs container such as boost::multi_array: it has a convenient interface and its code is efficient. But the current trend in hardware design towards longer and longer vector ALUs means that your computational kernels will either have to resort to scatter/gather memory accesses (slow!) or remain scalar.
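To make the problem concrete, here is a minimal Array of Structs sketch (the Cell type, its members, and the grid dimensions are hypothetical, chosen purely for illustration): adjacent temperature values are separated by sizeof(Cell) bytes, which is exactly the strided access pattern that forces gather/scatter on wide vector units.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell type and grid dimensions, for illustration only.
struct Cell {
    double temperature;
    double pressure;
};

const std::size_t DIM_X = 16, DIM_Y = 16, DIM_Z = 16;

// Array of Structs: one contiguous block of Cell objects.
// Convenient for whole-object access, but the temperature values of
// adjacent cells lie sizeof(Cell) bytes apart in memory, so a
// vectorized loop over temperatures alone needs gather/scatter
// instructions instead of plain vector loads.
std::vector<Cell> make_grid()
{
    return std::vector<Cell>(DIM_X * DIM_Y * DIM_Z, Cell{0.0, 0.0});
}
```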

Therefore the prevailing opinion in HPC is to keep objects out of performance-critical code and to use a Struct of Arrays (SoA) memory layout. This means that instead of storing a single, huge array of objects (structs), you allocate multiple arrays of scalar types -- one per member. The advantage is that all corresponding variables are stored consecutively in memory, which makes your hardware happy. You, however, will probably be less happy, as this is a clumsy and counterintuitive (read: not object-oriented) way of programming.

Even worse, your code may now run into a new problem: address generation. Did you ever wonder why constructs like A[z * DIM_X * DIM_Y + y * DIM_X + x] don't slow down your code? After all, one could assume that this expression costs three multiplications and two additions per access. In reality your compiler will try to keep a pointer to each row of consecutive elements that your innermost loop accesses, and it will ideally keep each of these pointers in a register.

Trouble strikes when the number of pointers required exceeds the number of available registers. In that case some pointers get spilled to the stack, which is awful: a single array access may then result in two memory accesses -- one to retrieve the pointer, one to move the actual data. Think this doesn't apply to you? It already happens for models as simple as a Lattice Boltzmann method with a D3Q19 discretization, where updating a single cell reads 19 neighboring values and thus calls for 19 row pointers in the inner loop.
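The SoA layout described above can be sketched as follows (the ParticlesSoA type and its member names are made up for this example). Because all pos_x values are consecutive in memory, the loop in advance_x maps directly onto unit-stride vector loads and stores, with no gather/scatter required:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle data in Struct of Arrays layout: one
// contiguous array per member instead of one array of structs.
struct ParticlesSoA {
    std::vector<float> pos_x;
    std::vector<float> pos_y;
    std::vector<float> pos_z;

    explicit ParticlesSoA(std::size_t n)
        : pos_x(n), pos_y(n), pos_z(n)
    {}
};

// Advance all particles along x. The pos_x values are consecutive
// in memory, so this loop vectorizes with plain, unit-stride
// vector loads/stores.
void advance_x(ParticlesSoA& p, float dx)
{
    for (std::size_t i = 0; i < p.pos_x.size(); ++i) {
        p.pos_x[i] += dx;
    }
}
```

The price is visible in the interface: a "particle" no longer exists as a single object you can pass around, which is the clumsiness mentioned above.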
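To see where the multiplications go, compare the naive 3D indexing with what an optimizing compiler effectively generates: the loop-invariant part of the index is hoisted out of the inner loop, leaving one row pointer that is ideally kept in a register (a sketch with assumed dimensions; both functions compute the same sum):

```cpp
#include <cstddef>
#include <vector>

const std::size_t DIM_X = 8, DIM_Y = 8, DIM_Z = 8;

// Naive form: each access looks like it costs several
// multiplications and additions.
double sum_naive(const std::vector<double>& A)
{
    double sum = 0.0;
    for (std::size_t z = 0; z < DIM_Z; ++z)
        for (std::size_t y = 0; y < DIM_Y; ++y)
            for (std::size_t x = 0; x < DIM_X; ++x)
                sum += A[z * DIM_X * DIM_Y + y * DIM_X + x];
    return sum;
}

// What the compiler effectively does: the row base address is
// computed once per row and kept in a register, so the inner loop
// only advances a pointer. With 19 such rows (as in D3Q19) these
// pointers compete for registers and may spill to the stack.
double sum_hoisted(const std::vector<double>& A)
{
    double sum = 0.0;
    for (std::size_t z = 0; z < DIM_Z; ++z) {
        for (std::size_t y = 0; y < DIM_Y; ++y) {
            const double* row = &A[z * DIM_X * DIM_Y + y * DIM_X];
            for (std::size_t x = 0; x < DIM_X; ++x)
                sum += row[x];
        }
    }
    return sum;
}
```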