Porting: SIMD

The work on making the code portable continues.

Way back when I first got an i7 CPU, I wrote a global illumination ray tracer that made really nice pictures while pushing the CPU to its limits. The math code for the ray tracer used the CPU’s SIMD instructions to perform the math operations more quickly, and I got very used to coding with them. The performance gain using them was fairly large.

For Banished I wanted to get started writing code and the game quickly – so I tended to use things I knew very well – everything from the choice of compiler, modeling tools, algorithms, and APIs. For math code (which is used by pretty much anything in the game that moves, animates, or is displayed in 3D) I chose to use the SIMD intrinsics.

That stands for Single Instruction, Multiple Data – meaning the instructions operate on more than one value at a time.

For example the single instruction:

c = _mm_add_ps(a, b);

would be the same as doing 4 additions:

c.x = a.x + b.x; c.y = a.y + b.y; c.z = a.z + b.z; c.w = a.w + b.w;

If all goes well, I should be able to compile on OSX and Linux using the same SIMD instructions, as I’ll still be targeting Intel/AMD CPUs. But just in case it doesn’t work or compile, or more importantly, if I port to a platform without those intrinsics, I’d like the code to be ready.

In writing the SIMD code, I didn’t keep any sort of portable C++ version around. Which is not so good. Having the C code for reference material is nice. Especially for something that’s been optimized to be fast and is no longer easily readable – or some equation that was hard to derive and that scrap of paper got thrown away…

The porting process is pretty easy. Most SIMD code just performs 4 operations at once, so in plain C++ code, all the operations are generally written out explicitly.

The C++ code I wrote is now much easier to look at. But I’ve also left the code open enough that if another platform has it’s own set of differing SIMD instructions I can write code using them if so desired on that platform, or I can just use the C++ code.

The worst part of porting the math code like this is typo’s. After all, there’s not too many different ways to write a matrix multiply, matrix inverse, or quaternion interpolate. So any errors tend to be errors in typing. You write .x, .y, .z, or .w (or similar) so many times that they tend to just blur together. You look at the code so many times knowing what it should be, that you don’t see the glaring error.

So after writing the C++ code, everything appeared to work correctly, except I noticed some deer would twitch during their animations.

It took me more time debugging to find the two errors that caused this than the time it took to port the entirety of the math library to generic C++.

In one place I had mistakenly written

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (b._v._3 * b._v._3);

instead of

float d = (a._v._0 * b._v._0) + (a._v._1 * b._v._1) + (a._v._2 * b._v._2) + (a._v._3 * b._v._3);

And in the other location

Vector3((_r._0 * a._r._0) + (_r._1 * a._r._0) + (_r._2 * a._u._0), (_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1), (_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

instead of

Vector3((_r._0 * a._r._0) + (_r._1 * a._f._0) + (_r._2 * a._u._0), (_r._0 * a._r._1) + (_r._1 * a._f._1) + (_r._2 * a._u._1), (_r._0 * a._r._2) + (_r._1 * a._f._2) + (_r._2 * a._u._2)),

Hopefully you can spot the errors faster than I did – it’s easier without additional math code above and below. I guess my brain saw what I expected to see, rather than what it really saw.

I’m actually fairly amazed the game displayed mostly correctly with these errors – the first error was part of Quaterion::Interpolate – which is used on every object displayed in game every frame, as well as during animation. The second error was in the code to multiply one orthonormal basis by another. Also used very heavily.

And for those of you thinking I should have unit tests for this kinda of thing, yeah math is definitely a system that can be unit tested – but eh, unit tests, that’s a discussion for another day.

Anyway, lots of debugging later the math code is now portable and the game works properly as far as I can tell with identical output.

Next up is an OpenGL version of the renderer, which I’ve got half done… more to come later.