What are vectorized instructions anyway?

Vectorized instructions, often called SIMD (single instruction, multiple data) instructions, are instructions that use special registers in modern CPUs to speed up certain types of operations. These registers can hold several values side by side, so that a single operation can be executed on all of them at the same time.

One important thing to note is that the size of a SIMD register depends on the specific processor, and that this size determines how many values the register can hold at once. To be precise, there are different SIMD instructions that work on different SIMD registers, each of a specific size, but the C# APIs we will use provide a nice abstraction over this and expose these registers to us as a vector type whose size can vary depending on the specific device we are executing our code on. For instance, if we have 128-bit registers at our disposal, we will be able to store 2 double values (64 bits each), 4 int or float values (32 bits each), 8 ushort or short values (16 bits each), or 16 byte or sbyte values (8 bits each). SIMD registers might also not be available at all, so we should always manually check whether that’s the case (in C#, through the Vector.IsHardwareAccelerated property) before executing our vectorized code.
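To make the relationship between register size and element count concrete, here is a minimal sketch that queries the current machine. The class and method names (SimdInfo, Print) are made up for illustration; Vector.IsHardwareAccelerated and Vector<T>.Count are the System.Numerics members doing the actual work.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of any library.
public static class SimdInfo
{
    // Prints whether Vector<T> operations are hardware accelerated, and how
    // many elements of each type fit in one vector on the current machine.
    public static void Print()
    {
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector<double>.Count = {Vector<double>.Count}");
        Console.WriteLine($"Vector<int>.Count    = {Vector<int>.Count}");
        Console.WriteLine($"Vector<float>.Count  = {Vector<float>.Count}");
        Console.WriteLine($"Vector<short>.Count  = {Vector<short>.Count}");
        Console.WriteLine($"Vector<byte>.Count   = {Vector<byte>.Count}");
    }
}
```

Calling SimdInfo.Print() on a machine where Vector<T> maps to 128-bit registers should report the counts listed above (2, 4, 4, 8 and 16); where it maps to 256-bit registers, each count doubles.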

Suppose we had two int[] arrays of the same size, and we wanted to write the sum of each pair of values to a third int[] array of that size. Instead of going through each element one by one, we could load a chunk of consecutive values from each array into two separate SIMD registers, sum those registers together in a single instruction, and then copy the resulting register to the right location in the int[] array to return from the method. The type we need is Vector<T>, which represents a SIMD register with elements of a given type. Here is a sample with both the traditional implementation of this algorithm using a for loop, and one using the Vector and Vector<T> APIs:
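A minimal sketch of the two approaches could look like this. The names ArraySum, SumScalar and SumVectorized are chosen here for illustration, and the scalar fallback for leftover elements is one reasonable way to handle arrays whose length is not a multiple of the vector size.

```csharp
using System;
using System.Numerics;

// Illustrative names; only Vector and Vector<T> come from System.Numerics.
public static class ArraySum
{
    // Traditional implementation: one addition per loop iteration.
    public static int[] SumScalar(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int[] result = new int[a.Length];
        for (int i = 0; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
        return result;
    }

    // Vectorized implementation: Vector<int>.Count additions per iteration.
    public static int[] SumVectorized(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int[] result = new int[a.Length];
        int i = 0;

        // Only take the SIMD path when the runtime can actually accelerate it.
        if (Vector.IsHardwareAccelerated)
        {
            int width = Vector<int>.Count;

            // Process as many full chunks of `width` elements as possible.
            for (; i <= a.Length - width; i += width)
            {
                Vector<int> va = new Vector<int>(a, i); // load a[i .. i + width - 1]
                Vector<int> vb = new Vector<int>(b, i); // load b[i .. i + width - 1]
                (va + vb).CopyTo(result, i);            // add all lanes, store them
            }
        }

        // Handle the leftover elements (or everything, without acceleration).
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
        return result;
    }
}
```

Both methods return the same values for any pair of equally sized arrays; the vectorized one simply performs several additions per loop iteration whenever the hardware allows it.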

With this proof of concept, we can now move on to an implementation of our original Count method that leverages the power of SIMD instructions.