The conversion algorithm works in a loop; in each iteration exactly one input vector is loaded from memory and processed.

Since SIMD procedures impose a specific layout of digits within a vector, an arbitrary input has to be rearranged first. To do this, we need to identify spans of digits in the input and move each span into a fixed subarray of the vector. Let's assume at this point that the input contains only digits and separators (denoted with _).

For instance, the vector with one-digit numbers 1___2_3__4__5_6_ must be transformed into 123456__________. An SSE procedure then converts all the numbers in parallel into the array [1, 2, 3, 4, 5, 6]. Likewise, the vector with two-digit numbers _12__34___56_78_ must be transformed into 12345678________, and the result will be [12, 34, 56, 78].

Let's consider a more complicated case, where a string contains numbers with different digit counts. There are a few ways to convert the input _1_2_34_567_89__ :

If we choose conversion of one-digit numbers, then only the first two spans can be converted, because we need to keep the order of numbers from the input. After normalizing the input into 12______________ just two values [1, 2] will be produced. The input's tail _34_567_89__ remains untouched.

If we choose conversion of two-digit numbers, then the first three spans can be converted. Shorter numbers are padded with leading zeros, so the normalized input is 010234__________ . The result is [1, 2, 34]; this time a slightly shorter tail of the input, _567_89__ , remains untouched.

Finally, if we choose conversion of four-digit numbers, then the first four spans can be converted. Again, shorter numbers are padded with leading zeros, and the normalized input is 0001000200340567 . The result is [1, 2, 34, 567], but the chunk's tail, i.e. _89__ , is still unprocessed.
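The three cases above can be reproduced with a small scalar model. This is only a sketch of the normalization step, not the SSE code itself: the names `Normalized` and `normalize` are hypothetical, the chunk is assumed to be exactly 16 characters of digits and `_`, and a span that touches the end of the chunk is not treated specially.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: outcome of normalizing one 16-byte chunk for a
// conversion routine that expects `width` digits per number.
struct Normalized {
    std::string digits;            // 16 characters, '_' marks unused bytes
    std::vector<unsigned> values;  // numbers the routine would produce
    std::size_t consumed;          // input bytes covered by the converted spans
};

// Assumption: chunk.size() == 16 and width is 1, 2, 4 or 8.
Normalized normalize(const std::string& chunk, std::size_t width) {
    Normalized out{std::string(16, '_'), {}, 0};
    const std::size_t max_numbers = 16 / width;
    std::size_t i = 0;
    while (i < chunk.size() && out.values.size() < max_numbers) {
        if (chunk[i] == '_') { ++i; continue; }
        // Find the end of the current span of digits.
        std::size_t end = i;
        while (end < chunk.size() && chunk[end] != '_') ++end;
        const std::size_t len = end - i;
        // A span longer than `width` stops the conversion: later spans
        // cannot be taken without breaking the order of the numbers.
        if (len > width) break;
        const std::size_t slot = out.values.size() * width;
        const std::size_t pad = width - len;  // number of leading zeros
        unsigned value = 0;
        for (std::size_t k = 0; k < width; ++k) {
            const char c = (k < pad) ? '0' : chunk[i + (k - pad)];
            out.digits[slot + k] = c;
            value = value * 10 + static_cast<unsigned>(c - '0');
        }
        out.values.push_back(value);
        i = end;
        out.consumed = end;
    }
    return out;
}
```

For the input _1_2_34_567_89__ and a width of 2, the sketch yields the digits 010234__________ and the values [1, 2, 34]; `chunk.substr(consumed)` is then the untouched tail _567_89__ .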

We can see that in order to convert a given span combination we need to know:

How to shuffle bytes in the input vector?

Which SIMD procedure can be used then?

How many numbers converted by the SIMD procedure must be stored?

Obtaining this information seems to be quite complicated, especially when we look at the last example. Fortunately, all parameters can be precalculated. A span combination can be saved as a bit pattern, where ones represent digits. For example, from the vector _1_2_34_567_89__ we get the span pattern 0b0101011011101100 = 0x56ec . A span pattern is used to fetch a record from the precalculated array. The record contains the following fields:

shuffle_digits: an array of 16 bytes, which is the argument for the instruction _mm_shuffle_epi8 ( pshufb ); the instruction moves bytes to certain positions;

conversion_routine: an enumeration that selects an SSE conversion procedure; for instance, it tells that the shuffled input is an array of two-digit numbers;

element_count: the number of elements from the SSE conversion procedure that must be stored in the output collection.
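As a rough illustration, the span pattern and one record could be computed as follows. This is a scalar sketch under several assumptions: the field names come from the list above, but the enum values and the helpers `span_pattern` and `make_record` are hypothetical, and the digit width is passed in explicitly because the text does not say how the routine is chosen for a pattern. The pattern is built in reading order (first byte in the most significant bit) so that it matches 0x56ec; `_mm_movemask_epi8` puts byte 0 in bit 0, so real SSE code would see the bits reversed. Padding uses the control byte 0x80, for which `pshufb` writes a zero byte; this equals the digit 0 only if ASCII '0' has been subtracted before the shuffle, which is assumed here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical enumeration; the values encode the digits per number.
enum class Conversion : std::uint8_t {
    OneDigit = 1, TwoDigits = 2, FourDigits = 4, EightDigits = 8
};

struct Record {
    std::array<std::uint8_t, 16> shuffle_digits;  // control bytes for pshufb
    Conversion conversion_routine;
    std::uint8_t element_count;
};

// Span pattern in reading order: the first byte maps to the top bit.
// Assumes `chunk` points to at least 16 readable bytes.
std::uint16_t span_pattern(const char* chunk) {
    std::uint16_t pattern = 0;
    for (int i = 0; i < 16; ++i) {
        const bool digit = chunk[i] >= '0' && chunk[i] <= '9';
        pattern = static_cast<std::uint16_t>((pattern << 1) | (digit ? 1 : 0));
    }
    return pattern;
}

// Builds the record for one pattern and a given width (1, 2, 4 or 8).
Record make_record(std::uint16_t pattern, std::size_t width) {
    Record r{};
    r.shuffle_digits.fill(0x80);  // 0x80: pshufb writes a zero byte
    r.conversion_routine = static_cast<Conversion>(width);
    r.element_count = 0;
    auto is_digit = [pattern](std::size_t pos) {
        return ((pattern >> (15 - pos)) & 1) != 0;
    };
    const std::size_t max_numbers = 16 / width;
    std::size_t i = 0;
    while (i < 16 && r.element_count < max_numbers) {
        if (!is_digit(i)) { ++i; continue; }
        std::size_t end = i;
        while (end < 16 && is_digit(end)) ++end;
        const std::size_t len = end - i;
        if (len > width) break;  // span does not fit this routine
        // Right-align the span within its slot; the bytes before it keep
        // 0x80 and therefore act as leading zeros.
        const std::size_t first = r.element_count * width + (width - len);
        for (std::size_t k = 0; k < len; ++k)
            r.shuffle_digits[first + k] = static_cast<std::uint8_t>(i + k);
        ++r.element_count;
        i = end;
    }
    return r;
}
```

A full table would call `make_record` for each of the 65536 patterns once, at startup or at compile time; the main loop would then only compute the pattern and fetch the record.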

The solution with a precalculated array is suitable only for SSE, as span patterns have 16 bits and the array has just 65536 entries. In the cases of AVX2 and AVX512, where vectors are wider, such a table would simply be too large: 2^32 or 2^64 entries, respectively. Additionally, the AVX2 version of the pshufb instruction works on lanes, i.e. 128-bit halves of a vector, so a single shuffle cannot move bytes from one half to the other.
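The lane restriction can be made concrete with a scalar model of the 256-bit byte shuffle. The function name `shuffle_per_lane` is hypothetical; the model only mirrors the documented behaviour of the instruction, where each output byte is selected by the low four bits of its control byte from within the same 16-byte lane, or zeroed when the highest bit of the control byte is set.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the AVX2 byte shuffle: the two 16-byte lanes are
// shuffled independently of each other.
std::array<std::uint8_t, 32> shuffle_per_lane(
        const std::array<std::uint8_t, 32>& src,
        const std::array<std::uint8_t, 32>& ctrl) {
    std::array<std::uint8_t, 32> out{};
    for (std::size_t i = 0; i < 32; ++i) {
        const std::size_t lane_base = i & ~static_cast<std::size_t>(15);  // 0 or 16
        // Only the low 4 bits select a byte, so the index never leaves the lane.
        out[i] = (ctrl[i] & 0x80) ? 0 : src[lane_base + (ctrl[i] & 0x0f)];
    }
    return out;
}
```

A control byte of 16 placed in the lower lane does not fetch byte 16 of the source; the index wraps to 0 and byte 0 of the same lane is returned. Moving digits across the halves would therefore need extra instructions.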

However, AVX2 and AVX512 instructions might still be used in some parts of the algorithm, especially in input validation.