Each 32-bit input word contains four 6-bit fields a, b, c and d; the expected output from this step:

AVX512VBMI

AVX512VBMI defines the instruction vpmultishiftqb, which can replace all the variable shift instructions from the previous point. Please note that the layout of the 32-bit lanes requires the same modification as described in the previous point.

The instruction builds a vector of bytes from 8-bit fields that may start at any bit position within a quadword. The following pseudocode shows the algorithm:

    for i in 0 .. 7 loop
        qword := input.qword[i];
        for j in 0 .. 7 loop
            index := indices.byte[i * 8 + j];
            output.byte[i * 8 + j] := rotate_right(qword, index) and 0xff;
        end loop
    end loop
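The pseudocode above can also be written as a scalar C model, which is handy for checking shift constants without AVX512VBMI hardware. This is only a sketch: the names rotate_right64 and multishift_model are made up for illustration, and the rotation count is reduced to its six lowest bits, as the instruction does.

```c
#include <stdint.h>

/* Rotate a 64-bit value right by n bits; only the six lowest bits of n count. */
static uint64_t rotate_right64(uint64_t x, unsigned n) {
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Scalar model of the 512-bit vpmultishiftqb (hypothetical helper name):
   for each of the 8 quadwords, produce 8 output bytes, each taken from
   the bit position given by the corresponding index byte. */
static void multishift_model(const uint64_t input[8],
                             const uint8_t indices[64],
                             uint8_t output[64]) {
    for (int i = 0; i < 8; i++) {
        const uint64_t qword = input[i];
        for (int j = 0; j < 8; j++) {
            const uint8_t index = indices[i * 8 + j];
            output[i * 8 + j] = (uint8_t)(rotate_right64(qword, index) & 0xff);
        }
    }
}
```

With the quadword 0x0807060504030201, an index of 8 selects the byte 0x02, while an index of 4 selects the bits 4..11, i.e. 0x20. An index of 60 wraps around the quadword and combines its four highest bits with its four lowest ones.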

Although vpmultishiftqb produces a vector of full bytes and the encoding needs just the 6 lower bits of each, no explicit masking is needed: the instruction vpermb (described above) uses only the 6 lower bits of each index, so it does the masking internally.
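The reason why masking can be skipped is easy to check with a scalar model of the 512-bit vpermb. The sketch below assumes a 64-entry lookup table holding the standard base64 alphabet; the names permb_model and base64_alphabet are hypothetical and serve only this illustration.

```c
#include <stdint.h>

/* Standard base64 alphabet, used here as the 64-entry lookup table. */
static const char base64_alphabet[65] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Scalar model of the 512-bit vpermb (hypothetical helper name):
   only the six lowest bits of each index byte select the table entry,
   so the two highest bits may contain anything. */
static void permb_model(const uint8_t indices[64],
                        const uint8_t *table,
                        uint8_t output[64]) {
    for (int i = 0; i < 64; i++) {
        output[i] = table[indices[i] & 0x3f];
    }
}
```

In this model the index bytes 0x1a and 0x5a both map to the letter 'a', because they differ only in bit 6. This is why the bytes produced by vpmultishiftqb can be passed to the lookup as they are.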

Below is a code snippet that shows the proper parameters for vpmultishiftqb.