This algorithm is suitable for all SIMD instruction sets and also for the SWAR approach. As its predicate it uses equality of the first and the last characters of the substring.

These two characters are broadcast into two registers, F and L respectively. Then, in each iteration, two chunks of the string are loaded. The first chunk (A) is read from offset i (where i is the current offset) and the second chunk (B) is read from offset i + k - 1, where k is the substring's length.

Then we compute the vector expression (F == A) and (B == L). This step yields a byte vector (or a bit mask) in which "true" values denote the positions of potential substring occurrences. Finally, exact substring comparisons are performed at just these positions.
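The steps above can be sketched with the SWAR approach, using a plain 64-bit integer in place of a SIMD register. This is only a minimal illustration under stated assumptions: the names swar_find and load_le64 are hypothetical, the chunk width is fixed at 8 bytes, and the tail of the string is handled by a simple scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: loads 8 bytes so that byte j of memory lands in
// bits 8j..8j+7, regardless of the host's endianness.
static std::uint64_t load_le64(const char* p) {
    std::uint64_t v = 0;
    for (int j = 0; j < 8; ++j)
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(p[j])) << (8 * j);
    return v;
}

// Hypothetical function: returns the offset of the first occurrence of
// needle in s, or std::string::npos if there is none.
std::size_t swar_find(const std::string& s, const std::string& needle) {
    const std::size_t n = s.size();
    const std::size_t k = needle.size();
    if (k == 0) return 0;
    if (k > n) return std::string::npos;

    const std::uint64_t ones = 0x0101010101010101ull;
    const std::uint64_t low7 = 0x7f7f7f7f7f7f7f7full;
    const std::uint64_t high = 0x8080808080808080ull;

    // F and L: the first and the last needle character copied into every byte.
    const std::uint64_t F = ones * static_cast<unsigned char>(needle[0]);
    const std::uint64_t L = ones * static_cast<unsigned char>(needle[k - 1]);

    std::size_t i = 0;
    // Both 8-byte chunks must lie inside s: i + (k - 1) + 8 <= n.
    for (; i + k + 7 <= n; i += 8) {
        const std::uint64_t A = load_le64(s.data() + i);          // chunk at offset i
        const std::uint64_t B = load_le64(s.data() + i + k - 1);  // chunk at offset i + k - 1

        // A byte of diff is zero exactly when A matches F and B matches L there.
        const std::uint64_t diff = (A ^ F) | (B ^ L);

        // Exact zero-byte test: the highest bit of byte j is set in mask
        // if and only if byte j of diff is zero. The addition cannot carry
        // across bytes because each byte is at most 0x7f before adding 1.
        const std::uint64_t mask = ((~diff & low7) + ones) & (~diff & high);

        // Exact comparison only at candidate positions, in ascending order.
        for (int j = 0; mask != 0 && j < 8; ++j) {
            if ((mask >> (8 * j + 7)) & 1) {
                if (std::memcmp(s.data() + i + j, needle.data(), k) == 0)
                    return i + j;
            }
        }
    }

    // Scalar tail: remaining positions where a full chunk no longer fits.
    for (; i + k <= n; ++i)
        if (std::memcmp(s.data() + i, needle.data(), k) == 0)
            return i;

    return std::string::npos;
}
```

For instance, swar_find("a_cat_tries", "cat") yields a mask with a single candidate at index 2, so only one exact comparison is made before the function returns 2. A SIMD version would follow the same shape, with a byte-wise compare and a mask-extraction instruction replacing the zero-byte arithmetic.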

Since the mask is non-zero, it means there are possible substring occurrences. As we see, there is only one non-zero element at index 2, thus only one substring comparison must be performed.

First and last?

Choosing the first and the last character of a substring is not always a wise decision. Consider the following scenario: a string contains mostly 'A' characters, and a user wants to find "AjohndoeA". In such a situation the number of char-wise comparisons would be large.

To prevent such situations, an implementation can pick as the "last" character the farthest character not equal to the first one. If there is no such character, all characters in the substring are the same (for example "AAAAA"). A specialised procedure may be used to handle such patterns.
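The selection rule described here might look as follows. This is a sketch only: pick_last_index is a hypothetical name, and returning std::string::npos to signal the "all characters equal" case is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the farthest character that
// differs from needle[0], or std::string::npos when every character equals
// the first one (e.g. "AAAAA"), in which case a specialised procedure
// would be needed.
std::size_t pick_last_index(const std::string& needle) {
    // Scan from the end towards index 1; index 0 is the first character itself.
    for (std::size_t j = needle.size(); j-- > 1; ) {
        if (needle[j] != needle[0])
            return j;
    }
    return std::string::npos;
}
```

For "AjohndoeA" this returns 7 (the character 'e'), so the second chunk would be read from offset i + 7 rather than i + k - 1, and the predicate would compare it against 'e' instead of 'A'.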