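Apparently the dumb code that copies one byte at a time gets auto-vectorized by the compiler, so it ends up as much as 3x faster than the one built into gcc, lol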

memcpy() compiled with vectorizing compilers





All current compilers for Linux should support SSE2 auto-vectorization with

#include <string.h>

void *(memcpy)(void *restrict b, const void *restrict a, size_t n)
{
    char *s1 = b;
    const char *s2 = a;
    for (; 0 < n; --n)
        *s1++ = *s2++;
    return b;
}

(omitted)





x86-64 gcc memcpy()





(omitted)





Linking in a user-compiled memcpy(), built from the source code presented above, nearly always improves performance. In cases where glibc fails to find the needed wide moves, performance increases by a factor of 3.
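To try the claim yourself, something like the following might do as a starting point. It is only a rough sketch: the byte loop is the same as in the quoted source, but it is renamed to byte_copy() so that it sits next to the library memcpy() instead of replacing it. The names byte_copy, copy_fn, copies_match and seconds_for are made up for this example and do not come from the article, and the timing helper is a crude clock()-based measurement, not a proper benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical alias for "a function with memcpy's signature". */
typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

/* Same byte-at-a-time loop as the quoted source, under a hypothetical
   name so the library memcpy() stays available for comparison. */
void *byte_copy(void *restrict dst, const void *restrict src, size_t n)
{
    char *d = dst;
    const char *s = src;
    for (; 0 < n; --n)
        *d++ = *s++;
    return dst;
}

/* Returns 1 if byte_copy() and memcpy() leave identical buffers after
   copying the first n bytes (n must not exceed 256). */
int copies_match(size_t n)
{
    unsigned char src[256], a[256], b[256];
    assert(n <= sizeof src);
    for (size_t i = 0; i < sizeof src; ++i)
        src[i] = (unsigned char)(i * 7 + 1);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    byte_copy(a, src, n);
    memcpy(b, src, n);
    return memcmp(a, b, sizeof a) == 0;
}

/* CPU seconds spent calling fn(dst, src, n) reps times. A crude
   measurement: an optimizer may shorten or remove the repeated copies. */
double seconds_for(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i)
        fn(dst, src, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

From a main() you would call seconds_for(byte_copy, ...) and seconds_for(memcpy, ...) on the same pair of large buffers and compare the two numbers. Build with optimization turned on (for gcc, -O3 enables the auto-vectorizer); whether you see anything close to the factor of 3 quoted above depends on your compiler, glibc version and CPU.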