For an experimental project, I wanted to use an SSE 4.1 instruction (MPSADBW) in some Go assembly code. It turned out that the Go assembler in the current 1.4.2 version doesn’t support this instruction. The suggested workaround is to emit the instruction as raw opcode bytes in the assembly source.

The instruction (in Go assembler syntax) is:

MPSADBW $0, X0, X1 // Compare lower part X1[0:12] to X0[0:4], store in X1

Usually you can find the opcode bytes by simply googling the instruction, or by looking it up in an instruction reference manual. For MPSADBW, the entry in the Intel manual looks like this:

66 0F 3A 42 /r ib    MPSADBW xmm1, xmm2/m128, imm8

It is clear that it must start with 0x66 0x0f 0x3a 0x42. However, the difficult part is that we must also encode two registers and an immediate value. I haven’t looked at register encodings for many years, so we need some help with that.

So my solution for this is to download an assembler to help – YASM to the rescue.

Once downloaded I created an assembler file containing only this instruction:

BITS 64
MPSADBW xmm1, xmm0, 0

Note that the yasm/nasm syntax is Intel style: “INSTRUCTION dest, source, imm8”. The Go assembler, like AT&T syntax, puts the operands the other way around, with the destination last. In some cases you might need to add “BITS 64” to indicate that you want 64-bit instructions generated.

This is the only code that needs to go into the source file. Nothing else. If you have saved the file as `test.asm`, you simply run the yasm assembler with no additional flags. On Windows that would be:

>yasm-1.3.0-win64 test.asm

This should generate a file called “test” in your current directory. It contains the encoded instruction and nothing more. If you open it in a hex view (I used Sublime Text), you will see the six bytes you need: 66 0f 3a 42 c8 00

As you can see, the register byte we needed is 0xc8, which indicates that the source is xmm0 and the destination is xmm1. We can now insert the bytes into our Go assembly:

// MPSADBW $0, X0, X1 // Compare lower part X1[0:12] to X0[0:4], store in X1
BYTE $0x66; BYTE $0x0f; BYTE $0x3a; BYTE $0x42; BYTE $0xc8; BYTE $0x00

I hope this helps you until we get intrinsics in Go *ahem*.