Assembly is Too High-Level - Self Modifying Code with Basic Arithmetic

I should say up front that this whole trick can be done in assembly, but none of it would make sense without an understanding of machine code.

This post is about simple self-modifying code tricks: adding to or subtracting from a byte of an instruction turns it into a different instruction, while keeping the same addressing mode and operands (e.g. adding 8 to the 2nd byte of the machine code for 'add bl, 5' turns it into 'or bl, 5').
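The arithmetic in that example can be checked on the raw bytes without running anything. Here is a minimal Python sketch, assuming the usual encoding of 'add bl, 5' as 80 C3 05 (opcode, ModRM byte, immediate); the variable name is only for illustration:

```python
# Machine code for `add bl, 5`: opcode 0x80, ModRM 0xC3, immediate 0x05.
code = bytearray([0x80, 0xC3, 0x05])

# Add 8 to the 2nd byte (index 1). Nothing else is touched.
code[1] += 8

# The bytes are now 80 CB 05, which disassembles as `or bl, 5`.
print(code.hex(" "))
```

The first and last bytes are untouched, so the register and the immediate stay the same.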

This trick is easy for many instructions that share the same first opcode byte. Below is a simple example showing INC and DEC:

The /0 and /1 refer to bits 3-5 of the byte following FE (the reg field of the ModRM byte). If those bits are 000, the instruction is INC; if they are 001, it is DEC. There's a more detailed breakdown of how this all works in my previous post on redundancies in the "OP REG, imm" instruction format. Here are the 2 similar instructions in edb:

Below is the binary representation of those instructions, converted from hex, with the 3 'instruction' bits in brackets for clarity:

11111110 11[000]000

11111110 11[001]000
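To make the bit layout concrete, here is a small Python sketch that splits the second byte into its three fields and looks up the /digit. The helper name modrm_fields and the FE_GROUP table are my own labels for illustration, not part of any tool:

```python
def modrm_fields(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 6-7: addressing mode
    reg = (byte >> 3) & 0b111   # bits 3-5: the /digit that selects the instruction
    rm = byte & 0b111           # bits 0-2: register or memory operand
    return mod, reg, rm

# The two /digit values discussed for opcode FE.
FE_GROUP = {0: "INC", 1: "DEC"}

for second_byte in (0b11000000, 0b11001000):
    mod, reg, rm = modrm_fields(second_byte)
    print(f"FE {second_byte:02X}: /{reg} -> {FE_GROUP[reg]}")
```

Only the reg field differs between the two bytes; mod and rm come out identical.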

Let's take a look at another INC/DEC example:

Even though one instruction is an INC and the other is a DEC, everything else about the assembly is the same (both operate on a memory operand addressed by rcx plus the same offset). The only difference is in the 2nd byte of the machine code. You could literally add 8 to that 2nd byte and turn an INC into a DEC, with everything else staying the same.
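The same check works for a memory operand. In this sketch the offset 0x10 and the byte-sized operand are assumptions I picked for illustration: in 64-bit mode 'inc byte [rcx+0x10]' should encode as FE 41 10.

```python
# Assumed example: `inc byte [rcx+0x10]` -> FE 41 10
# (0x41 = mod 01, reg 000, rm 001; then an 8-bit displacement).
inc = bytes([0xFE, 0x41, 0x10])

dec = bytearray(inc)
dec[1] += 8   # 0x41 -> 0x49: the reg field goes from /0 (INC) to /1 (DEC)

MOD_RM_MASK = 0b11000111   # everything in the ModRM byte except bits 3-5
same_addressing = (inc[1] & MOD_RM_MASK) == (dec[1] & MOD_RM_MASK)
same_offset = inc[2] == dec[2]

print(dec.hex(" "), same_addressing, same_offset)
```

The mask confirms that the addressing mode and base register survive the addition, and the displacement byte is never touched.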

All of these shared-byte instructions work this way: each one is a multiple of 8 away from the others in its group. So adding 8 to the 2nd byte of a specific type of ADD will render an OR. Add 8 more and you get an ADC, then SBB, then AND, then SUB, then XOR, and finally CMP. This also means that adding, say, 32 to the 2nd byte of an SBB would result in a CMP.
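Here is a rough Python sketch of that walk. It starts from ModRM byte 0xC3 (the 'add bl, imm8' form under opcode 0x80) and adds 8 per step; the mnemonic helper is a hypothetical name used only here:

```python
# Order of the /0../7 instructions under opcodes 0x80-0x83.
GROUP1 = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]

def mnemonic(modrm):
    """Name the instruction selected by bits 3-5 of the ModRM byte."""
    return GROUP1[(modrm >> 3) & 0b111]

ADD_BL = 0xC3   # ModRM byte of `add bl, imm8`

# Step through the whole group by adding 8 each time.
walk = [mnemonic(ADD_BL + 8 * step) for step in range(8)]
print(walk)

# SBB is three steps in (0xC3 + 24 = 0xDB); 32 more lands on CMP.
SBB_BL = ADD_BL + 3 * 8
print(mnemonic(SBB_BL), "->", mnemonic(SBB_BL + 32))
```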

Here is a video proof of concept of abusing this idea to xor 0x55 with 0xaa while heavily obfuscating that we are actually xoring:





If you aren't in the mood for video, here is a before and after run of the 4 instructions:

And here is the assembly source:

Appendix:

If you would like to know all of the instructions that you can jump around in with simple maths modifications, here they are:

0x80 - 1-byte operand: ADD, OR, ADC, SBB, AND, SUB, XOR, CMP

0x81 - 4-byte operand: ADD, OR, ADC, SBB, AND, SUB, XOR, CMP

0x82 - 1-byte operand: ADD, OR, ADC, SBB, AND, SUB, XOR, CMP (same as 0x80; not valid in 64-bit mode)

0x83 - 1-byte operand: ADD, OR, ADC, SBB, AND, SUB, XOR, CMP

0xC0 - 1-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xC1 - 1-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xD0 - 0-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xD1 - 0-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xD2 - 0-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xD3 - 0-byte operand: ROL, ROR, RCL, RCR, SHL, SHR, SAL, SAR

0xF6 - 0-byte operand: *TEST, *TEST, NOT, NEG, MUL, IMUL, DIV, IDIV

0xF7 - 0-byte operand: **TEST, **TEST, NOT, NEG, MUL, IMUL, DIV, IDIV

0xFE - 0-byte operand: INC, DEC

0xFF - 0-byte operand: INC, DEC

*operand is actually 1-byte

**operand is actually 4-bytes

Within each list, neighbouring instructions are 8 apart in the 2nd byte of the machine code, so adding or subtracting a multiple of 8 moves you along the list. If I wanted to convert a CMP to an ADC, that's 5 instructions back, times 8. So subtracting 40 from the 2nd byte of one of those CMP instructions would effectively make it an ADC with everything else intact. The only exception is that the TEST instructions have different operand sizes than the rest of their list (they are not convertible).
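The lookup can be written down as a small helper. This is only a sketch: the group labels ("arith", "shift", "unary", "incdec") and the function name modrm_delta are mine, and it works purely on mnemonics, so it won't stop you from mixing opcode bytes with different operand sizes.

```python
# Each list is in /0../7 order, copied from the appendix above.
GROUPS = {
    "arith": ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"],
    "shift": ["ROL", "ROR", "RCL", "RCR", "SHL", "SHR", "SAL", "SAR"],
    "unary": ["TEST", "TEST", "NOT", "NEG", "MUL", "IMUL", "DIV", "IDIV"],
    "incdec": ["INC", "DEC"],
}

def modrm_delta(src, dst):
    """Value to add to the 2nd byte to turn src into dst (negative means subtract)."""
    if "TEST" in (src, dst):
        # TEST carries an immediate that the rest of its list does not.
        raise ValueError("TEST is not convertible")
    for members in GROUPS.values():
        if src in members and dst in members:
            return (members.index(dst) - members.index(src)) * 8
    raise ValueError(f"{src} and {dst} are not in the same list")

print(modrm_delta("CMP", "ADC"))   # 5 instructions back, times 8
print(modrm_delta("ROL", "SAR"))   # 7 instructions forward, times 8
```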

Also, it may go without saying, but I'm going to say it anyway: this is self-modifying code, so wherever this code is running from, the memory needs to be rwx. The default for code assembled and linked the usual way is r-x. Personally, I manually changed the 05 byte (r-x) in the p_flags field of the segment's ELF program header to 07 (rwx).
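If you'd rather not hunt for that byte in a hex editor, the same patch can be scripted. This is a hedged sketch and not the method I used: it assumes a little-endian ELF64 file, the function name make_code_writable is made up, and it flips the write bit on every executable PT_LOAD segment rather than one hand-picked byte.

```python
import struct

PT_LOAD = 1                   # loadable segment
PF_X, PF_W, PF_R = 1, 2, 4    # p_flags permission bits

def make_code_writable(image):
    """Set PF_W on each executable PT_LOAD segment of an ELF64 LE image.

    `image` is a bytearray holding the whole file. Returns how many
    program headers were changed.
    """
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)  # entry size, entry count
    patched = 0
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, p_flags = struct.unpack_from("<II", image, base)
        if p_type == PT_LOAD and p_flags & PF_X and not p_flags & PF_W:
            struct.pack_into("<I", image, base + 4, p_flags | PF_W)  # e.g. 05 -> 07
            patched += 1
    return patched

# Hypothetical usage ("a.out" is a placeholder path):
#   data = bytearray(open("a.out", "rb").read())
#   make_code_writable(data)
#   open("a.out", "wb").write(data)
```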