ZeusVM analysis

In this article, we will study an old Zeus sample protected by a virtual machine. We will begin with the analysis of the VM structure, and automatize its reverse engineering using Miasm.

The sample is zeus_sample.zip. All zips in this post are protected using the password “infected”.

This analysis is based on Miasm revision f2a9a35. The corresponding Elfesteem revision is 1ee9171.

First Stages The binary starts with a decryption loop. We will skip this part to focus on our goal: the VM. The decryption ends at address 0x403C26. The final code is executed using a call to GDI32/EnumFontsA which takes a function callback, in our case, the second part (zeusvm_stage2.zip). The second stage can be emulated using Miasm (see dynamic shellcode analysis for more information). Miasm only provides a minimalistic windows structures and functions support, so the code will break during module search for rebuilding original import table. In the malware code, instead of stopping at the unicode null byte to stop the module name hash computation, the length is harcoded. So the code will crash here. You can add more padding in the allocation in miasm/os_dep/win_api_x86_32_seh.py, in the create_module_chain function. jitter . vm . add_memory_page ( addr + offset_name , PAGE_READ | PAGE_WRITE , bname + " \x00 " * 3 + " \x00 " * 0x10 , "Module name %r " % bname_str ) Some more work later, we extract the final stage zeusvm_stage3.zip.

VM Analysis The targeted function is located at address 0x414795. The plan is to start with a quick manual analysis to recover the VM structure, and then to write a script to automate the full process. The interesting part is the loop inside the function. It has a specificity: MOV EAX , DWORD PTR [ EBP + 0xFFFFFFB0 ] MOVZX EAX , BYTE PTR [ EAX ] LEA ECX , DWORD PTR [ EBP + 0xFFFFFFB0 ] CALL DWORD PTR [ EAX * 0x4 + 0x427018 ] The CALL target is [EAX*0x4+0x427018]. This means the address 0x427018 is an array of sub-functions addresses. And here is a dump of this area: . data : 00427018 dd offset sub_40EDE0 . data : 0042701 C dd offset sub_40EDFB . data : 00427020 dd offset sub_40EE19 . data : 00427024 dd offset sub_40EE3A . data : 00427028 dd offset sub_40EE64 . data : 0042702 C dd offset sub_40EE92 . data : 00427030 dd offset sub_40EEBE . data : 00427034 dd offset sub_40EEE8 . data : 00427038 dd offset sub_40EF16 ... This is called the “mnemonic handler”. In a classical virtual machine, the code fetches the opcode of the current mnemonic at the VM program counter VM_PC, and jumps to the function which will emulate the mnemonic behavior. This means EAX contains the opcode number. If we track it back to its origin, we land on the MOVZX, which reads the memory from the pointer stored in [EBP+0xFFFFFFB0]. We can guess that it may correspond to VM_PC. Note that after the mnemonic handler call, register AL is tested to continue or stop the virtual machine execution.

First mnemonic Note that ECX points to the address of the VM_PC. Let’s look at the first mnemonic at 0x40EDE0. Here starts the hard part. The first line reads the content of ECX, which may be the address of the VM_PC, and assigns it to EAX. If we skip the two next lines, VM_PC is incremented, and stored back to its location. This may mean the first side effect of this mnemonic is to increment the VM_PC, implying that the mnemonic length is one. Back to the second line. It reads a byte of memory at EAX (VM_PC in our case). This can be interpreted as the first byte of the current mnemonic. This byte is then xored with 0xE9 and kept in DL. The code then checks if the byte at EAX (which is now the original VM_PC plus one) has its most significant bit set. If it’s cleared, the function returns zero, and DL is not used. In the other case, the code reads again the byte at VM_PC+1, xor it using DL, clear the most significant bit and update it in memory. Then, the function returns 1. Let’s summary the mnemonic: increment VM_PC by one

by one mess up the next mnemonic byte at VM_PC+1 under certain conditions Note that if we look back at the parent function, the fact that the mnemonic returns one means the loop will continue its job.

Second mnemonic Here is the second mnemonic. Sounds familiar? The code is nearly the same as the previous one. The only differences are: the VM_PC is incremented by two

is incremented by two the messed up byte is at VM_PC+2 , and the byte from VM_PC+1 instead of VM_PC is xored with 0x78 instead of 0xE9

Third mnemonic Easy one again: VM_PC += 4

same mess on byte at address VM_PC+4 , using different xor values.

Fourth mnemonic Ha! A bit different. There is the byte messing up part which is similar to the first three mnemonics. But there is more: the code reads an integer from ECX+4, and xors the byte at this address with the byte at VM_PC+1, which can be interpreted as the second byte of the current instruction. The trick is we don’t know for the moment what is at ECX+4. As the value at ECX is VM_PC, maybe we can infer the value at ECX+4 is another register of the VM (let’s say R0) The VM_PC is incremented twice, which means the mnemonic length may be 2 (the current opcode, and the value of the xor). This mnemonic can be written as: XOR BYTE PTR [ R0 ], u08 with u08 an immediate from the mnemonic bytes. Another question remains: what is the mess done with the xor at each mnemonic? As we could see, each mnemonic will test the most significant bit of the next byte and xor it with a hardcoded byte. Interpretation: the current mnemonic decrypts the first byte of the next mnemonic if its most significant bit is set. It then clears it, meaning the mnemonic is decrypted. This means the three first mnemonics are in fact NOPS of three different lengths because they do nothing (except the decryption and the VM_PC update). As we saw in the parent function, the opcode is coded on a single byte, but this byte is always decrypted so its most significant bit is zero. This means we may have at most 0x7F mnemonics. If we look back at the mnemonics table, we can see it ends a bit before this limit. Our hypothesis seems legit. In our case we have 69 mnemonics. Well, there are a bit too many mnemonics to analyze all of them manually.

Automating the analysis The goal of this section is to automatically retrieve each mnemonic’s behaviour. To do this, we will symbolically execute them and analyze the resulting side effects. Here is a first draft of the symbolic execution code (other symbolic examples are available in previous post): from argparse import ArgumentParser from pdb import pm from miasm2.analysis.machine import Machine from miasm2.analysis.binary import Container from miasm2.ir.symbexec import symbexec from miasm2.arch.x86.sem import ir_x86_32 from miasm2.arch.x86 import regs from miasm2.core.utils import * from miasm2.expression.expression import * from miasm2.core import asmbloc # Transform native assembly into IR def get_block ( ir_arch , mdis , ad ): lbl = ir_arch . get_label ( ad ) if not lbl in ir_arch . blocs : b = mdis . dis_bloc ( lbl . offset ) ir_arch . add_bloc ( b ) b = ir_arch . get_bloc ( lbl ) if b is None : raise LookupError ( 'No block found at that address: %s ' % lbl ) return b parser = ArgumentParser ( "XXXXX TODO" ) parser . add_argument ( 'filename' , help = "File to disassemble" ) args = parser . parse_args () machine = Machine ( "x86_32" ) cont = Container . from_stream ( open ( args . filename )) bs = cont . bin_stream mdis = machine . dis_engine ( bs , symbol_pool = cont . symbol_pool ) ir_arch = ir_x86_32 ( mdis . symbol_pool ) sb = symbexec ( ir_arch , regs . regs_init ) mnemonic_array_addr = 0x427018 ad = ExprInt32 ( upck32 ( bs . getbytes ( mnemonic_array_addr , 4 ))) print 'Mnemonic addr' , ad while True : print 'Block' , ad get_block ( ir_arch , mdis , ad ) ad = sb . emul_ir_bloc ( ir_arch , ad ) if not isinstance ( ad , ExprInt ): raise ValueError ( "Unknown destination %s " % ad ) The first part of the code initializes the disassembly engine, the target and the symbolic execution engine. regs.regs_init is a dictionnary that specifies the initial symbolic value of all registers. Note that each register is initialized to a distinct symbol: EAX_init for EAX, EBX_init for EBX and so on. Here is the first run: Mnemonic addr 0x40EDE0 Block 0x40EDE0 Traceback (most recent call last): File "emul_instr_01.py", line 46, in <module> raise ValueError("Unknown destination %s" % ad) ValueError: Unknown destination ((@8[(@32[ECX_init]+0x1)]&0x80)?(0x40EDEF,0x40EDF8)) The first basic block is executed but it stalls on the decision for the conditional jump. The destination is: ((@8[(@32[ECX_init]+0x1)]&0x80)?(0x40EDEF,0x40EDF8)) As everything is symbolic, we can’t resolve this test and we don’t known which path to take. But if we look at the condition of the expression, we can deduce it’s the test on the next (de)ciphered mnemonic. @8[(@32[ECX_init]+0x1)] & 0x80 is in fact @8[VM_PC_init+0x1] & 0x80, which is the MSB of the next mnemonic opcode. We will first give to the engine the information that @32[ECX_init] is the VM_PC_init. vm_pc_init = ExprId ( 'VM_PC_init' ) infos = {} infos [ expr_simp ( ExprMem ( regs . ECX_init , 32 ))] = vm_pc_init The use of expr_simp is here to ensure the expression will have a canonical and simplified form. The symbolic engine always works with expressions which have both of those properties, so if you add information to it, you have to use this form. Here is the new output: ValueError: Unknown destination ((@8[(VM_PC_init+0x1)]&0x80)?(0x40EDEF,0x40EDF8)) Our goal here is to add the information that @8[VM_PC] & 0x80 resolves to 0x80 at runtime. The fact is that we don’t known how the VM_PC has been modified during the execution. Here, we will use the engine to resolve the current value of VM_PC and add the information using an expression based on this value. In our case, the information to add is @32[VM_PC] & 0x80 == 0x80. First we will evaluate VM_PC, which gives us VM_PC_init+1. So the final information is @32[VM_PC_init+1] & 0x80 == 0x80. The code will then resolve the next destination, which will decrypt the next opcode. We will be able to retrieve its equation later. if isinstance ( ad , ExprCond ): print "Condition in destination:" , ad vm_pc = sb . eval_expr ( ExprMem ( regs . ECX_init )) print 'Current VM_PC:' , vm_pc ad = ad . replace_expr ({ ExprMem ( vm_pc , 8 ) & ExprInt8 ( 0x80 ): ExprInt8 ( 0x80 )}) ad = expr_simp ( ad ) print 'Final destination:' , ad Here is the result: ... Condition in destination: ((@8[(VM_PC_init+0x1)]&0x80)?(0x40EDEF,0x40EDF8)) Current VM_PC: (VM_PC_init+0x1) Final destination: 0x40EDEF The code is redirected to the correct case! But we are facing another error: Traceback ( most recent call last ): File "emul_instr_02.py" , line 101 , in < module > raise ValueError ( "Unknown destination %s " % ad ) ValueError : Unknown destination @ 32 [ ESP_init ] The code has resolved the test and has executed the next block. As the function terminates on a RET, the destination is resolved as @32[ESP_init] and it fails. We forgot that this code is called through a CALL. We could stop by detecting a jump to the expression @32[ESP_init], which means we are getting back to the caller. For the exercise, we will add the information that a special symbol ExprId('RET_ADDR') has been pushed to the stack before the execution of the function. If PC is evaluated to this symbol (for example after the RET of the function), we stop the execution. ret_addr = ExprId ( 'RET_ADDR' ) # Push return addr infos [ expr_simp ( ExprMem ( regs . ESP_init - ExprInt32 ( 4 )))] = ret_addr infos [ regs . ESP ] = expr_simp ( regs . ESP_init - ExprInt32 ( 4 )) So after each block execution: if ad is ret_addr : print "Ret addr reached" break Which is validated on our current mnemonic: ... Ret addr reached We completely achieved the symbolic execution of the first mnemonic! Let’s display the final state of the symbolic engine: >>> sb . dump_id () IRDst RET_ADDR zf (((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F)?(0x0,0x1)) nf 0x0 pf parity(((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F)) of 0x0 cf 0x0 af (VM_PC_init^(VM_PC_init+0x1)^0x1)[4:5] EIP RET_ADDR EAX {0x1,0,8, (VM_PC_init+0x1)[8:32],8,32} ECX {((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F),0,8, ECX_init[8:32],8,32} EDX {(@8[VM_PC_init]^0xE9),0,8, EDX_init[8:32],8,32} >>> sb . dump_mem () @8[(VM_PC_init+0x1)] ((@8[VM_PC_init]^@8[(VM_PC_init+0x1)]^0xE9)&0x7F) @32[ECX_init] (VM_PC_init+0x1) dump_id and dump_mem only display variables whose values are different from their initial state (given at the symbolic engine creation). Note that, in the last line, the information we have added to simplify the output is present as well. We will add our own display function to have a good and predictable output. def dump_state ( sb ): print '-' * 20 , "State" , '-' * 20 out = {} for expr , value in sorted ( sb . symbols . items ()): if ( expr , value ) in initial_symbols : continue if ( expr , value ) in infos : continue if expr in [ regs . zf , regs . cf , regs . nf , regs . of , regs . pf , regs . af , ir_arch . IRDst , regs . EIP ]: continue expr = expr_simp ( expr . replace_expr ( infos )) value = expr_simp ( value . replace_expr ( infos )) if expr == value : continue out [ expr ] = value out = sorted ( out . iteritems ()) x86_regs = [] mem = [] other = [] for expr , value in out : if expr in regs . all_regs_ids : x86_regs . append (( expr , value )) elif isinstance ( expr , ExprMem ): mem . append (( expr , value )) else : other . append (( expr , value )) print "Regs:" for item in other : print " \t %s = %s " % item print "Mem:" for item in mem : print " \t %s = %s " % item print "x86:" for item in x86_regs : print " \t %s = %s " % item print Which gives the output: -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x1 ) Mem : @ 8 [( VM_PC_init + 0x1 )] = (( @ 8 [ VM_PC_init ] ^@ 8 [( VM_PC_init + 0x1 )] ^ 0xE9 ) & 0x7F ) x86 : EAX = { 0x1 , 0 , 8 , ( VM_PC_init + 0x1 )[ 8 : 32 ], 8 , 32 } ECX = {(( @ 8 [ VM_PC_init ] ^@ 8 [( VM_PC_init + 0x1 )] ^ 0xE9 ) & 0x7F ), 0 , 8 , ECX_init [ 8 : 32 ], 8 , 32 } EDX = {( @ 8 [ VM_PC_init ] ^ 0xE9 ), 0 , 8 , EDX_init [ 8 : 32 ], 8 , 32 } The result matches our previous findings! Here is the result for the fourth mnemonic: -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x3 ) Mem : @ 32 [( ESP_init + 0xFFFFFFF8 )] = ESI_init @ 8 [( VM_PC_init + 0x3 )] = (( @ 8 [( VM_PC_init + 0x1 )] ^@ 8 [( VM_PC_init + 0x3 )] ^ 0x47 ) & 0x7F ) @ 32 [( ECX_init + 0x4 )] = ( @ 32 [( ECX_init + 0x4 )] + 0x2 ) @ 16 [ @ 32 [( ECX_init + 0x4 )]] = ( @ 16 [ @ 32 [( ECX_init + 0x4 )]] ^@ 16 [( VM_PC_init + 0x1 )]) x86 : EAX = { 0x1 , 0 , 8 , @ 16 [( VM_PC_init + 0x1 )][ 8 : 16 ], 8 , 16 , ( VM_PC_init + 0x1 )[ 16 : 32 ], 16 , 32 } ECX = ( VM_PC_init + 0x3 ) EDX = {( @ 8 [( VM_PC_init + 0x1 )] ^ 0x47 ), 0 , 8 , EDX_init [ 8 : 32 ], 8 , 32 } Notice the @32[(ESP_init+0xFFFFFFF8)] = ESI_init. This variable corresponds in the code to: PUSH ESI ... POP ESI We can notice as well that ESP is not displayed, so its value at the end of the emulation is its initial state (ESP_init). Here, we can make the assumption that the code doesn’t use variables above the stack, so at each step of the symbolic emulation, we can remove the memory variables based on ESP_init which are above the current ESP. For example, imagine we have a PUSH EAX which, at runtime, gives @32[ESP_init-0x10] = EAX_init. Later, we pop some values and the runtime value of ESP is ESP_init-0x4. Based on our hypothesis, the previous value is above the stack, so we can remove it. If you wonder “how can we compare two symbolic values?”, here is the answer: subtraction. Here it is: ( ESP_init - 0x10 ) - ( ESP_init - 0x4 ) = - 0xC -0xC is a concrete negative integer, so we can say that (ESP_init-0x4) is bigger than (ESP_init-0x10). To delete all elements above the stack, just call sb.del_mem_above_stack(ir_arch.sp) at the end of each symbolic execution. Here is the cleaned up result: -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x3 ) Mem : @ 8 [( VM_PC_init + 0x3 )] = (( @ 8 [( VM_PC_init + 0x1 )] ^@ 8 [( VM_PC_init + 0x3 )] ^ 0x47 ) & 0x7F ) @ 32 [( ECX_init + 0x4 )] = ( @ 32 [( ECX_init + 0x4 )] + 0x2 ) @ 16 [ @ 32 [( ECX_init + 0x4 )]] = ( @ 16 [ @ 32 [( ECX_init + 0x4 )]] ^@ 16 [( VM_PC_init + 0x1 )]) x86 : EAX = { 0x1 , 0 , 8 , @ 16 [( VM_PC_init + 0x1 )][ 8 : 16 ], 8 , 16 , ( VM_PC_init + 0x1 )[ 16 : 32 ], 16 , 32 } ECX = ( VM_PC_init + 0x3 ) EDX = {( @ 8 [( VM_PC_init + 0x1 )] ^ 0x47 ), 0 , 8 , EDX_init [ 8 : 32 ], 8 , 32 } The next step is to add the virtual machine registers information. As far as we understand, they are in memory, next to the VM_PC. Let’s add the information (let’s say we have at least 5 registers) Here is the result: -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x3 ) REG0 = ( REG0 + 0x2 ) Mem : @ 8 [( VM_PC_init + 0x3 )] = (( @ 8 [( VM_PC_init + 0x1 )] ^@ 8 [( VM_PC_init + 0x3 )] ^ 0x47 ) & 0x7F ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] ^@ 16 [( VM_PC_init + 0x1 )]) x86 : EAX = { 0x1 , 0 , 8 , @ 16 [( VM_PC_init + 0x1 )][ 8 : 16 ], 8 , 16 , ( VM_PC_init + 0x1 )[ 16 : 32 ], 16 , 32 } ECX = ( VM_PC_init + 0x3 ) EDX = {( @ 8 [( VM_PC_init + 0x1 )] ^ 0x47 ), 0 , 8 , EDX_init [ 8 : 32 ], 8 , 32 } It seems that this is an equivalent of XOR word ptr [REG0], imm16 in x86. The mnemonic is three bytes long. The key to decrypt the next one is 0x47. Note the REG0 = (REG0+0x2): this can be interpreted as a post increment of the memory base register (here, REG0). So the instruction can be better represented as: XOR word ptr [REG0]++, imm16 In most classic architectures (like ARM for example), the post increment is the size of the memory access. Let’s run the fifth mnemonic (without displaying x86 registers): -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x5 ) REG0 = ( REG0 + 0x4 ) Mem : @ 8 [( VM_PC_init + 0x5 )] = (( @ 8 [( VM_PC_init + 0x1 )] ^@ 8 [( VM_PC_init + 0x5 )] ^ 0xC0 ) & 0x7F ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] ^@ 32 [( VM_PC_init + 0x1 )]) In this case: XOR dword ptr [REG0]++, imm32, five bytes long, with key 0xC0. Here is another one (we won’t display the next mnemonic deciphering for the sake of clarity): REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] +@ 8 [( VM_PC_init + 0x1 )]) We have ADD byte ptr [REG0]++, imm8 Note that we can add the mnemonics immediates to clarify a bit more the output: # imm expr_imm8 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x1 ), 8 )) infos [ expr_imm8 ] = ExprId ( "imm8" , 8 ) expr_imm16 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x1 ), 16 )) infos [ expr_imm16 ] = ExprId ( "imm16" , 16 ) expr_imm32 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x1 ), 32 )) infos [ expr_imm32 ] = ExprId ( "imm32" , 32 ) New output: REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( imm8 +@ 8 [ REG0 ])

Results REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( imm16 +@ 16 [ REG0 ]) Mnemonic: ADD word ptr [REG0]++, imm16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( imm32 +@ 32 [ REG0 ]) Mnemonic: ADD dword ptr [REG0]++, imm32 REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] + ( - imm8 )) Mnemonic: SUB byte ptr [REG0]++, imm8 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] + ( - imm16 )) Mnemonic: SUB word ptr [REG0]++, imm16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] + ( - imm32 )) Mnemonic: SUB dword ptr [REG0]++, imm32 The next one fails: Final destination: ((@8[(VM_PC_init+0x1)]&0x7)?(lbl_gen_00000000:None,0x40EFE3)) Traceback (most recent call last): File "emul_instr_02.py", line 142, in <module> raise ValueError("Unknown destination %s" % ad) ValueError: Unknown destination ((@8[(VM_PC_init+0x1)]&0x7)?(lbl_gen_00000000:None,0x40EFE3)) The reason seems quite simple: the symbolic execution engine didn’t resolve a condition. But strangely, the condition doesn’t appear in the execution flow: In fact, if we look at the possible destinations, there is a generated label, which means a native mnemonic has generated a jump when translated to intermediate representation. Here it is: The mnemonic responsible for this behaviour is ROL. On x86, the rotation has a different behaviour depending on whether the counter is null or not. In Miasm IR, it generates an IF/THEN (without ELSE) condition. To convince yourself, you can assemble a ROL EAX, CL (which gives d3c0). You can use the disassembly example of Miasm: python miasm / example / disasm / full . py - m x86_32 file_to_disasm - g The -g option will generate the intermediate representation of the disassembled assembly and output it to the file graph_irflow.dot. Here it is: Note that we could have implemented the ROL EAX, CL without the test, but it has a drawback: in this case, the IR will always say that the register EAX is written by the instruction, even if CL is zero (which is not the case). So we lose some precision in the intermediate representation. But this can be easily modified in the ROL description, in the file miasm2/arch/x86/sem.py. Looking at the current implementation, one may notice that if the mnemonic uses a non zero constant counter (like in ROL EAX, 8), Miasm will not generate the IF/THEN block (the counter can never be zero). In this case, we should deal with both cases in our symbolic execution. To simplify the problem, we will assume that the value of the counter is never zero. This means that in this case @8[(VM_PC_init+0x1)]&0x7 is always different from zero. The mnemonic symbolic execution now terminates, and here is the result: REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] <<< ( imm8 & 0x7 )) Mnemonic: ROL byte ptr [REG0]++, imm8 The same error is raised on the next mnemonic: Final destination: (({(@8[(VM_PC_init+0x1)]&0xF),0,8, 0x0,8,16}&0x1F)?(lbl_gen_00000000:None,0x40F01C)) After injecting the correct information: REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] <<< ({( imm8 & 0xF ), 0 , 8 , 0x0 , 8 , 16 } & 0x1F )) Mnemonic: ROL word ptr [REG0]++, imm8 Same trick with 32 bit extension: REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] <<< ({( imm8 & 0x1F ), 0 , 8 , 0x0 , 8 , 32 } & 0x1F )) Mnemonic: ROL dword ptr [REG0]++, imm8 Then: REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] >>> ( imm8 & 0x7 )) Mnemonic: ROR byte ptr [REG0]++, imm8 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] >>> ({( imm8 & 0xF ), 0 , 8 , 0x0 , 8 , 16 } & 0x1F )) Mnemonic: ROR word ptr [REG0]++, imm8 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] >>> ({( imm8 & 0x1F ), 0 , 8 , 0x0 , 8 , 32 } & 0x1F )) Mnemonic: ROR dword ptr [REG0]++, imm8 REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] ^ 0xFF ) Here, as all bits of the low byte of @8[REG0], this can be interpreted as NOT byte ptr [REG0]++ REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] ^ 0xFFFF ) Mnemonic: NOT word ptr [REG0]++ REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] ^ 0xFFFFFFFF ) Mnemonic: NOT dword ptr [REG0]++ The next one is interesting as it executes a loop (which can be symbolically executed in this case). Here is the result: REG0 = ( REG0 + 0x4 ) @ 8 [( REG0 + {( imm8 & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 8 [ REG0 ] @ 8 [( REG0 + {(( imm8 >> 0x4 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 32 [ REG0 ][ 16 : 24 ] @ 8 [( REG0 + {(( imm8 >> 0x2 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 32 [ REG0 ][ 8 : 16 ] @ 8 [( REG0 + {(( imm8 >> 0x6 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 32 [ REG0 ][ 24 : 32 ] Ok, this is a nice one: we can rewrite the access @32[REG0][8:16] as @8[REG0+1] (we make the hypothesis the code won’t segfault in this case, and that we are working in little endian). By the way, this simplification can be implemented through the Miasm simplification engine. REG0 = ( REG0 + 0x4 ) @ 8 [( REG0 + {( imm8 & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 8 [ REG0 ] @ 8 [( REG0 + {(( imm8 >> 0x2 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 8 [ REG0 + 1 ] @ 8 [( REG0 + {(( imm8 >> 0x4 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 8 [ REG0 + 2 ] @ 8 [( REG0 + {(( imm8 >> 0x6 ) & 0x3 ), 0 , 8 , 0x0 , 8 , 32 })] = @ 8 [ REG0 + 3 ] This mnemonic can be seen as a twisted version of the x86 SWAP instruction, which has the resulting bytes order fixed using an immediate argument. So this can be written like: PERMUT dword ptr [REG0]++, imm8 The next one (0x409281) is an interesting mnemonic as well: the execution loops 0x100 times, and fails on an unresolved branch: Condition in destination: (((- {@8[(VM_PC_init+0x1)],0,8, 0x0,8,16})+0x1)?(0x4092D9,0x4092D5)) Here it is: This one is a bit too complex for the symbolic execution, so in this case the ultimate solution is to look at it by hand. This mnemonic takes a key, a memory pointer and a length and decrypts it using RC4. Mnemonic: DEC_RC4 ARG0, ARG1, length Next one: REG1 = { imm8 , 0 , 8 , 0x0 , 8 , 32 } Mnemonic: MOVZX REG1, imm8 REG1 = { imm16 , 0 , 16 , 0x0 , 16 , 32 } Mnemonic: MOVZX REG1, imm16 REG1 = imm32 Mnemonic: MOV REG1, imm32 REG0 = (REG0+{imm16,0,16, (imm16[15:16]?(0xFFFF,0x0)),16,32}) Mnemonic: MOVSX REG0, imm16 The next one (0x40F27F) fails:

Fun with symbolic execution We could analyze it by hand again, but for the exercise we will try another method. As you can see the mnemonic is relatively simple and is composed of two IF/THEN blocks. We will use the symbolic execution engine to execute every possible path, and give us all the possible results (in this case, four). This method has already been detailed in the previous article. We will reuse its steps (without the SAT solver part) to obtain the mnemonic final states. Here is the output: Ret addr reached -------------------- State -------------------- Regs : imm8b = (( imm8 ^ imm8b ^ 0xFF ) & 0x7F ) VM_PC_init = ( VM_PC_init + 0x2 ) Mem : x86 : EAX = { 0x1 , 0 , 8 , ( VM_PC_init + 0x2 )[ 8 : 32 ], 8 , 32 } EDX = REG1 ---------------------------------------- Block 0x40F2A9 Ret addr reached -------------------- State -------------------- Regs : VM_PC_init = ( VM_PC_init + 0x2 ) Mem : x86 : EAX = { 0x1 , 0 , 8 , ( VM_PC_init + 0x2 )[ 8 : 32 ], 8 , 32 } EDX = REG1 ---------------------------------------- Block 0x40F2A0 Ret addr reached -------------------- State -------------------- Regs : REG1 = ( REG1 + 0xFFFFFFFF ) imm8b = (( imm8 ^ imm8b ^ 0xFF ) & 0x7F ) VM_PC_init = ( VM_PC_init + ( - { imm8 , 0 , 8 , 0x0 , 8 , 32 }) + 0x2 ) Mem : x86 : EAX = { 0x1 , 0 , 8 , ( VM_PC_init + ( - { imm8 , 0 , 8 , 0x0 , 8 , 32 }) + 0x2 )[ 8 : 32 ], 8 , 32 } EDX = { imm8 , 0 , 8 , 0x0 , 8 , 32 } ---------------------------------------- Block 0x40F2A0 Ret addr reached -------------------- State -------------------- Regs : REG1 = ( REG1 + 0xFFFFFFFF ) VM_PC_init = ( VM_PC_init + ( - { imm8 , 0 , 8 , 0x0 , 8 , 32 }) + 0x2 ) Mem : x86 : EAX = { 0x1 , 0 , 8 , ( VM_PC_init + ( - { imm8 , 0 , 8 , 0x0 , 8 , 32 }) + 0x2 )[ 8 : 32 ], 8 , 32 } EDX = { imm8 , 0 , 8 , 0x0 , 8 , 32 } The first two states only increment VM_PC by two (the difference between them is the next mnemonic which is decrypted in the first case, and not in the second one). The remaining states have the same difference, but the VM_PC is different, as well as REG0: REG1 = ( REG1 + 0xFFFFFFFF ) VM_PC_init = ( VM_PC_init + ( - { imm8 , 0 , 8 , 0x0 , 8 , 32 }) + 0x2 ) Moreover, if we look at the condition forced by the code: Final destination: (REG1?(0x40F2A0,0x40F2A9)) Looks familiar? This can be read like this: decrement REG1 by one; if the original REG1 was: zero: VM_PC += 2

otherwise: substract VM_PC by a signed extended imm8 This is exactly what the x86 mnemonic loop is doing with the ECX register. So: LOOP REG1, label Fun fact: The mnemonic only decrypts the next mnemonic (in both conditions of the loop). So if the loop is taken, we hope the destination mnemonic has already been executed once in order to be decrypted; if not, the code may crash! Now, we have a script which can handle more complex mnemonics. This solution has some limitation though, for example in case of complex loops: it may generate too many states. The next mnemonic is very close to the current one, let’s use the same script: Ret addr reached -------------------- State -------------------- Regs: REG1 = (REG1+0xFFFFFFFF) VM_PC_init = (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3) Mem: x86: EAX = {0x1,0,8, (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)[8:32],8,32} EDX = {imm16,0,16, 0x0,16,32} ---------------------------------------- Block 0x40F2B7 Condition in destination: (REG1?(0x40F2D1,0x40F2DA)) VM_PC_init Current VM_PC: (VM_PC_init+0x1) Final destination: (REG1?(0x40F2D1,0x40F2DA)) ---------------------------------------- Block 0x40F2DA Ret addr reached -------------------- State -------------------- Regs: VM_PC_init = (VM_PC_init+0x3) Mem: x86: EAX = {0x1,0,8, (VM_PC_init+0x3)[8:32],8,32} EDX = REG1 ---------------------------------------- Block 0x40F2D1 Ret addr reached -------------------- State -------------------- Regs: REG1 = (REG1+0xFFFFFFFF) VM_PC_init = (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3) Mem: @8[(VM_PC_init+0x3)] = ((imm8^@8[(VM_PC_init+0x3)]^0xFB)&0x7F) x86: EAX = {0x1,0,8, (VM_PC_init+(- {imm16,0,16, 0x0,16,32})+0x3)[8:32],8,32} EDX = {imm16,0,16, 0x0,16,32} ---------------------------------------- Block 0x40F2DA Ret addr reached -------------------- State -------------------- Regs: VM_PC_init = (VM_PC_init+0x3) Mem: @8[(VM_PC_init+0x3)] = ((imm8^@8[(VM_PC_init+0x3)]^0xFB)&0x7F) x86: EAX = {0x1,0,8, (VM_PC_init+0x3)[8:32],8,32} EDX = REG1 The mnemonic is exatly the same except the fact that it’s three bytes long, and the label is computed using signed extended imm16. Mnemonic: LOOP REG1, label The next one: @ 32 [( ECX_init + (({ imm8 , 0 , 8 , 0x0 , 8 , 32 } & 0xF ) * 0x4 ) + 0xC )] = { @ 8 [( VM_PC_init + 0x2 )], 0 , 8 , 0x0 , 8 , 32 } In this case, we have an uncommon form: we write at @32[ECX_init + XXXX * 4 + 0xC], with XXXX between 0x0 and 0xF (as it’s masked using 0xF) . From our previous analysis, we decided that: @32[ECX_init + 0x0] is VM_PC

is @32[ECX_init + 0x4] is REG0

is @32[ECX_init + 0x8] is REG1 So ECX_init + XXXX * 4 + 0xC is just after those registers. Let’s inform the engine that @32[ECX_init + XXXX * 4 + 0xC] is REGX. #(ECX_init+(({imm8,0,8, 0x0,8,32}&0xF)*0x4)+0xC) base_regx = expr_simp ( regs . ECX_init + ( imm8 . zeroExtend ( 32 ) & ExprInt32 ( 0xF )) * ExprInt32 ( 4 ) + ExprInt32 ( 0xC )) infos [ expr_simp ( ExprMem ( base_regx , 32 ))] = ExprId ( "REGX" , 32 ) Here is the result: REGX = { @ 8 [( VM_PC_init + 0x2 )], 0 , 8 , 0x0 , 8 , 32 } Note that in this case, the first immediate is used to inform the register index, so the immediate argument is @8[(VM_PC_init+0x2)] instead of @8[(VM_PC_init+0x1)]. In this case, we will create another alias, called imm8b/imm16b/imm32b. # immb expr_imm8 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x2 ), 8 )) infos [ expr_imm8 ] = ExprId ( "imm8b" , 8 ) expr_imm16 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x2 ), 16 )) infos [ expr_imm16 ] = ExprId ( "imm16b" , 16 ) expr_imm32 = expr_simp ( ExprMem ( vm_pc_init + ExprInt32 ( 0x2 ), 32 )) infos [ expr_imm32 ] = ExprId ( "imm32b" , 32 ) Result: REGX = { imm8b , 0 , 8 , 0x0 , 8 , 32 } Mnemonic: MOVZX REGX, imm8 Next one: REGX = { imm16b , 0 , 16 , 0x0 , 16 , 32 } Mnemonic: MOVZX REGX, imm16 REGX = @ 32 [( VM_PC_init + 0x2 )] Mnemonic: MOV REGX, imm32 REGX = { @ 8 [( ECX_init + ({ imm8 [ 4 : 8 ], 0 , 4 , 0x0 , 4 , 32 } * 0x4 ) + 0xC )], 0 , 8 , 0x0 , 8 , 32 } Ok, this time this is exactly the same registers accesses, but instead of using the low bits of the instruction immediate, it uses the high part as read register index. Let’s add the information: base_regx = expr_simp ( regs . ECX_init + ( imm8 [ 4 : 8 ] . zeroExtend ( 32 )) * ExprInt32 ( 4 ) + ExprInt32 ( 0xC )) infos [ expr_simp ( ExprMem ( base_regx , 32 ))] = ExprId ( "REGY" , 32 ) Result: REGX = { REGY [ 0 : 8 ], 0 , 8 , 0x0 , 8 , 32 } So mnemonic: MOVZX REGX, REGY.u08 Next one: REGX = { REGY [ 0 : 16 ], 0 , 16 , 0x0 , 16 , 32 } Mnemonic: MOVZX REGX, REGY.u16 REGX = REGY Mnemonic: MOV REGX, REGY REGX = ( REGX + { REGY [ 0 : 8 ], 0 , 8 , 0x0 , 8 , 32 }) Mnemonic: ADD REGX, REGY.u08 REGX = ( REGX + { REGY [ 0 : 16 ], 0 , 16 , 0x0 , 16 , 32 }) Mnemonic: ADD REGX, REGY.u16 REGX = ( REGX + REGY ) Mnemonic: ADD REGX, REGY REGX = ( REGX + ( - { REGY [ 0 : 8 ], 0 , 8 , 0x0 , 8 , 32 })) Mnemonic: SUB REGX, REGY.u08 REGX = ( REGX + ( - { REGY [ 0 : 16 ], 0 , 16 , 0x0 , 16 , 32 })) Mnemonic: SUB REGX, REGY.u16 REGX = ( REGX + ( - REGY )) Mnemonic: SUB REGX, REGY REGX = ( REGX ^ { REGY [ 0 : 8 ], 0 , 8 , 0x0 , 8 , 32 }) Mnemonic: XOR REGX, REGY.u08 REGX = ( REGX ^ { REGY [ 0 : 16 ], 0 , 16 , 0x0 , 16 , 32 }) Mnemonic: XOR REGX, REGY.u16 REGX = ( REGX ^ REGY ) Mnemonic: XOR REGX, REGY REGX = ( REGX + { imm8b , 0 , 8 , 0x0 , 8 , 32 }) Mnemonic: ADD REGX, imm8 REGX = ( REGX + { imm16b , 0 , 16 , 0x0 , 16 , 32 }) Mnemonic: ADD REGX, imm16 REGX = ( REGX + imm32b ) Mnemonic: ADD REGX, imm32 REGX = ( REGX + ( - { imm8b , 0 , 8 , 0x0 , 8 , 32 })) Mnemonic: SUB REGX, imm08 REGX = ( REGX + ( - { imm16b , 0 , 16 , 0x0 , 16 , 32 })) Mnemonic: SUB REGX, imm16 REGX = ( REGX + ( - imm32b )) Mnemonic: SUB REGX, imm32 REGX = ( REGX ^ { imm8b , 0 , 8 , 0x0 , 8 , 32 }) Mnemonic: XOR REGX, imm08 REGX = ( REGX ^ { imm16b , 0 , 16 , 0x0 , 16 , 32 }) Mnemonic: XOR REGX, imm16 REGX = ( REGX ^ imm32b ) Mnemonic: XOR REGX, imm32 REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] + REGX [ 0 : 8 ]) Mnemonic: ADD byte ptr [REG0]++, REGX.u08 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] + REGX [ 0 : 16 ]) Mnemonic: ADD word ptr [REG0]++, REGX.u16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( REGX +@ 32 [ REG0 ]) Mnemonic: ADD dword ptr [REG0]++, REGX REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] + ( - REGX [ 0 : 8 ])) Mnemonic: SUB byte ptr [REG0]++, REGX.u08 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] + ( - REGX [ 0 : 16 ])) Mnemonic: SUB word ptr [REG0]++, REGX.u16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( @ 32 [ REG0 ] + ( - REGX )) Mnemonic: SUB dword ptr [REG0]++, REGX REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = ( @ 8 [ REG0 ] ^ REGX [ 0 : 8 ]) Mnemonic: XOR byte ptr [REG0]++, REGX.u08 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = ( @ 16 [ REG0 ] ^ REGX [ 0 : 16 ]) Mnemonic: XOR word ptr [REG0]++, REGX.u16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = ( REGX ^@ 32 [ REG0 ]) Mnemonic: XOR dword ptr [REG0]++, REGX REGX = { @ 8 [ REG0 ], 0 , 8 , 0x0 , 8 , 32 } Mnemonic: MOVZX REGX, byte ptr [REG0] REGX = { @ 16 [ REG0 ], 0 , 16 , 0x0 , 16 , 32 } Mnemonic: MOVZX REGX, word ptr [REG0] REGX = @ 32 [ REG0 ] Mnemonic: MOV REGX, dword ptr [REG0] REG0 = ( REG0 + 0x1 ) @ 8 [ REG0 ] = REGX [ 0 : 8 ] Mnemonic: MOV byte ptr [REG0]++, REGX.u08 REG0 = ( REG0 + 0x2 ) @ 16 [ REG0 ] = REGX [ 0 : 16 ] Mnemonic: MOV word ptr [REG0]++, REGX.u16 REG0 = ( REG0 + 0x4 ) @ 32 [ REG0 ] = REGX Mnemonic: MOV dword ptr [REG0]++, REGX And the last one: Block 0x40EE37 -------------------- State -------------------- Regs : Mem : x86 : EAX = { 0x0 , 0 , 8 , EAX_init [ 8 : 32 ], 8 , 32 } Ret addr reached As AL is null, the mnemonic stops the VM!