KINS malware: the Virtual Machine

Some days ago I read a post about the KINS malware (available at http://blog.fox-it.com/2013/07/25/analysis-of-the-kins-malware/). The article is a summary of the most interesting things about this malware. Too bad it doesn’t reveal the details: if you are curious to know more, you have to dig into KINS with your own reversing skills.

Right now I’m writing about the Virtual Machine used by this malware, but I hope to add more about KINS in the near future. The analysis is based on the KINS sample whose file name is its MD5 hash, 7b5ac02e80029ac05f04fa5881a911b2. If you prefer you can use one of the other samples too, because there are only minor changes between them.

Approaching the Virtual Machine

Every VM has a preamble: a phase with some initialization that is often not strictly related to the VM architecture. In this case we have one part that is strictly related to the VM and one part that is not.

407EF7 push 1000h // Number of bytes to move

407EFC push offset VMBytes // VM sequence of bytes

407F01 call MoveBufferToDinamicAllocatedMemory

407F06 mov [ebp+lpMem], eax // Save the pointer to the first byte of the VM

407F09 test eax, eax // Check to see if everything is ok

407F0B jz short VMInit_FAILS

This is the VM initialization part: it simply copies the VM bytes into a new buffer allocated at runtime. This is somewhat unusual for a VM. Why does it need to move the bytes elsewhere? Here it’s necessary because the current VM instruction changes the opcode of the next VM instruction, so the byte sequence must be writable. The original exe was packed, and I can’t say for sure, but perhaps the section doesn’t have the Writable flag set. Oh well, it’s just a guess…

407F13 mov edi, ebx

407F15 mov esi, offset pDataBuffer

407F1A mov ecx, 398h // Length of DataBuffer

407F1F rep movsb // Move DataBuffer

Another memory copy. This is an important buffer, because the goal of the VM is to modify it. DataBuffer will be used later by the malware.

407F2F VM_Loop:

407F2F mov eax, [ebp+VMBytes]

407F32 movzx eax, byte ptr [eax] // Get the current opcode of the VM

407F35 lea ecx, [ebp+VMBytes] // Pointer to VM structure

407F38 call VMOpcodeHandlers[eax*4] // Call the current VM instruction

407F3F test al, al // Check return value

407F41 jnz short VM_Loop

Every opcode handler returns a value, 0 or 1. The VM ends when the opcode at index 0x44 occurs; the definition of this handler is pretty simple:

41F9C6 VMEndOpcode proc near

41F9C6 xor al, al // It returns 0...

41F9C8 retn

41F9C8 VMEndOpcode endp
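The dispatch loop above can be modelled in a few lines of Python. This is only a rough sketch: the dictionary layout of the VM state, the `skip_byte` handler and the `run_vm` name are made up for illustration, and only the 0x44 index and the "stop when a handler returns 0" rule come from the listing.

```python
VM_END = 0x44  # index of VMEndOpcode in the handler table


def vm_end(vm):
    """Mirror of VMEndOpcode: returning 0 stops the dispatch loop."""
    return 0


def skip_byte(vm):
    """Hypothetical one-byte handler, used only to exercise the loop."""
    vm["eip"] += 1
    return 1


def run_vm(code, handlers):
    """Fetch the opcode at VM.eip, call its handler, repeat until it returns 0."""
    vm = {"code": bytearray(code), "eip": 0}  # writable copy, like the heap buffer
    executed = 0
    while True:
        opcode = vm["code"][vm["eip"]]
        executed += 1
        if handlers[opcode](vm) == 0:
            return executed


# Usage: two dummy instructions followed by the end opcode.
HANDLERS = {0x00: skip_byte, VM_END: vm_end}
steps = run_vm(bytes([0x00, 0x00, VM_END]), HANDLERS)
```

In the real sample the handler table has 69 entries and each handler also patches the next opcode before returning, which this sketch leaves out.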

VM structure

Before going through the opcode handlers in detail, I think it’s better to tell you something about the VM structure. As I said before, the VM was created for one thing only: modifying DataBuffer. Keep this in mind!

The VM structure is quite simple; you can locate it at the beginning of each opcode handler by looking at the bytes pointed to by the ECX register (ECX is initialized at 407F35).

The first dword is the EIP of the VM. It starts from the address of the dynamically allocated memory and it’s updated inside every single opcode handler (except the VMEndOpcode of course).

The second dword is dedicated to DataBuffer, more precisely it contains the pointer to that buffer.

The third dword stores a counter value used for loops; what that really means will become clear in a few minutes.

After these initial 3 dwords there’s the space for the VM registers. Throughout the code each register is identified by 4 bits, so the VM has been built with 16 dword registers. It’s a simple architecture indeed.
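To make the layout concrete, here is a small sketch of the structure using Python’s ctypes. The field names are my own (the binary has no symbols); the offsets follow the description above, with the registers starting at 0x0C as seen in the handlers that address them with [ecx+reg*4+0Ch].

```python
import ctypes


class VMContext(ctypes.Structure):
    """Assumed layout of the structure pointed to by ECX (field names are made up)."""
    _fields_ = [
        ("eip", ctypes.c_uint32),            # +0x00: pointer to the current VM opcode
        ("p_data_buffer", ctypes.c_uint32),  # +0x04: pointer into DataBuffer
        ("counter", ctypes.c_uint32),        # +0x08: loop counter
        ("regs", ctypes.c_uint32 * 16),      # +0x0C: 16 dword registers (4-bit index)
    ]


# Usage: 3 dwords of header plus 16 registers give 76 bytes in total.
ctx = VMContext()
ctx.regs[15] = 0xDEADBEEF
total_size = ctypes.sizeof(VMContext)
```

The 32-bit pointer fields reflect the fact that the sample is a 32-bit executable.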

VM instruction set

The VM instruction set is composed of 69 entries, but not all of the defined instructions are used by the VM code. Here is the complete list; ‘*’ stands for unused:

431080 * dd offset ADD_Reip_1

431084 * dd offset ADD_Reip_2

431088 * dd offset ADD_Reip_4

43108C * dd offset XOR_pDataBuffer_byte_byteVal

431090 dd offset XOR_pDataBuffer_word_wordVal

431094 dd offset XOR_pDataBuffer_dword_dwordVal

431098 dd offset ADD_pDataBuffer_byte_byteVal

43109C dd offset ADD_pDataBuffer_word_wordVal

4310A0 * dd offset ADD_pDataBuffer_dword_dwordVal

4310A4 dd offset SUB_pDataBuffer_byte_byteVal

4310A8 * dd offset SUB_pDataBuffer_word_wordVal

4310AC * dd offset SUB_pDataBuffer_dword_dwordVal

4310B0 * dd offset ROL_pDataBuffer_byte_byteVal

4310B4 * dd offset ROL_pDataBuffer_word_byteVal

4310B8 dd offset ROL_pDataBuffer_dword_byteVal

4310BC dd offset ROR_pDataBuffer_byte_byteVal

4310C0 * dd offset ROR_pDataBuffer_word_byteVal

4310C4 * dd offset ROR_pDataBuffer_dword_byteVal

4310C8 dd offset NOT_pDataBuffer_byte

4310CC * dd offset NOT_pDataBuffer_word

4310D0 * dd offset NOT_pDataBuffer_dword

4310D4 dd offset Shuffle_pDataBuffer_dword_byteVal

4310D8 dd offset RC4

4310DC * dd offset MOV_Counter_byteVal

4310E0 dd offset MOV_Counter_wordVal

4310E4 * dd offset MOV_Counter_dwordVal

4310E8 dd offset ADD_pDataBuffer_wordVal

4310EC dd offset DEC_Counter_And_Jump_If_Not_Zero

4310F0 * dd offset DEC_Counter_And_Jump_If_Not_Zero_

4310F4 dd offset MOV_Reg_i_byteVal

4310F8 dd offset Mov_Reg_i_wordVal

4310FC dd offset MOV_Reg_i_dwordVal

431100 dd offset MOV_Reg_i_Reg_j_byteVal_extended_to_dword

431104 dd offset MOV_Reg_i_Reg_j_wordVal_extended_to_dword

431108 dd offset MOV_Reg_i_Reg_j

43110C dd offset ADD_Reg_i_Reg_j_byteVal_extended_to_dword

431110 dd offset ADD_Reg_i_Reg_j_wordVal_extended_to_dword

431114 dd offset ADD_Reg_i_Reg_j

431118 dd offset SUB_Reg_i_Reg_j_byteVal

43111C dd offset SUB_Reg_i_Reg_j_wordVal

431120 dd offset SUB_Reg1_Reg2

431124 dd offset XOR_Reg_i_Reg_j_byteVal_extended_to_dword

431128 dd offset XOR_Reg_i_Reg_j_wordVal_extended_to_dword

43112C dd offset XOR_Reg_i_Reg_j

431130 dd offset ADD_Reg_i_byteVal

431134 dd offset Add_Reg_i_wordVal

431138 dd offset Add_Reg_i_dwordVal

43113C dd offset SUB_Reg_i_byteVal_extended_to_dword

431140 dd offset SUB_Reg_i_wordVal_extended_to_dword

431144 dd offset SUB_Reg_i_dwordVal

431148 dd offset XOR_Reg_i_byteVal

43114C dd offset XOR_Reg_i_wordVal

431150 dd offset XOR_Reg_i_dwordVal

431154 dd offset ADD_pDataBuffer_byte_Reg_i_byte

431158 dd offset ADD_pDataBuffer_word_Reg_i_word

43115C dd offset ADD_pDataBuffer_dword_Reg_i_dword

431160 * dd offset SUB_pDataBuffer_byte_Reg_i_byte

431164 dd offset SUB_pDataBuffer_word_Reg_i_word

431168 dd offset SUB_pDataBuffer_dword_Reg_i_dword

43116C dd offset XOR_pDataBuffer_byte_Reg_i_byte

431170 * dd offset XOR_pDataBuffer_word_Reg_i_word

431174 dd offset XOR_pDataBuffer_dword_Reg_i_dword

431178 dd offset MOV_Reg_i_dword_pDataBuffer_byte_extended_to_dword

43117C dd offset MOV_Reg_i_dword_pDataBuffer_word_extended_to_dword

431180 dd offset MOV_Reg_i_dword_pDataBuffer_dword

431184 dd offset MOV_pDataBuffer_byte_Reg_i_byte

431188 dd offset MOV_pDataBuffer_word_Reg_i_word

43118C dd offset MOV_pDataBuffer_dword_Reg_i_dword

431190 dd offset VMEndOpcode

As you can see from the names, the VM is not so complex, and everything has been built around DataBuffer. Basically you can identify some instruction subsets:

1. math_operator DataBuffer, val

2. mov DataBuffer, val

3. math_operator DataBuffer, register val

4. mov DataBuffer, register val

5. a set with Shuffle_pDataBuffer_dword_byteVal, RC4, “mov counter val”

The first 4 sets are really easy to understand, while the last one is a bit more difficult.

VM instruction: the common operation

With few exceptions, each opcode handler has something in common: the way it updates the opcode of the next VM instruction. The new value is obtained in the current instruction by applying a simple math operation, e.g.:

420306 mov al, [ecx] // VM.eip of the next VM instruction

420308 xor al, bl // bl is a byte from the current instruction byte sequence

42030A and al, 7Fh

42030C mov [ecx], al // Update the opcode value of the next instruction

It’s a basic operation, but if you think about it a little you’ll understand how hard it is to obtain the VM instruction list. The next opcode is always calculated at runtime, so it’s not easy to come up with a valid static approach.
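The xor/and step shown in the listing is easy to reproduce. The sketch below is an illustration only: the function name and the idea of passing the position of the next opcode and the key byte as arguments are my assumptions, since each handler takes them from its own byte sequence.

```python
def patch_next_opcode(code, next_pos, key_byte):
    """Decode the next opcode in place: xor with a byte of the current
    instruction, then keep the low 7 bits (the 'and al, 7Fh' step)."""
    code[next_pos] = (code[next_pos] ^ key_byte) & 0x7F
    return code[next_pos]


# Usage: the stored byte 0xFF only becomes a usable opcode once the
# current instruction supplies its key byte.
vm_bytes = bytearray([0x10, 0xA5, 0xFF])
decoded = patch_next_opcode(vm_bytes, next_pos=2, key_byte=vm_bytes[1])
```

Because the stored byte is meaningless until the previous instruction has run, a static disassembler has to emulate this chain from the first instruction onwards.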

VM instructions: some explanations

Just to show you how to work out the meaning of an opcode handler, here is a summary of some samples.

– ADD_Reg_i_Reg_j_wordVal_extended_to_dword

It’s used to add a word value from one register to another. The register indexes are defined by a byte inside the VM byte sequence.

420039 mov eax, [ecx] // Get VM.eip

42003B mov al, [eax+1] // Get a byte from VM byte sequence, it's used to get the indexes of the two registers

420041 movzx eax, al

420044 mov edx, eax

420046 shr eax, 4 // Reg_j index

420049 movzx eax, word ptr [ecx+eax*4+0Ch] // Get word val inside Reg_j (extended to dword)

42004E and edx, 0Fh // Reg_i index

420051 lea edx, [ecx+edx*4+0Ch] // Reg_i

420055 add [edx], eax // ADD_Reg_i_Reg_j_wordVal_extended_to_dword

420057 add dword ptr [ecx], 2 // Get the address of the next VM opcode: VM.eip = VM.eip + 2
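Here is how I would emulate this handler in Python. It is only a sketch: the dictionary-based VM state and the function name are made up, the next-opcode patching is omitted, and the return value of 1 is assumed from the dispatch loop rather than taken from the excerpt.

```python
def add_reg_i_reg_j_word(vm):
    """Reg_i += zero-extended low word of Reg_j; indexes packed in one byte."""
    operand = vm["code"][vm["eip"] + 1]  # byte following the opcode
    j = operand >> 4                     # high nibble: source register
    i = operand & 0x0F                   # low nibble: destination register
    word = vm["regs"][j] & 0xFFFF        # movzx eax, word ptr [...]
    vm["regs"][i] = (vm["regs"][i] + word) & 0xFFFFFFFF  # 32-bit wrap-around
    vm["eip"] += 2                       # opcode byte + operand byte
    return 1                             # assumed: keep the VM loop running


# Usage: operand 0x21 means Reg_1 += word(Reg_2).
vm = {"code": bytearray([0x00, 0x21]), "eip": 0, "regs": [0] * 16}
vm["regs"][1] = 0x10
vm["regs"][2] = 0x12345678
add_reg_i_reg_j_word(vm)
```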

– Shuffle_pDataBuffer_dword_byteVal

It shuffles the four bytes of a specific DataBuffer dword following a scheme defined by a byte value which is taken from the VM byte sequence.

41FD1D mov eax, [ecx] // VM.eip

41FD1F mov al, [eax+1] // Get a byte from VM byte sequence, the shuffle position byte

41FD26 mov [ebp+posByte], al // Save the position byte

41FD29 mov eax, [ecx+4] // pDataBuffer

41FD2C mov eax, [eax] // pDataBuffer(Dword)

41FD38 loc_41FD38:

41FD38 mov al, [ebp+posByte] // Shuffle position byte

41FD3B mov bl, byte ptr [ebp+dwordToShuffle] // Current byte of pDataBuffer

41FD3E shr [ebp+posByte], 2 // Prepare the next position value

41FD42 shr [ebp+dwordToShuffle], 8

41FD46 mov edi, [ecx+4] // pDataBuffer

41FD49 and al, 3 // Position: 0, 1, 2 or 3

41FD4B dec esi // Loop counter (from 4 to 0)

41FD4C movzx eax, al

41FD4F mov [eax+edi], bl // Put the byte in the new position

41FD52 jnz short loc_41FD38 // Jump up and check next byte
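The loop above boils down to this: source byte k of the dword is written to the position given by bits 2k and 2k+1 of the position byte. A Python sketch, with names of my own choosing and the DataBuffer modelled as a bytearray:

```python
def shuffle_dword(data, offset, pos_byte):
    """Scatter the 4 bytes at data[offset:offset+4] according to pos_byte."""
    original = bytes(data[offset:offset + 4])  # the dword is read once, up front
    for k in range(4):
        dest = (pos_byte >> (2 * k)) & 3       # 'and al, 3' after k shifts by 2
        data[offset + dest] = original[k]      # 'mov [eax+edi], bl'
    return data


# Usage: 0x1B (binary 00 01 10 11) sends byte 0 to position 3, byte 1 to
# position 2 and so on, i.e. it reverses the dword.
buf = shuffle_dword(bytearray(b"\x11\x22\x33\x44"), 0, 0x1B)
```

Note that nothing forces the position byte to describe a permutation: if two source bytes get the same destination, the later one wins and some position keeps its old value.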

– RC4

The weakness of most virtual machines lies in the definition of each single instruction. Why? Well, because all of them follow almost the same pattern and you can easily identify what’s going on. This handler goes *against the flow*: it contains so many assembly instructions that it’s impossible to understand what’s going on in a few seconds. Fortunately there is a big clue in the code:

41A84A xor ecx, ecx

41A85D mov edx, 100h

41A862 loc_41A862:

41A862 mov [esi], cl // esi points to the current byte of a 0x100 long buffer

41A864 inc ecx

41A865 inc esi

41A866 cmp cx, dx

41A869 jb short loc_41A862

The loop fills a 256-byte buffer with values from 0x00 to 0xFF. Can you see the light bulb over your head? This is a common buffer initialization; it can be a great hint or nothing interesting. Look at the next lines of code:

41A86F loc_41A86F:

41A86F movzx ecx, [ebp+var_1]

41A873 mov ebx, [ebp+arg_0]

41A876 mov cl, [ecx+ebx] // current byte from a static buffer @00402B43

41A879 mov dl, [esi] // esi points to the current byte of the 0x00_0xFF buffer

41A87B add cl, dl // Sum of the two bytes

41A87D add [ebp+var_2], cl // sum with a variable

41A880 movzx ecx, [ebp+var_2]

41A884 mov bl, [ecx+eax]

41A887 inc [ebp+var_1]

41A88A mov [esi], bl

41A88C mov [ecx+eax], dl

41A88F movzx ecx, [ebp+var_1]

41A893 cmp cx, [ebp+arg_4]

41A897 jnz short loc_41A89D

It looks like a random shuffle of the 256 bytes of the buffer, but it follows a precise scheme. This is the second clue, and you know, two clues make a proof! This is the initialization of the RC4 algorithm, and the buffer @402B43 is the key. Reading the last part of the code you can confirm the use of the symmetric crypto algorithm:

41FE97 push ecx // RC4_Key

41FEA4 call RC4_Init_Ksa

41FEAD push edi // Length of the input buffer

41FEAE push dword ptr [esi+4] // Input buffer to decrypt

41FEB1 call RC4_PRNGA_Decrypt
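For comparison, this is textbook RC4 in Python: the key scheduling matches the two loops shown above (identity fill, then the key-driven swaps), followed by the usual keystream generation. It is a reference sketch of the standard algorithm, not a decompilation of the malware’s routine, and the function names are mine.

```python
def rc4_ksa(key):
    """Key scheduling: identity permutation, then 256 key-driven swaps."""
    s = list(range(256))                  # the 0x00..0xFF fill loop
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF  # key index wraps at key length
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key, data):
    """Encrypt or decrypt data (RC4 is symmetric, so it is the same operation)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


# Usage: well-known test vector, key "Key" and plaintext "Plaintext".
ciphertext = rc4_crypt(b"Key", b"Plaintext")
```

With the key bytes dumped from 402B43 and the encrypted part of DataBuffer as input, a function like this should be enough to check the RC4 hypothesis outside the debugger.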

– “mov counter val”

The VM provides a special register, the counter. It holds the iteration count for loops. The VM can directly set the initial value of the counter register and it can decrease it, but nothing else is permitted.

MOV_Counter_wordVal is used to store a specific value inside the counter register:

41FD97 mov eax, [ecx] // VM.eip

41FD99 inc eax

41FD9A mov dl, [eax]

41FD9C mov [ecx], eax

41FD9E push esi

41FD9F movzx esi, word ptr [eax] // Get word value

41FDA2 add eax, 2

41FDA5 mov [ecx+8], esi // Save the value inside counter register

DEC_Counter_And_Jump_If_Not_Zero decreases the value:

41FE25 loc_41FE25:

41FE25 inc dword ptr [ecx] // VM.eip++

41FE27 mov edx, [ecx+8] // Counter value

41FE2A mov eax, [ecx]

41FE2C test edx, edx // Is it 0?

41FE2E jz short loc_41FE39

41FE30 dec edx // Counter--

41FE31 mov [ecx+8], edx // Save the new counter value

41FE34 movzx edx, byte ptr [eax] // Get a byte from the VM byte sequence

41FE37 sub eax, edx // Update eip: VM.eip = VM.eip - byteVal

41FE39 loc_41FE39:

41FE39 inc eax // Update eip: VM.eip++

41FE3A mov [ecx], eax // Save the new VM.eip

41FE3C mov al, 1

41FE3E retn

As you can see there’s a check on the counter register value, and then the VM.eip is updated according to that value.
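The same logic in Python, as a sketch with a made-up dictionary for the VM state (eip is an index into the byte sequence here, not a real pointer):

```python
def dec_counter_and_jump_if_not_zero(vm):
    """Backward jump while the counter is non-zero, otherwise fall through."""
    vm["eip"] += 1                   # step onto the operand byte
    eip = vm["eip"]
    if vm["counter"] != 0:
        vm["counter"] -= 1
        eip -= vm["code"][eip]       # jump back by byteVal
    vm["eip"] = eip + 1              # the final 'inc eax' runs in both cases
    return 1                         # 'mov al, 1': the VM keeps running


# Usage: opcode at index 2 with operand 4. While the counter is non-zero
# the VM goes back to index 0 (3 - 4 + 1); once it hits zero it moves on to 4.
vm = {"code": bytearray([0x00, 0x00, 0x00, 0x04]), "eip": 2, "counter": 1}
dec_counter_and_jump_if_not_zero(vm)
```

Since the zero check comes before the decrement, a counter loaded with N makes the jump N times, so the loop body is executed N + 1 times.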

The algorithm produced by the VM

From the analysis of the VM instruction set you can get a general idea of the algorithm implemented inside the VM: everything revolves around DataBuffer. However, if you want to be sure you aren’t missing something, you can extract the algorithm.

To get the VM algorithm I simply filtered a “run trace into” log from OllyDbg (tracing from 407F2F to 407F43). If you want I can upload the .c code too.

Final words

At the end of Fox-IT’s article there’s a list of some versions of the KINS malware, and all of them use the same VM. There are only some minor changes in the method used to obtain the next opcode (the xored values are changed…), but it seems like nothing has changed in the structure or instructions of this virtual machine.