
Compiler 1, X86 Virtualizer 0

Author: RolfRolles
Created: Friday, April 4 2008 15:13.57 CDT
Modified: Friday, April 4 2008 15:30.54 CDT
Views: 23834

There are two types of virtual machine software protections: A) the ones that convert x86 machine code into virtual machine bytecode and execute it at runtime; B) the ones that execute some arbitrary code in a virtual environment. I've discussed the latter several times in the past, and by now there exists a wealth of literature on that variety. But breaking the former kind remains an unsolved problem.



In my article I said "basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it's repetitious, and the high-level details are obscured by the flood of low-level details". The more I thought about this, the more I realized that the word "basically" is out of place: virtualizing software protections are programming language interpreters, albeit for weird languages.



Consequently, an idea struck me: what we want here is not an interpreter, but a compiler to compile the bytecode back into x86 machine code. I spent a week coding one (~1000 lines) in OCaml to test this theory, and I'm able to report that, indeed, it works. I chose ReWolf's x86 Virtualizer, a simple target that uses some of the same techniques as the heavy hitters in this area. Here is a walkthrough of the analysis and recompilation of a small function with one basic block. The compiler works equally well for arbitrarily-large functions, although that would make this posting unnecessarily long and complicated.



Step -2: Protect something with the virtualizer. In this case I just used ReWolf's sample executable itself.



.text:00401896 call ds:GetTickCount

.text:0040189C push eax

.text:0040189D call _srand

.text:004018A2 pop ecx

.text:004018A3 push 0

.text:004018A5 push offset DialogFunc

.text:004018AA push 0

.text:004018AC push 65h

.text:004018AE push [esp+10h+hInstance]

.text:004018B2 call ds:DialogBoxParamA

.text:004018B8 xor eax, eax

.text:004018BA retn 10h



Step -1: Analyze the virtual machine. Although this was not strictly necessary in this case because ReWolf provided source code, I decided to ignore it and reverse the VM manually, since you don't always have such niceties.



Step 0: Break the polymorphism in the instruction set. I made use of two remarkably ghetto hacks here, one of which may be considered elegant. To avoid provoking any arms races I'll omit the details.



Step 1: Disassemble the relevant region into VM bytecode. In the process, construct a graph in which each vertex is an instruction, and the edges are the flows between them.



.VM:004131D0 db 0C2h, 0C9h, 0C0h, 0BDh, 14h, 0DFh, 63h, 9Ah, 86h, 5Eh, 50h, 30h, 0Bh

.VM:004131D0 db 0Ah, 0C0h, 0C7h, 0CEh, 5Eh, 44h, 0E1h, 0E0h, 0C7h, 0FCh, 0FDh, 12h

.VM:004131D0 db 10h, 50h, 0D8h, 0D2h, 0DBh, 0A6h, 3Dh, 34h, 0C9h, 12h, 0DEh, 0E5h, 4Bh

.VM:004131D0 db 2Ch, 2Eh, 6Eh, 23h, 21h, 27h, 0E2h, 0E5h, 0ECh, 99h, 14h, 13h, 0C2h

.VM:004131D0 db 0E5h, 0F9h, 0FDh, 0F4h, 38h, 14h, 0F7h, 0F0h, 0F9h, 0ABh, 79h, 6, 0D7h

.VM:004131D0 db 0F0h, 8Bh, 88h, 81h, 41h, 87h, 8Ch, 85h, 0F8h, 51h, 9Ah, 26h, 0DFh

.VM:004131D0 db 0CFh, 1Eh, 15h, 75h, 76h, 74h, 6Bh, 98h, 9Dh, 94h, 6Eh, 0Ch, 6Bh, 90h

.VM:004131D0 db 93h, 9Ah, 0Fh



becomes



vertexlist =

[{label = 84; instruction = VMExit 16l};

{label = 81; instruction = LiteralInstruction [|51; 192|]};

{label = 69; instruction = ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l)};

{label = 65; instruction = PushDereferencedTemp};

{label = 57; instruction = AddImmediateToTemp 20l};

{label = 52; instruction = AddRegisterToTemp Esp};

{label = 44; instruction = SetTemp 0l};

{label = 41; instruction = LiteralInstruction [|106; 101|]};

{label = 38; instruction = LiteralInstruction [|106; 0|]};

{label = 27; instruction = ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l)};

{label = 24; instruction = LiteralInstruction [|106; 0|]};

{label = 22; instruction = LiteralInstruction [|89|]};

{label = 14; instruction = X86Call 6471l};

{label = 12; instruction = LiteralInstruction [|80|]};

{label = 0; instruction = ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l)}];

edgelist =

[({contents = {label = 0}}, {contents = {label = 12}});

({contents = {label = 12}}, {contents = {label = 14}});

({contents = {label = 14}}, {contents = {label = 22}});

({contents = {label = 22}}, {contents = {label = 24}});

(* Lots and lots of edges removed *)]
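
The printed values above suggest a small set of OCaml types. The following is a minimal sketch of what such declarations could look like. The constructor and field names are taken from the output, but the register list, the meaning attached to each argument, and the `linear_graph` helper are assumptions made for illustration, not the code from this post.

```ocaml
(* Hypothetical reconstruction of the types implied by the printed values.
   Constructor names come from the output above; the rest is assumed. *)
type register = Eax | Ecx | Edx | Ebx | Esp | Ebp | Esi | Edi

type instruction =
  | LiteralInstruction of int array                 (* raw x86 bytes, copied verbatim *)
  | ImagebaseFixupInstruction of int array * int32  (* bytes, offset of the dword to rebase (assumed) *)
  | X86Call of int32                                (* call target, relative to the image base (assumed) *)
  | SetTemp of int32
  | AddRegisterToTemp of register
  | AddImmediateToTemp of int32
  | PushDereferencedTemp
  | VMExit of int32                                 (* bytes popped on return (assumed) *)

(* One vertex per VM instruction, keyed by its offset in the bytecode. *)
type vertex = { label : int; instruction : instruction }

(* Edges hold references to vertices, which is why they print as
   {contents = {label = ...}}. *)
type graph = {
  vertexlist : vertex list;
  edgelist : (vertex ref * vertex ref) list;
}

(* Hypothetical helper: build a straight-line graph in which every
   vertex flows to the one that follows it. *)
let linear_graph (vs : vertex list) : graph =
  let refs = List.map ref vs in
  let rec edges = function
    | a :: (b :: _ as rest) -> (a, b) :: edges rest
    | _ -> []
  in
  { vertexlist = vs; edgelist = edges refs }

let g =
  linear_graph
    [ { label = 0;
        instruction = ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l) };
      { label = 12; instruction = LiteralInstruction [|80|] };
      { label = 14; instruction = X86Call 6471l } ]
```

Keeping each instruction as its own vertex at this stage means the later passes only need generic graph operations, regardless of which VM produced the bytecode.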



Step 2: Form basic blocks within the instruction-level CFG. The previous output becomes:



vertexlist =

[{label = 0;

instruction =

[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);

LiteralInstruction [|80|];

X86Call 6471l;

LiteralInstruction [|89|];

LiteralInstruction [|106; 0|];

ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);

LiteralInstruction [|106; 0|];

LiteralInstruction [|106; 101|];

SetTemp 0l;

AddRegisterToTemp Esp;

AddImmediateToTemp 20l;

PushDereferencedTemp;

ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);

LiteralInstruction [|51; 192|]; VMExit 16l|]}];
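
Block formation can be sketched with the usual "leader" rule. The version below works on bare integer labels and `(source, target)` edge pairs rather than on the vertex records shown above. The function names are hypothetical and this is not necessarily how the compiler described here does it.

```ocaml
(* Sketch: group instruction-level CFG vertices into basic blocks.
   Vertices are integer labels; edges are (source, target) pairs. *)
let successors edges v =
  List.filter_map (fun (a, b) -> if a = v then Some b else None) edges

let predecessors edges v =
  List.filter_map (fun (a, b) -> if b = v then Some a else None) edges

(* A vertex starts a new block unless it has exactly one predecessor
   and that predecessor has exactly one successor. *)
let is_leader edges v =
  match predecessors edges v with
  | [p] -> List.length (successors edges p) <> 1
  | _ -> true

(* Starting from a leader, absorb successors until a fork, an exit,
   or the next leader is reached. *)
let rec extend edges v =
  match successors edges v with
  | [s] when not (is_leader edges s) -> v :: extend edges s
  | _ -> [v]

(* One list of labels per basic block, in the order the leaders appear. *)
let basic_blocks labels edges =
  labels |> List.filter (is_leader edges) |> List.map (extend edges)

(* The first few edges from Step 1 collapse into a single block. *)
let blocks =
  basic_blocks [0; 12; 14; 22; 24] [(0, 12); (12, 14); (14, 22); (22, 24)]
```

For the straight-line function in this walkthrough every vertex after the first has one predecessor with one successor, so the whole function becomes one block labelled 0, as in the output above.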



Step 3: Optimize the code within the basic block. The goal is to convert sequences of VM instructions into a new language more conducive to being compiled back into X86. The optimizer is the most powerful component of my compiler: it can remove obfuscation automatically simply as a side-effect of being an optimizer (not that ReWolf's has any, but others do), and employs no pattern matching.



vertexlist =

[{label = 0;

instruction =

[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);

LiteralInstruction [|80|];

X86Call 6471l;

LiteralInstruction [|89|];

LiteralInstruction [|106; 0|];

ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);

LiteralInstruction [|106; 0|];

LiteralInstruction [|106; 101|];

SyntheticInstruction (Push, Plus (Constant 20l, Register Esp));

ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);

LiteralInstruction [|51; 192|];

VMExit 16l|]}];
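
One way to get from the four temp-manipulating instructions to the single synthetic push, without matching on fixed instruction sequences, is to track the VM temporary as a symbolic expression and simplify it algebraically. The sketch below does that. The `expr` constructors and `SyntheticInstruction` mirror the output above, while `simplify`, `optimize`, and the assumption that the temporary is dead at the end of the block are illustrative and not taken from the actual optimizer.

```ocaml
type register = Eax | Ecx | Edx | Ebx | Esp | Ebp | Esi | Edi

type expr =
  | Constant of int32
  | Register of register
  | Plus of expr * expr

type operation = Push

type instruction =
  | LiteralInstruction of int array
  | SetTemp of int32
  | AddRegisterToTemp of register
  | AddImmediateToTemp of int32
  | PushDereferencedTemp
  | SyntheticInstruction of operation * expr  (* Push e stands for push dword [e] *)

(* Algebraic simplification: fold constants, drop additions of zero,
   and keep any remaining constant on the left. *)
let rec simplify = function
  | Plus (a, b) ->
    (match simplify a, simplify b with
     | Constant x, Constant y -> Constant (Int32.add x y)
     | Constant 0l, e | e, Constant 0l -> e
     | Constant x, Plus (Constant y, e) | Plus (Constant y, e), Constant x ->
       simplify (Plus (Constant (Int32.add x y), e))
     | e, (Constant _ as c) -> Plus (c, e)
     | a', b' -> Plus (a', b'))
  | e -> e

(* Walk the block, carrying the symbolic value of the temporary.
   Instructions that only update a known temporary emit nothing; the push
   emits one synthetic instruction holding the accumulated expression.
   Assumptions: the temporary is not read after the block ends, and no
   literal x86 instruction between the update and the push changes a
   register the expression mentions. *)
let optimize (block : instruction list) : instruction list =
  let step (temp, out) insn =
    match insn, temp with
    | SetTemp c, _ -> (Some (Constant c), out)
    | AddRegisterToTemp r, Some t -> (Some (simplify (Plus (t, Register r))), out)
    | AddImmediateToTemp i, Some t -> (Some (simplify (Plus (Constant i, t))), out)
    | PushDereferencedTemp, Some t -> (temp, SyntheticInstruction (Push, t) :: out)
    | _ -> (temp, insn :: out)  (* unknown temporary or unrelated instruction *)
  in
  let _, out = List.fold_left step (None, []) block in
  List.rev out

let optimized =
  optimize
    [ LiteralInstruction [|106; 101|];
      SetTemp 0l; AddRegisterToTemp Esp; AddImmediateToTemp 20l;
      PushDereferencedTemp;
      LiteralInstruction [|51; 192|] ]
```

Because the simplifier folds constants as a matter of course, junk arithmetic on the temporary (adding and then subtracting the same value, say) would disappear the same way the initial `SetTemp 0l` does here.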



Step 4: Recompile all virtual instructions into x86 machine language.



vertexlist =

[{label = 0;

instruction =

[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);

LiteralInstruction [|80|];

RelativeFixupInstruction ([|232; 0; 0; 0; 0|], 6471l, 1l);

LiteralInstruction [|89|];

LiteralInstruction [|106; 0|];

ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);

LiteralInstruction [|106; 0|];

LiteralInstruction [|106; 101|];

LiteralInstruction [|255; 116; 36; 20|];

ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);

LiteralInstruction [|51; 192|];

LiteralInstruction [|194; 16; 0|]|]}];
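
The three instructions that change between Step 3 and Step 4 can be encoded with a few lines each. This is a minimal sketch covering only the forms that occur in this block. The `compile` function and the `le16` helper are hypothetical, and a real code generator would need to handle many more addressing modes.

```ocaml
type register = Eax | Ecx | Edx | Ebx | Esp | Ebp | Esi | Edi
type expr = Constant of int32 | Register of register | Plus of expr * expr
type operation = Push

type instruction =
  | LiteralInstruction of int array
  | RelativeFixupInstruction of int array * int32 * int32
  | X86Call of int32
  | VMExit of int32
  | SyntheticInstruction of operation * expr

(* Little-endian 16-bit immediate. *)
let le16 n = [| n land 0xFF; (n lsr 8) land 0xFF |]

let compile = function
  (* call rel32: emit E8 with a zeroed displacement; the target and the
     offset of the displacement are kept for the fixup pass. *)
  | X86Call target ->
    RelativeFixupInstruction ([|0xE8; 0; 0; 0; 0|], target, 1l)
  (* retn imm16 *)
  | VMExit n ->
    LiteralInstruction (Array.append [|0xC2|] (le16 (Int32.to_int n)))
  (* push dword ptr [esp+disp8]: FF /6, ModRM 74h, SIB 24h (base = ESP).
     Only non-negative 8-bit displacements are handled in this sketch. *)
  | SyntheticInstruction (Push, Plus (Constant d, Register Esp))
    when d >= 0l && d <= 127l ->
    LiteralInstruction [|0xFF; 0x74; 0x24; Int32.to_int d|]
  | SyntheticInstruction _ ->
    failwith "compile: addressing form not covered by this sketch"
  (* Everything else is already x86 and passes through unchanged. *)
  | other -> other
```

Applied to `VMExit 16l` this yields the bytes `194; 16; 0`, and applied to the synthetic push it yields `255; 116; 36; 20`, matching the recompiled block above.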



Step 5: Stuff the original bytes back into the binary and perform the specified fixups. If you can convert between hex and decimal in your head, you'll notice that the bytes above correspond to those below, modulo fixups. For multi-basic-block functions, this is harder, as you have to sequence the blocks and decide between short and long jumps.



.VM:004131D0 FF 15 28 A0 40 00 call ds:GetTickCount

.VM:004131D6 50 push eax

.VM:004131D7 E8 6B E7 FE FF call loc_401947

.VM:004131DC 59 pop ecx

.VM:004131DD 6A 00 push 0

.VM:004131DF 68 F0 16 40 00 push offset loc_4016F0

.VM:004131E4 6A 00 push 0

.VM:004131E6 6A 65 push 65h

.VM:004131E8 FF 74 24 14 push dword ptr [esp+14h]

.VM:004131EC FF 15 48 A1 40 00 call ds:DialogBoxParamA

.VM:004131F2 33 C0 xor eax, eax

.VM:004131F4 C2 10 00 retn 10h
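
The two kinds of fixup reduce to a little 32-bit arithmetic. The sketch below assumes an image base of 0x400000 (inferred from the addresses in the listing) and uses plain `int` offsets. The function names are hypothetical.

```ocaml
(* Assumed image base, inferred from the addresses in the listing. *)
let image_base = 0x400000l

(* Read the little-endian dword stored at [offset]. *)
let read_le32 (bytes : int array) (offset : int) : int32 =
  let b i = Int32.shift_left (Int32.of_int bytes.(offset + i)) (8 * i) in
  Int32.logor (Int32.logor (b 0) (b 1)) (Int32.logor (b 2) (b 3))

(* Return a copy of [bytes] with the dword at [offset] replaced by [v]. *)
let write_le32 (bytes : int array) (offset : int) (v : int32) : int array =
  let out = Array.copy bytes in
  for i = 0 to 3 do
    out.(offset + i) <-
      Int32.to_int (Int32.logand (Int32.shift_right_logical v (8 * i)) 0xFFl)
  done;
  out

(* Imagebase fixup: the embedded dword is relative to the image base,
   so add the base to turn it into an absolute address. *)
let imagebase_fixup bytes offset =
  write_le32 bytes offset (Int32.add (read_le32 bytes offset) image_base)

(* Relative fixup: rel32 = target - address of the next instruction.
   Int32 arithmetic wraps, which gives the two's-complement encoding
   needed for a backward call. *)
let relative_fixup ~address bytes target_rva offset =
  let next = Int32.add address (Int32.of_int (Array.length bytes)) in
  let target = Int32.add image_base target_rva in
  write_le32 bytes offset (Int32.sub target next)

(* call ds:GetTickCount: FF 15 28 A0 00 00 becomes FF 15 28 A0 40 00 *)
let call_import = imagebase_fixup [|0xFF; 0x15; 0x28; 0xA0; 0; 0|] 2

(* call loc_401947 placed at 004131D7: E8 6B E7 FE FF *)
let call_srand = relative_fixup ~address:0x4131D7l [|0xE8; 0; 0; 0; 0|] 6471l 1
```

For the call at 004131D7 the arithmetic is 0x401947 - 0x4131DC = -0x11895, which is FFFEE76B as a 32-bit value and appears in the listing as the bytes 6B E7 FE FF.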



Step 6: Celebrate. ReWolf's X86 Virtualizer was simple, and surely breaking the harder ones is, well, harder, but I believe that the general principles espoused here should be applicable to the others.



I'll have more to say about this in the future, including source code.



Blog Comments

DeepBlueSea Posted: Saturday, April 5 2008 06:11.23 CDT Impressive proof of concept there. This could be an important step towards breaking today's virtualizing madness.

sovietskicpu Posted: Saturday, April 5 2008 08:59.46 CDT Impressive work rolf, really impressive...



( Stoned hehe.... )

Soul12 Posted: Saturday, April 5 2008 10:20.33 CDT props from your nr1 fanboy

ero Posted: Saturday, April 5 2008 12:05.08 CDT Reminds me a bit of Virus Bulletin 2003's paper by Frédéric Perriot, "Defeating Polymorphism Through Code Optimization" on using optimization techniques to try to get a canonical representation of some obfuscated code. He did not apply it to VMs tho.



Really good work Rolf, btw, how did you like OCaml? I've only played briefly with it.

ReWolf Posted: Saturday, April 5 2008 16:47.27 CDT my poor virtualizer ;d, anyway nice work :-) (as always)

RolfRolles Posted: Saturday, April 5 2008 18:38.22 CDT Thanks everybody! It was a fun project, nicely timed too (I needed to code something non-trivial in OCaml).



Ero: I'd never seen this paper before, but it makes sense. Certain obfuscations are "de-optimizations", e.g. worthless instructions inserted into the stream = definitions with no uses, can be eliminated by dead code elimination; e.g. using several instructions to produce a constant value = constant unfolding, can be eliminated by constant folding/propagation, etc. Of course this doesn't apply to all forms of obfuscation.



Regarding OCaml, the Windows OCaml interpreter needs work. For one thing, it keeps leaving these zombie processes around; I had to kill literally two dozen associated processes earlier today. For another, if you paste in, say, 25kb of code at one time, the interpreter will freeze. So I have to copy and paste my VM compiler into the interpreter bit by bit which is annoying. I suppose I should start working with the actual compiler, not the interpreter.



Regarding the language itself, I'm still too green (and not enough of a language maven) to say anything really substantial about it, so I'll let people smarter than me do the talking; see here, and here's a quote from Benjamin C. Pierce from "Types and Programming Languages" on which languages are good choices for static analysis:



"The most important requirements are automatic storage management (garbage collection) and easy facilities for defining recursive functions by pattern matching over structured data types. Functional languages ... with pattern-matching ... are fine choices. Languages with garbage collection but without pattern matching, such as Java ... are somewhat heavy for the sorts of programming we'll be doing. Languages with neither, such as C, are even less suitable."

Raindog Posted: Sunday, April 6 2008 17:17.25 CDT Wow man, you never cease to amaze.

RolfRolles Posted: Sunday, July 20 2008 23:40.59 CDT By popular demand, the sparsely-commented, research-quality code. I apologize for its excessive density. Expect some details about applying the same concept towards breaking VMProtect.

Soul12 Posted: Thursday, July 24 2008 03:24.35 CDT the biggest problem i see .. after re-reading a bit :) is being able to set up rules for the VM-script .. i.e. say the opcode size is hardcoded into the block which executes it .. like in a certain protection .. where you don't know the size of the opcode until the end of its execution .. so you'd still have to do some tracing or byte scanning ... at least to be able to automate the whole thing, or am i missing something rolf :)

zhangfeng79212 Posted: Friday, February 6 2009 04:30.17 CST Could you share the source code? I want to learn from you! Thanks very much.

RolfRolles Posted: Friday, February 6 2009 15:59.22 CST I must've accidentally deleted the code at some point, so I've re-uploaded it. See three comments up for the link.

Hendrix Posted: Tuesday, April 21 2009 13:19.41 CDT I like that! Thx for sharing.





