The previous blog entry touched on a form of compile-time protection that complicated static and dynamic analysis. Namely, the protection replaced control-transfer instructions with privileged instructions, followed by data indicating to which location to transfer control. At run-time, attempting to execute those privileged instructions raises an exception. The exception handler, in turn, catches the exceptions, performs the intended transfers, and resumes execution. The resulting disassembly listing is difficult to read in several capacities.

The previous entry attacked the obfuscation to some extent, but we can do more. IDA processor module extensions are a perfect match for this problem. We can essentially trick IDA into thinking that the obfuscated instructions are their unobfuscated originals, so that the static analyst can read the disassembly listing (and use all of IDA's and Hex-Rays' functionality) as though the code had never been obfuscated. The processor module extension route provides a seamless, slipstream implementation that integrates directly into IDA's analysis facilities, thereby performing deobfuscation as the code is disassembled. Fast, cheap, and good: it turns out that you can have all three.

Control Transfer Obfuscation

The previous blog entry described one privileged instruction employed by the protection, and a technique for mitigating its deleterious effects on analysis. In reality, the protection employs three different privileged instructions, enumerated below. The first two use 16-bits worth of data following the privileged instruction to describe the address to which control should be transferred. The third one uses an immediate constant within the instruction as an index into a function pointer table.

in eax, dx / dw XXYYh => call 0x405000+XXYYh, as discussed previously.

in al, dx / dw XXYYh => call 0x405000+XXYYh, where the code bytes at the destination are encrypted, and must be decrypted prior to execution.

in al, XXh => call dword ptr [4011ACh+XX*4].

This form of protection poses an obvious hindrance to reverse engineering and automated analysis: the control-transfer instructions no longer exist, having been replaced by smaller, privileged instructions. As a result, the static analyst must resolve the original branch destinations manually.

Introduction of Bogus Instructions

The technical specifics of the protection mechanisms provide other irritations for static analysis. Upon decoding an instruction, most disassemblers will employ some logic to determine which address(es) should be decoded next. The typical logic is that, in the case of ...:

A conditional jump or call, both the targeted address and the address following the jump or call should be decoded. An unconditional jump, only the targeted address should be decoded. A return instruction, no further instructions should be decoded. Any other instruction, the address following the present one should be decoded.

Since the privileged instructions fall under category #4 above, the disassembler will assume that the address following the instruction contains code and should be decoded. However, with this protection scheme, the address following the privileged instructions may contain data, and hence decoding such data will produce bogus instructions. x86's variable-length instruction encoding scheme magnifies the effects of this problem. When the data decodes to an instruction that is more than two bytes in length, the disassembler will miss valid instructions that begin after the data.

Altogether, the result is a messy disassembly listing that does not reflect which instructions actually execute at run-time. The following figure illustrates the problems discussed above. On the left, we see the obfuscated disassembly listing, with the obfuscated control transfers and bogus instructions indicated. On the right, I have manually cleaned up the listing, indicating the proper control transfers. Each of the types of privileged instructions already described are represented within the figure. Additionally, we see the occluding effects of the bogus instructions: at address 0040824A on the left, there is a three-byte instruction, causing the disassembler to miss the valid instruction at address 0040824C shown on the right, and producing the bogus instructions at addresses 0040824D and 0040824E on the left.