Introduction

In my previous article, I gave a short introduction to some Anti-Debugging/Debugger Detection techniques that primarily involved the use of Win32 API functions. In this article, I plan to travel a bit deeper into the interesting world of reverse engineering and explore some more intermediate-level techniques for annoying reverse engineers. Some comments on my previous article noted that the techniques I presented could be, and most of the time are, easily bypassed by intermediate-level reversers. I would like to point out that there is an ongoing battle between the coders who develop programs that protect against cracking and reverse engineering and the engineers themselves: every time the protectors release a new technique, the engineers find a way around that specific method. This is the driving force behind both the cracking "scene" and the anti-reverse engineering field. Most of the techniques here can easily be bypassed, and some are harder to take out of the picture; however, all of them can be bypassed in one way, shape, or form. I'm presenting these methods to share the knowledge, and perhaps to inspire others to apply them in new and creative ways that challenge contemporary methodology.

Background

Anyone who is interested in the field of reverse engineering needs a strong understanding of Assembly language, so if your ASM is a little rusty or if you're just beginning to learn, here are some sites that can assist:

Inline Functions

I didn't feel this side note required its own section; however, when reading this article or the attached source, you will notice that the functions are marked inline. While inlining can bloat an executable, it is important in anti-reverse engineering. If each check lives in a clearly delimited function with its own entry point, the reverse engineer's job becomes much easier: once he or she has identified that function, every place it is called from is known. When the code is inlined, there is no single function to find, and the engineer is left guessing about what is actually happening. Keep in mind that inline is only a hint that the compiler is free to ignore (with Visual C++, __forceinline is a stronger request).

Breakpoints

There are three types of breakpoints available to a reverse engineer: hardware, memory, and INT 3h breakpoints. Breakpoints are essential to a reverse engineer; without them, live analysis of a module does him or her little good. A breakpoint stops execution of a program at whatever point it is placed. Using this, reverse engineers can put breakpoints on Windows APIs and very easily find where a badboy message (for example, a message box saying you entered a bad serial) is coming from. In fact, this is probably the most utilized technique in cracking; its only real competitor is a search for referenced text strings. This is why breakpoint checks are done on important APIs like MessageBox, VirtualAlloc, CreateDialog, and others that play an important role in the code you are trying to protect. The first example covers the most common type of breakpoint, which uses the INT 3h instruction.

INT 3

INT 3h breakpoints are represented in the IA-32 instruction set by the opcode CC (0xCC). This is the most common encoding of this breakpoint; however, it can also be expressed as the two-byte sequence 0xCD 0x03, which can cause trouble for a scanner that only looks for 0xCC. Detecting this type of breakpoint is relatively simple, and the source might look like the following sample. Be careful, though: this method of scanning can lead to false positives, because the byte 0xCC can also legitimately appear inside other instructions' operands or in data.

```cpp
// Scans SizeToCheck bytes starting at pMemory for the one-byte INT 3 opcode.
bool CheckForCCBreakpoint(void* pMemory, size_t SizeToCheck)
{
    unsigned char* pTmp = (unsigned char*)pMemory;
    for (size_t i = 0; i < SizeToCheck; i++)
    {
        if (pTmp[i] == 0xCC)
            return true;
    }
    return false;
}
```

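Here is a more obfuscated variant of the INT 3 check. It is important to remember that the code shown above would stick out like a sore thumb to even new reversers, since the constant 0xCC is easy to search for. In this version, each byte is XORed with 0x55 and compared against 0x99 (because 0xCC ^ 0x55 == 0x99), so 0xCC never appears in the comparison. By adding another level of indirection, you, the protector, improve your chances of successfully protecting the application.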

```cpp
// Same scan as above, but compares (byte ^ 0x55) with 0x99,
// because 0xCC ^ 0x55 == 0x99.
bool CheckForCCBreakpointXor55(void* pMemory, size_t SizeToCheck)
{
    unsigned char* pTmp = (unsigned char*)pMemory;
    unsigned char tmpchar = 0;
    for (size_t i = 0; i < SizeToCheck; i++)
    {
        tmpchar = pTmp[i];
        if (0x99 == (tmpchar ^ 0x55))
            return true;
    }
    return false;
}
```
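Both checks above only look for the one-byte 0xCC form. The following portable sketch also catches the two-byte CD 03 encoding mentioned earlier while keeping the XOR-masking idea; the function name ContainsInt3 and the kMask/kInt3Masked/kIntNMasked constants are illustrative and not part of the attached source. It works on a plain byte buffer, so in practice you would pass the start address and length of the routine you want to guard, and the same false-positive caveat applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// XOR mask (an arbitrary choice) so that the compiled comparisons use 0x99
// and 0x98 instead of the easily searched 0xCC and 0xCD.
constexpr std::uint8_t kMask = 0x55;
constexpr std::uint8_t kInt3Masked = 0xCC ^ kMask;  // 0x99
constexpr std::uint8_t kIntNMasked = 0xCD ^ kMask;  // 0x98

// Returns true if the buffer contains a one-byte INT 3 (0xCC) or the
// two-byte form (0xCD 0x03). Like the original check, this can report false
// positives, since these bytes also occur in operands, data, and padding.
bool ContainsInt3(const std::uint8_t* bytes, std::size_t size)
{
    assert(bytes != nullptr || size == 0);
    for (std::size_t i = 0; i < size; ++i) {
        const std::uint8_t masked = static_cast<std::uint8_t>(bytes[i] ^ kMask);
        if (masked == kInt3Masked)
            return true;
        if (masked == kIntNMasked && i + 1 < size && bytes[i + 1] == 0x03)
            return true;
    }
    return false;
}
```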

Memory Breakpoints

Memory breakpoints are implemented by a debugger using guard pages, and they act like "a one-shot alarm for memory page access" (Creating Guard Pages). In a nutshell, when a page of memory is marked as PAGE_GUARD and is accessed, a STATUS_GUARD_PAGE_VIOLATION exception is raised, which can then be handled by the current program. At the moment, there's no accurate way to check for memory breakpoints directly. However, we can use the same mechanism a debugger uses to implement memory breakpoints to discover whether our program is currently running under a debugger. In essence, we allocate a dynamic buffer, write a RET instruction (0xC3) to it, and mark the page as a guard page. We then push a potential return address onto the stack and jump to our page. If we're running under a debugger that uses guard pages for its own memory breakpoints, specifically OllyDBG, the debugger treats the guard page exception as its own, the RET executes, and we return to the address we pushed before the jump. Otherwise, the STATUS_GUARD_PAGE_VIOLATION exception reaches our own handler, and we know we're not being debugged by OllyDBG. Here is an example in source:

```cpp
// 32-bit MSVC only (inline __asm).
bool MemoryBreakpointDebuggerCheck()
{
    unsigned char* pMem = NULL;
    SYSTEM_INFO sysinfo = { 0 };
    DWORD OldProtect = 0;
    void* pAllocation = NULL;

    GetSystemInfo(&sysinfo);
    pAllocation = VirtualAlloc(NULL, sysinfo.dwPageSize,
                               MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (pAllocation == NULL)
        return false;

    // Write a single RET instruction into the page.
    pMem = (unsigned char*)pAllocation;
    *pMem = 0xC3;

    // Turn the page into a guard page.
    if (VirtualProtect(pAllocation, sysinfo.dwPageSize,
                       PAGE_EXECUTE_READWRITE | PAGE_GUARD, &OldProtect) == 0)
    {
        VirtualFree(pAllocation, 0, MEM_RELEASE);
        return false;
    }

    __try
    {
        __asm
        {
            mov eax, pAllocation
            push MemBpBeingDebugged   // return address used by the RET
            jmp eax                   // touch the guard page
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        // The guard page exception reached us: no debugger swallowed it.
        VirtualFree(pAllocation, 0, MEM_RELEASE);
        return false;
    }

    __asm {MemBpBeingDebugged:}
    VirtualFree(pAllocation, 0, MEM_RELEASE);
    return true;
}
```

Hardware Breakpoints

Hardware breakpoints are a technology implemented by Intel in their processor architecture, and are controlled through special registers known as Dr0-Dr7. Dr0 through Dr3 are 32-bit registers (64-bit on x64) that hold the addresses of up to four breakpoints. Dr4 and Dr5 are reserved by Intel (when debugging extensions are disabled, they act as aliases for Dr6 and Dr7), Dr6 is the debug status register, and Dr7 is the debug control register that enables the breakpoints and controls their behavior (Intel1). There is a bit too much information for me to cover here how Dr6 and Dr7 affect breakpoint behavior; anyone who is interested should read the Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide for an in-depth explanation of how the registers work.

Now, in order to detect and/or remove hardware breakpoints, there are two methods we can use: the Win32 functions GetThreadContext and SetThreadContext, or Structured Exception Handling (SEH). In the first example, I'll show how to use the Win32 API functions:

```cpp
// Returns the number of hardware breakpoint addresses set for the current
// thread, or -1 if the thread context could not be retrieved.
int CheckHardwareBreakpoints()
{
    unsigned int NumBps = 0;
    CONTEXT ctx;
    ZeroMemory(&ctx, sizeof(CONTEXT));
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;   // only Dr0-Dr3, Dr6 and Dr7

    HANDLE hThread = GetCurrentThread();
    if (GetThreadContext(hThread, &ctx) == 0)
        return -1;

    if (ctx.Dr0 != 0) ++NumBps;
    if (ctx.Dr1 != 0) ++NumBps;
    if (ctx.Dr2 != 0) ++NumBps;
    if (ctx.Dr3 != 0) ++NumBps;
    return NumBps;
}
```
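A nonzero value in Dr0-Dr3 is only an address; whether the processor actually breaks on it is decided by the enable bits in Dr7, which GetThreadContext fills in along with the address registers. As a refinement, you could inspect ctx.Dr7 as well. The sketch below decodes a raw Dr7 value portably; the HwBreakpointSlot type and both function names are hypothetical, and the bit positions follow the Intel manual mentioned above (Ln and Gn enable bits at 2n and 2n+1, R/Wn at bits 16+4n, LENn at bits 18+4n).

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of one of the four hardware breakpoint slots (Dr0-Dr3).
struct HwBreakpointSlot {
    bool localEnable;    // Ln, bit 2n of Dr7
    bool globalEnable;   // Gn, bit 2n+1 of Dr7
    unsigned condition;  // R/Wn: 0 = execute, 1 = write, 2 = I/O, 3 = read/write
    unsigned lengthBits; // LENn: 0 = 1 byte, 1 = 2 bytes, 2 = 8 bytes (64-bit mode), 3 = 4 bytes
};

HwBreakpointSlot DecodeDr7Slot(std::uint32_t dr7, unsigned slot)
{
    assert(slot < 4);
    HwBreakpointSlot s;
    s.localEnable  = ((dr7 >> (2 * slot)) & 1u) != 0;
    s.globalEnable = ((dr7 >> (2 * slot + 1)) & 1u) != 0;
    s.condition    = (dr7 >> (16 + 4 * slot)) & 3u;
    s.lengthBits   = (dr7 >> (18 + 4 * slot)) & 3u;
    return s;
}

// Counts the slots whose local or global enable bit is set in Dr7.
unsigned CountEnabledHwBreakpoints(std::uint32_t dr7)
{
    unsigned count = 0;
    for (unsigned slot = 0; slot < 4; ++slot) {
        const HwBreakpointSlot s = DecodeDr7Slot(dr7, slot);
        if (s.localEnable || s.globalEnable)
            ++count;
    }
    return count;
}
```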

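The SEH method of manipulating the debug registers is much more common in anti-reverse engineering programs, and is more easily implemented in ASM, as shown in the following MASM example. The handler receives a pointer to the thread's CONTEXT record, clears the debug registers in it, and skips the faulting instruction; when execution continues, Windows restores the modified context, debug registers included: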

```asm
ClrHwBpHandler proto
.safeseh ClrHwBpHandler

ClearHardwareBreakpoints proc
    assume fs:nothing
    push offset ClrHwBpHandler       ; install our handler as a new SEH frame
    push fs:[0]
    mov  dword ptr fs:[0], esp
    xor  eax, eax
    div  eax                         ; divide by zero: ClrHwBpHandler runs
    pop  dword ptr fs:[0]            ; unlink the SEH frame
    add  esp, 4
    ret
ClearHardwareBreakpoints endp

ClrHwBpHandler proc
    xor  eax, eax
    mov  ecx, [esp + 0ch]            ; ecx = pointer to the CONTEXT record
    mov  dword ptr [ecx + 04h], eax  ; Dr0
    mov  dword ptr [ecx + 08h], eax  ; Dr1
    mov  dword ptr [ecx + 0ch], eax  ; Dr2
    mov  dword ptr [ecx + 10h], eax  ; Dr3
    mov  dword ptr [ecx + 14h], eax  ; Dr6
    mov  dword ptr [ecx + 18h], eax  ; Dr7
    add  dword ptr [ecx + 0b8h], 2   ; Eip += 2, skipping the 2-byte "div eax"
    ret                              ; eax = 0 = ExceptionContinueExecution
ClrHwBpHandler endp
```
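The magic numbers in ClrHwBpHandler are offsets into the 32-bit x86 CONTEXT record, whose address the handler loads from [esp + 0ch] (its third argument). The following sketch is a hypothetical, portable mirror of just enough of that layout to check the offsets at compile time and to model what the handler does; X86ContextPrefix and ClearDebugRegistersAndSkipDiv are illustrative names, this is not a substitute for the real CONTEXT type from <windows.h>, and the 64-bit CONTEXT is laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the 32-bit x86 CONTEXT up to Eip.
struct X86ContextPrefix {
    std::uint32_t ContextFlags;                  // 0x00
    std::uint32_t Dr0, Dr1, Dr2, Dr3;            // 0x04 - 0x10
    std::uint32_t Dr6, Dr7;                      // 0x14, 0x18
    std::uint8_t  FloatSave[112];                // FLOATING_SAVE_AREA, 0x1C - 0x8B
    std::uint32_t SegGs, SegFs, SegEs, SegDs;    // 0x8C - 0x98
    std::uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;  // 0x9C - 0xB0
    std::uint32_t Ebp, Eip;                      // 0xB4, 0xB8
};

static_assert(offsetof(X86ContextPrefix, Dr0) == 0x04, "Dr0 offset");
static_assert(offsetof(X86ContextPrefix, Dr7) == 0x18, "Dr7 offset");
static_assert(offsetof(X86ContextPrefix, Eax) == 0xB0, "Eax offset");
static_assert(offsetof(X86ContextPrefix, Eip) == 0xB8, "Eip offset");

// Models ClrHwBpHandler: clear Dr0-Dr3, Dr6 and Dr7, then skip the
// 2-byte "div eax" (opcode F7 F0) that raised the exception.
void ClearDebugRegistersAndSkipDiv(X86ContextPrefix& ctx)
{
    ctx.Dr0 = ctx.Dr1 = ctx.Dr2 = ctx.Dr3 = 0;
    ctx.Dr6 = ctx.Dr7 = 0;
    ctx.Eip += 2;
}
```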

Timing Attacks

The theory behind timing attacks is that executing a section of code, especially a small section, should only take a minuscule amount of time. Therefore, if a timed section of code takes longer than a set limit, there is most likely a debugger attached, and someone is stepping through the code. This family of attacks has many small variations; the most common example uses the IA-32 RDTSC instruction. Other variations use different timing sources, such as timeGetTime, GetTickCount, and QueryPerformanceCounter.

RDTSC

RDTSC is an IA-32 instruction that stands for Read Time-Stamp Counter, which is pretty self-explanatory. Since the Pentium, processors have had a 64-bit time-stamp counter that is reset to 0 when the processor is reset and then incremented continuously (on older processors once per clock cycle; newer ones increment it at a constant rate), and RDTSC returns its value in EDX:EAX. This makes it a very powerful timing technique; however, RDTSC is not a serializing instruction, so the processor may execute it out of order relative to the surrounding code, and the measurement is not guaranteed to be 100% accurate. This is why Microsoft encourages the use of its Win32 timing APIs, which are as accurate as Windows can guarantee. The great thing about timing attacks in general is that they are simple to implement. A developer decides which functions to protect, surrounds those blocks of code with a timing block, compares the elapsed time to a limit set by the programmer, and exits the program if the timed section takes too long to execute. Here is an example:

```cpp
#define SERIAL_THRESHOLD 0x10000 // 0x10000 time-stamp counter ticks

DWORD GenerateSerial(TCHAR* pName)
{
    DWORD LocalSerial = 0;
    DWORD RdtscLow = 0;

    // Start timing: keep only the low 32 bits of the counter.
    __asm
    {
        rdtsc
        mov RdtscLow, eax
    }

    // The code being protected.
    size_t NameLength = _tcslen(pName);
    for (unsigned int i = 0; i < NameLength; i++)
    {
        LocalSerial += (DWORD)pName[i];
        LocalSerial ^= 0xDEADBEEF;
    }

    // Stop timing and exit if the section took too long.
    __asm
    {
        rdtsc
        sub eax, RdtscLow
        cmp eax, SERIAL_THRESHOLD
        jbe NotDebugged
        push 0
        call ExitProcess
    NotDebugged:
    }
    return LocalSerial;
}
```

Win32 Timing Functions

The concepts are exactly the same in this variation, except that we use a different means of timing our function. In the following example, GetTickCount is used, but it could be replaced with timeGetTime or QueryPerformanceCounter.

```cpp
// GetTickCount() returns milliseconds, so this limit is 0x10000 ms
// (about 65 seconds); tune it to the code being timed.
#define SERIAL_THRESHOLD_MS 0x10000

DWORD GenerateSerialWin32Attack(TCHAR* pName)
{
    DWORD LocalSerial = 0;
    size_t NameLength = _tcslen(pName);

    // Could also use timeGetTime() or QueryPerformanceCounter().
    DWORD Counter = GetTickCount();
    for (unsigned int i = 0; i < NameLength; i++)
    {
        LocalSerial += (DWORD)pName[i];
        LocalSerial ^= 0xDEADBEEF;
    }
    Counter = GetTickCount() - Counter;
    if (Counter >= SERIAL_THRESHOLD_MS)
        ExitProcess(0);
    return LocalSerial;
}
```
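The same timing-block pattern can be sketched with standard C++ <chrono>, for instance when you want a monotonic clock without committing to a particular Win32 API. SectionExceededLimit is a hypothetical helper, and the limit is a tuning assumption: it has to be generous enough that a slow or heavily loaded machine does not trip it.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `section` and reports whether it took longer than `limit`.
// steady_clock never jumps backwards, which makes it suitable for
// measuring intervals, much like GetTickCount/QueryPerformanceCounter.
template <typename Section>
bool SectionExceededLimit(Section&& section, std::chrono::nanoseconds limit)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Section>(section)();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return elapsed > limit;
}
```

To protect the serial routine this way, you would wrap its loop in a lambda, pass it together with the limit you have chosen, and exit when the helper returns true.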

Windows Internals

The following methods of anti-reverse engineering exploit peculiarities of the Windows operating system to implement some sort of protection, ranging from hiding a thread from a debugger to revealing the presence of a debugger. Many of the functions used in the following examples are exported from ntdll.dll and are not guaranteed by Microsoft to behave consistently across versions of the operating system. Therefore, some caution should be taken when using these examples in your own programs. That being said, I have yet to see one of these APIs change drastically in behavior, so do not take the previous statement as a commandment to avoid these implementations.

ProcessDebugFlags

ProcessDebugFlags (0x1f) is an undocumented information class that can be passed to the NtQueryInformationProcess function. When NtQueryInformationProcess is called with the ProcessDebugFlags class, it returns the inverse of EPROCESS->NoDebugInherit, which means that the returned value is 0 (FALSE) if the process is being debugged. Here's CheckProcessDebugFlags:

```cpp
inline bool CheckProcessDebugFlags()
{
    typedef NTSTATUS (WINAPI *pNtQueryInformationProcess)
        (HANDLE, UINT, PVOID, ULONG, PULONG);

    DWORD NoDebugInherit = 0;
    NTSTATUS Status;

    pNtQueryInformationProcess NtQIP = (pNtQueryInformationProcess)
        GetProcAddress(GetModuleHandle(TEXT("ntdll.dll")),
                       "NtQueryInformationProcess");
    if (NtQIP == NULL)
        return false;

    // 0x1f = ProcessDebugFlags
    Status = NtQIP(GetCurrentProcess(), 0x1f, &NoDebugInherit,
                   sizeof(NoDebugInherit), NULL);
    if (Status != 0x00000000)
        return false;

    // A value of 0 means the process is being debugged.
    return NoDebugInherit == FALSE;
}
```

Debug Object Handle

Beginning with Windows XP, when a process is debugged, a debug object is created for that debugging session. A handle to this object can be queried using NtQueryInformationProcess with the ProcessDebugObjectHandle class (0x1e). The presence of this handle shows that the process is being actively debugged, and this information can be quite a pain to hide since it comes from the kernel. Here's the DebugObjectCheck function:

```cpp
inline bool DebugObjectCheck()
{
    typedef NTSTATUS (WINAPI *pNtQueryInformationProcess)
        (HANDLE, UINT, PVOID, ULONG, PULONG);

    HANDLE hDebugObject = NULL;
    NTSTATUS Status;

    pNtQueryInformationProcess NtQIP = (pNtQueryInformationProcess)
        GetProcAddress(GetModuleHandle(TEXT("ntdll.dll")),
                       "NtQueryInformationProcess");
    if (NtQIP == NULL)
        return false;

    // 0x1e = ProcessDebugObjectHandle. A HANDLE is 8 bytes on 64-bit
    // Windows, so pass sizeof instead of a hard-coded 4.
    Status = NtQIP(GetCurrentProcess(), 0x1e, &hDebugObject,
                   sizeof(hDebugObject), NULL);
    if (Status != 0x00000000)
        return false;

    return hDebugObject != NULL;
}
```

Thread Hiding

In Windows 2000, the guys behind Windows introduced a new information class that can be passed to NtSetInformationThread, named ThreadHideFromDebugger (0x11). It is the first anti-debugging API implemented by Windows, and it is very powerful. Once NtSetInformationThread has been called with this class on a thread, the debugger no longer receives events from that thread. These include breakpoint events and, if the function is called on the main thread, the notification that the program has exited. Here is the HideThread function:

```cpp
// Hides hThread (or the calling thread, if hThread is NULL) from the debugger.
inline bool HideThread(HANDLE hThread)
{
    typedef NTSTATUS (NTAPI *pNtSetInformationThread)
        (HANDLE, UINT, PVOID, ULONG);

    NTSTATUS Status;
    pNtSetInformationThread NtSIT = (pNtSetInformationThread)
        GetProcAddress(GetModuleHandle(TEXT("ntdll.dll")),
                       "NtSetInformationThread");
    if (NtSIT == NULL)
        return false;

    // 0x11 = ThreadHideFromDebugger; it takes no input buffer.
    if (hThread == NULL)
        Status = NtSIT(GetCurrentThread(), 0x11, 0, 0);
    else
        Status = NtSIT(hThread, 0x11, 0, 0);

    return Status == 0x00000000;
}
```

BlockInput

This is about as simple as it comes. BlockInput does what the name suggests: it blocks mouse and keyboard input events from reaching applications. The technique is effective because only the thread that called BlockInput can call it again to remove the block ("BlockInput Function"). This isn't really an anti-reverse engineering technique so much as a way to mess with someone debugging your application. The source code is simply:

```cpp
BlockInput(TRUE);
```

OutputDebugString

The OutputDebugString technique works by checking whether OutputDebugString sets an error. An error is set only if there is no debugger attached to receive the string; therefore, if GetLastError reports no error after calling OutputDebugString, we conclude that a debugger is present. This relies on undocumented behavior, so it may not hold on every version of Windows.

```cpp
inline bool CheckOutputDebugString(LPCTSTR String)
{
    SetLastError(0);             // don't let an earlier error code skew the result
    OutputDebugString(String);
    // No error means a debugger consumed the string.
    return GetLastError() == 0;
}
```

Process Exploitation

These techniques exploit the Windows process environment and management system in order to implement protection. Some of these techniques, especially self-debugging, are widely used by many packers and protectors.

Open Process

This debugger detection technique exploits process privileges to determine whether a process is currently being run under a debugger. It works because a process started by a debugger inherits the debugger's security token: if the debugger has enabled SeDebugPrivilege and does not remove it, the process receives that privilege too, which allows it to open a handle to any process ("How To Use the SeDebugPrivilege to Acquire Any Process Handle"). This includes a vital system process like csrss.exe, which we normally wouldn't have access to. Here is some source code to illustrate the technique:

```cpp
inline bool CanOpenCsrss()
{
    // GetCsrssProcessId() is a helper that is not shown here.
    HANDLE Csrss = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCsrssProcessId());
    if (Csrss != NULL)
    {
        CloseHandle(Csrss);
        return true;
    }
    return false;
}
```

Parent Process

Normally, Windows users start a process from a window created or provided by the Windows shell, in which case the new process's parent is Explorer.exe. Therefore, we can retrieve the process IDs of Explorer.exe and of our parent process and compare them. This is, of course, a somewhat risky check, since your process's parent isn't guaranteed to be Explorer.exe (it could have been started from a command prompt or by another program); nonetheless, it is still an interesting technique, and here is an example:

```cpp
inline bool IsParentExplorerExe()
{
    // GetParentProcessId() and GetExplorerPIDbyShellWindow() are helpers
    // that are not shown here.
    DWORD PPID = GetParentProcessId();
    return PPID == GetExplorerPIDbyShellWindow();
}
```

Self-Debugging

Self-debugging is a technique where the main process spawns a child process that debugs the process that created it, as shown in the diagram. This technique can be very useful, as it can be used to implement techniques such as Nanomites. It also prevents other debuggers from attaching to the same process; however, this can be bypassed by setting the EPROCESS->DebugPort field to 0 (EPROCESS is the kernel's per-process structure; in kernel mode, a pointer to it can be obtained with PsLookupProcessByProcessId). This allows another debugger to attach to a process that already has a debugger attached to it. Here's some sample code:

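```cpp
// Relaunches the current executable as a debuggee and acts as its debugger.
// The relaunched copy runs this same code, so a real implementation needs a
// way to tell the copy not to call DebugSelf() again (not shown here).
void DebugSelf()
{
    DEBUG_EVENT de;
    PROCESS_INFORMATION pi;
    STARTUPINFO si;
    ZeroMemory(&pi, sizeof(PROCESS_INFORMATION));
    ZeroMemory(&si, sizeof(STARTUPINFO));
    ZeroMemory(&de, sizeof(DEBUG_EVENT));

    GetStartupInfo(&si);
    if (!CreateProcess(NULL, GetCommandLine(), NULL, NULL, FALSE,
                       DEBUG_PROCESS, NULL, NULL, &si, &pi))
        return;

    // Debug loop: wait for an event first, then let the debuggee continue.
    while (WaitForDebugEvent(&de, INFINITE))
    {
        DWORD ContinueStatus = DBG_CONTINUE;
        if (de.dwDebugEventCode == CREATE_PROCESS_DEBUG_EVENT)
            CloseHandle(de.u.CreateProcessInfo.hFile);
        else if (de.dwDebugEventCode == LOAD_DLL_DEBUG_EVENT)
            CloseHandle(de.u.LoadDll.hFile);
        else if (de.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
                 de.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
            ContinueStatus = DBG_EXCEPTION_NOT_HANDLED; // let the debuggee's handlers run

        ContinueDebugEvent(de.dwProcessId, de.dwThreadId, ContinueStatus);
        if (de.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT &&
            de.dwProcessId == pi.dwProcessId)
            break;
    }

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
}
```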

UnhandledExceptionFilter

UnhandledExceptionFilter is the top-level exception filter: it is called when no other handler handles an exception, and a program can install its own filter with SetUnhandledExceptionFilter. The following diagram shows how Windows propagates exceptions. The trick relies on the fact that the top-level filter is not called when a debugger is attached; instead, the exception is reported to the debugger. So, when using the UnhandledExceptionFilter technique, be aware that if a debugger is attached, the process will exit instead of resuming execution, which in the context of anti-reverse engineering is quite fine, in my opinion.

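```cpp
// Top-level filter: only reached when no debugger is attached.
LONG WINAPI UnhandledExcepFilter(PEXCEPTION_POINTERS pExcepPointers)
{
    // Eax is 0 here (see main), so this restores the default filter.
    SetUnhandledExceptionFilter(
        (LPTOP_LEVEL_EXCEPTION_FILTER)pExcepPointers->ContextRecord->Eax);

    // Skip the 2-byte "div eax" instruction that raised the exception.
    pExcepPointers->ContextRecord->Eip += 2;
    return EXCEPTION_CONTINUE_EXECUTION;
}

int main()
{
    SetUnhandledExceptionFilter(UnhandledExcepFilter);
    __asm {xor eax, eax}
    __asm {div eax}      // divide by zero; under a debugger the process exits here
}
```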

NtQueryObject

The NtQueryObject function, when called with the ObjectAllTypesInformation class, returns information about every type of kernel object on the system. There is a wealth of information to be mined from this function, but we're most concerned with what it tells us about DebugObjects. In Windows XP and Vista, a DebugObject entry is maintained in this list of object types, along with, most importantly, the number of objects of each type. Each type and its related information is described by an OBJECT_TYPE_INFORMATION struct, and the buffer returned by NtQueryObject begins with an OBJECT_ALL_INFORMATION struct: a count of object types followed by the OBJECT_TYPE_INFORMATION entries. However, traversing these entries isn't as straightforward as array indexing, because each entry is followed by its type name. The next OBJECT_TYPE_INFORMATION struct lies after the previous entry's TypeName string (the memory pointed to by UNICODE_STRING.Buffer), padded to a DWORD boundary; refer to the source to see how to navigate the buffer.

```cpp
typedef struct _OBJECT_TYPE_INFORMATION
{
    UNICODE_STRING TypeName;
    ULONG TotalNumberOfHandles;
    ULONG TotalNumberOfObjects;
    // The real structure has more fields, which are not needed here.
} OBJECT_TYPE_INFORMATION, *POBJECT_TYPE_INFORMATION;

typedef struct _OBJECT_ALL_INFORMATION
{
    ULONG NumberOfObjects;   // number of object types that follow
    OBJECT_TYPE_INFORMATION ObjectTypeInformation[1];
} OBJECT_ALL_INFORMATION, *POBJECT_ALL_INFORMATION;

inline bool ObjectListCheck()
{
    typedef NTSTATUS (NTAPI *pNtQueryObject)(HANDLE, UINT, PVOID, ULONG, PULONG);

    POBJECT_ALL_INFORMATION pObjectAllInfo = NULL;
    void* pMemory = NULL;
    NTSTATUS Status;
    unsigned long Size = 0;

    pNtQueryObject NtQO = (pNtQueryObject)GetProcAddress(
        GetModuleHandle(TEXT("ntdll.dll")), "NtQueryObject");
    if (NtQO == NULL)
        return false;

    // 3 = ObjectAllTypesInformation. This first call only retrieves the
    // required buffer size.
    Status = NtQO(NULL, 3, &Size, sizeof(Size), &Size);

    pMemory = VirtualAlloc(NULL, Size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (pMemory == NULL)
        return false;

    Status = NtQO((HANDLE)-1, 3, pMemory, Size, NULL);
    if (Status != 0x00000000)
    {
        VirtualFree(pMemory, 0, MEM_RELEASE);
        return false;
    }

    pObjectAllInfo = (POBJECT_ALL_INFORMATION)pMemory;
    unsigned char* pObjInfoLocation = (unsigned char*)pObjectAllInfo->ObjectTypeInformation;
    ULONG NumObjects = pObjectAllInfo->NumberOfObjects;
    bool DebugObjectsFound = false;   // also the result if the type is not found

    for (UINT i = 0; i < NumObjects; i++)
    {
        POBJECT_TYPE_INFORMATION pObjectTypeInfo = (POBJECT_TYPE_INFORMATION)pObjInfoLocation;

        if (wcscmp(L"DebugObject", pObjectTypeInfo->TypeName.Buffer) == 0)
        {
            DebugObjectsFound = pObjectTypeInfo->TotalNumberOfObjects > 0;
            break;
        }

        // The next entry follows this entry's type name, DWORD-aligned
        // (32-bit layout; ULONG_PTR avoids truncating the pointer).
        pObjInfoLocation = (unsigned char*)pObjectTypeInfo->TypeName.Buffer;
        pObjInfoLocation += pObjectTypeInfo->TypeName.Length;
        ULONG_PTR tmp = ((ULONG_PTR)pObjInfoLocation) & ~(ULONG_PTR)3;
        pObjInfoLocation = ((unsigned char*)tmp) + sizeof(ULONG);
    }

    VirtualFree(pMemory, 0, MEM_RELEASE);
    return DebugObjectsFound;
}
```
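The pointer arithmetic at the bottom of the loop is the least obvious part. As a standalone, testable helper (the name NextObjectTypeEntry is hypothetical), it amounts to skipping the UTF-16 type name plus its terminating NUL and rounding up to the next aligned address. For the even byte lengths a UTF-16 string always has, this gives the same result as the round-down-and-add-four step in the loop, and using std::uintptr_t avoids truncating addresses on 64-bit builds. The 4-byte alignment matches the article's 32-bit code; treat any other value as an assumption to verify.

```cpp
#include <cassert>
#include <cstdint>

// Address of the next OBJECT_TYPE_INFORMATION entry, given the current
// entry's TypeName.Buffer address and TypeName.Length (in bytes, excluding
// the terminator). The name is followed by a UTF-16 NUL and padding.
std::uintptr_t NextObjectTypeEntry(std::uintptr_t typeNameBuffer,
                                   std::uint16_t typeNameLength,
                                   std::uintptr_t alignment = 4)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
    const std::uintptr_t end = typeNameBuffer + typeNameLength + sizeof(char16_t);
    return (end + alignment - 1) & ~(alignment - 1);
}
```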

Anti-Dumping

"Dumping", a term used in the reverse engineering realm, describes taking a protected executable after it has been decrypted into memory, capturing what is essentially a snapshot of the program, and saving it onto disk, as shown by the following diagram. There are many techniques to prevent the dumping of an executable that has been encrypted or compressed, and the following are some of the more popular or better documented methods.

Nanomites

I find this technique very interesting. It works by replacing certain branch instructions (Jcc instructions) within a program with INT 3 breakpoints; information about the removed jumps is then stored in a heavily encrypted table. This information includes the destination of the jump, the CPU flags it depends on, and the size of the jump (normally two bytes for a short jump, or five or six bytes for a near one). Then, using self-debugging, when one of these breakpoints is hit, the debugging process handles the debug exception and looks up information about the break: whether the breakpoint is a Nanomite or a real debug breakpoint, and whether the jump should be taken (which means checking the appropriate EFLAGS bits for the jump type, e.g., JNZ jumps only if ZF = 0). If the jump is taken, the destination address is retrieved and execution in the debuggee resumes from there; if not, the length of the replaced jump is retrieved, that many bytes are skipped in the debuggee, and execution resumes. To make things even worse, random INT 3 instructions are placed in unreachable parts of the code, with corresponding entries in the Nanomite table, and there are even table entries with no corresponding INT 3 at all, placed there purely to annoy reverse engineers. When used in the right places inside an executable, this technique is very powerful and has almost no impact on performance. Unfortunately, the full source code for this technique is far too complicated for this article.
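The lookup the debugging process performs when a Nanomite is hit can be sketched without any Windows APIs. In this hypothetical sketch the table is a plain, unencrypted array, the names NanomiteEntry, JumpTaken, and ResolveNanomite are illustrative, and faultAddress stands for the INT 3 address reported in the debug event's exception record. A real implementation would also decrypt the entry, then write the returned address into the debuggee's Eip with SetThreadContext before calling ContinueDebugEvent. The EFLAGS bit positions (CF = 0, ZF = 6, SF = 7, OF = 11) follow the Intel manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// EFLAGS bits tested by the conditional jumps below.
constexpr std::uint32_t kCF = 1u << 0;
constexpr std::uint32_t kZF = 1u << 6;
constexpr std::uint32_t kSF = 1u << 7;
constexpr std::uint32_t kOF = 1u << 11;

enum class JccKind { JZ, JNZ, JB, JAE, JBE, JA, JL, JGE, JLE, JG };

// One (already decrypted) table entry: the INT 3 at `address` replaced a
// jump of `size` bytes whose destination is `target`.
struct NanomiteEntry {
    std::uint32_t address;
    std::uint32_t target;
    JccKind kind;
    std::uint8_t size;
};

// Evaluates the jump condition against the debuggee's saved EFLAGS.
bool JumpTaken(JccKind kind, std::uint32_t eflags)
{
    const bool cf = (eflags & kCF) != 0, zf = (eflags & kZF) != 0;
    const bool sf = (eflags & kSF) != 0, of = (eflags & kOF) != 0;
    switch (kind) {
    case JccKind::JZ:  return zf;
    case JccKind::JNZ: return !zf;
    case JccKind::JB:  return cf;
    case JccKind::JAE: return !cf;
    case JccKind::JBE: return cf || zf;
    case JccKind::JA:  return !cf && !zf;
    case JccKind::JL:  return sf != of;
    case JccKind::JGE: return sf == of;
    case JccKind::JLE: return zf || sf != of;
    case JccKind::JG:  return !zf && sf == of;
    }
    return false;
}

// If faultAddress is a known Nanomite, stores the address at which the
// debuggee should resume in newEip and returns true; returns false for any
// other breakpoint.
bool ResolveNanomite(const NanomiteEntry* table, std::size_t count,
                     std::uint32_t faultAddress, std::uint32_t eflags,
                     std::uint32_t& newEip)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i].address != faultAddress)
            continue;
        newEip = JumpTaken(table[i].kind, eflags)
                     ? table[i].target
                     : table[i].address + table[i].size;
        return true;
    }
    return false;
}
```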

Stolen Bytes (Stolen Code)

This is a technique introduced by ASProtect, and it can be entertaining for someone who has never encountered it. In the stolen bytes routine, code or bytes from the original program protected by the packer are removed, often from the OEP (Original Entry Point), and are stored encrypted somewhere inside the packing code. The area the bytes were taken from is then replaced with code that jumps to a dynamically allocated buffer containing the decrypted "stolen" bytes; this buffer also contains a jump back to the appropriate address to continue execution. More often than not, both the area where the bytes were removed and the dynamically allocated buffer holding the original bytes are filled with junk code and even more anti-reverse engineering techniques. This can be a powerful technique if the underlying concept is hidden from the reverse engineer; otherwise, it's not too hard to fix.
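The patch written over the stolen area, and the jump back at the end of the buffer, are ordinary jump instructions. Computing them is simple but easy to get wrong, because the displacement is measured from the end of the jump instruction rather than its start. The sketch below assumes the common 5-byte E9 rel32 near JMP and 32-bit addresses; EncodeJmpRel32 is a hypothetical helper, and a given protector may well use other instruction sequences. On x64, the target must also be within ±2 GB of the jump for a rel32 displacement to reach it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes a 5-byte near JMP (opcode E9 + rel32) located at `from` that
// transfers control to `to`. The displacement is relative to the next
// instruction (from + 5) and is stored little-endian.
std::array<std::uint8_t, 5> EncodeJmpRel32(std::uint32_t from, std::uint32_t to)
{
    const std::uint32_t rel = to - (from + 5); // wraps modulo 2^32 for backward jumps
    return { 0xE9,
             static_cast<std::uint8_t>(rel),
             static_cast<std::uint8_t>(rel >> 8),
             static_cast<std::uint8_t>(rel >> 16),
             static_cast<std::uint8_t>(rel >> 24) };
}
```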

SizeOfImage

SizeOfImage is a field in the IMAGE_OPTIONAL_HEADER of a PE file. The Windows loader keeps its own copy of this value in the module list reachable from the PEB (in the executable's LDR_DATA_TABLE_ENTRY), and by increasing that copy at runtime, we can cause problems for dumping tools that trust it. This method is easy to apply to an application, and it is just as easily defeated by tools that enumerate all pages with the MEM_IMAGE type, starting at the application's ImageBase page. This works because the pages of a mapped image are contiguous, so the real size ends where the run of image pages ends. Here is some sample code:

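```cpp
// 32-bit MSVC only (inline __asm).
void ChangeSizeOfImage(DWORD NewSize)
{
    __asm
    {
        mov eax, fs:[0x30]               // PEB
        mov eax, [eax + 0x0c]            // PEB->Ldr (PEB_LDR_DATA)
        mov eax, [eax + 0x0c]            // InLoadOrderModuleList.Flink: the .exe's entry
        mov ecx, NewSize                 // x86 has no memory-to-memory mov
        mov dword ptr [eax + 0x20], ecx  // LDR_DATA_TABLE_ENTRY.SizeOfImage
    }
}
```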
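The counter-measure described above, walking the pages that really belong to the image instead of trusting the loader's SizeOfImage, can be sketched portably. Here, the results of VirtualQuery are replaced by a hypothetical RegionInfo list whose fields loosely mirror MEMORY_BASIC_INFORMATION (BaseAddress, AllocationBase, RegionSize, and Type == MEM_IMAGE); a real dumper would call VirtualQuery or VirtualQueryEx in a loop instead. Checking the allocation base keeps the walk from running into another module that happens to be mapped right after the executable. RecoverSizeOfImage is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for one VirtualQuery result (hypothetical).
struct RegionInfo {
    std::uintptr_t base;            // BaseAddress
    std::uintptr_t allocationBase;  // AllocationBase
    std::size_t size;               // RegionSize
    bool isImage;                   // Type == MEM_IMAGE
};

// Recomputes an image's real size by summing consecutive MEM_IMAGE regions
// that belong to the same allocation, starting at imageBase. `regions` must
// be sorted by base address.
std::size_t RecoverSizeOfImage(const std::vector<RegionInfo>& regions,
                               std::uintptr_t imageBase)
{
    std::uintptr_t next = imageBase;
    for (const RegionInfo& r : regions) {
        if (r.base < next)
            continue;                          // region lies before the image
        if (r.base != next || !r.isImage || r.allocationBase != imageBase)
            break;                             // gap or foreign memory: image ends here
        next += r.size;
    }
    return static_cast<std::size_t>(next - imageBase);
}
```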

Virtual Machines

Virtualization is considered the future of anti-reverse engineering, and it has very much already arrived. Protectors like Themida and VMProtect already use virtual machines in their protection schemes. Simply using a virtual machine isn't the whole technique, however. Themida, for example, creates a unique virtual machine for every protected executable that uses its virtual machine protection, which prevents a generic attack against its virtualization. Also, many protection schemes that use virtual machines insert junk instructions into their virtual machine byte code, much like junk code is inserted into native IA-32 code (Ferrie).
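To make the idea concrete: a VM-based protector translates selected native code into byte code for an interpreter embedded in the protected executable, so a reverse engineer has to understand the interpreter before the original logic can be recovered. The toy sketch below is nowhere near what Themida or VMProtect actually do, and every name in it is hypothetical. It only illustrates two points from the paragraph above: a per-build opcode encoding (here, a simple XOR key) and junk instructions mixed into the byte code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Logical opcodes of a toy stack machine. OP_JUNK does nothing and stands in
// for the junk instructions protectors mix into their VM byte code.
enum : std::uint8_t { OP_PUSH = 0, OP_ADD = 1, OP_XOR = 2, OP_JUNK = 3, OP_HALT = 4 };

// Each "build" picks its own key, so the same program is encoded with
// different opcode bytes every time: a crude stand-in for a per-build VM.
void EmitOp(std::vector<std::uint8_t>& code, std::uint8_t op, std::uint8_t key)
{
    code.push_back(static_cast<std::uint8_t>(op ^ key));
}

void EmitPush(std::vector<std::uint8_t>& code, std::uint32_t value, std::uint8_t key)
{
    EmitOp(code, OP_PUSH, key);
    for (int i = 0; i < 4; ++i)              // 32-bit immediate, little-endian
        code.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Interprets `code` using `key`. Returns false on malformed byte code.
bool RunToyVm(const std::vector<std::uint8_t>& code, std::uint8_t key,
              std::uint32_t& result)
{
    std::vector<std::uint32_t> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        const auto op = static_cast<std::uint8_t>(code[pc++] ^ key);
        switch (op) {
        case OP_PUSH: {
            if (code.size() - pc < 4)
                return false;                // truncated immediate
            std::uint32_t v = 0;
            for (int i = 0; i < 4; ++i)
                v |= static_cast<std::uint32_t>(code[pc++]) << (8 * i);
            stack.push_back(v);
            break;
        }
        case OP_ADD:
        case OP_XOR: {
            if (stack.size() < 2)
                return false;
            const std::uint32_t rhs = stack.back();
            stack.pop_back();
            if (op == OP_ADD) stack.back() += rhs; else stack.back() ^= rhs;
            break;
        }
        case OP_JUNK:
            break;
        case OP_HALT:
            if (stack.empty())
                return false;
            result = stack.back();
            return true;
        default:
            return false;                    // unknown opcode (e.g. wrong key)
        }
    }
    return false;                            // ran off the end without OP_HALT
}
```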

Guard Pages

Packers and protectors can use guard pages to implement what is, in essence, an on-demand decryption/decompression system. When the executable is loaded into memory, instead of decompressing/decrypting the entire contents of the file, the protector simply marks all pages that are not immediately needed as guard pages. After that, when another section of code or data is needed, an EXCEPTION_GUARD_PAGE (0x80000001) exception is raised, and the data can be decrypted or decompressed either from the file or from encrypted/compressed contents in memory.

This technique has been implemented in two ways: by hooking KiUserExceptionDispatcher (Shrinker), and by using self-debugging (Armadillo's CopyMemII) (Ferrie). In the case of Shrinker, when an exception is raised, the hook placed on KiUserExceptionDispatcher checks where the exception occurred and whether it occurred inside the protected executable's image; if it did, the contents are decompressed from disk into the page where the program expects data or executable code. This technique can significantly reduce loading times and memory usage, because only the pages that are actually needed are ever backed by physical memory (RAM), and only those pages are decompressed and decrypted.

Armadillo implements this technique under the name CopyMemII, and it behaves in a similar fashion, except that it requires self-debugging; also, instead of having empty pages and loading them from disk, CopyMemII simply decompresses the pages into memory. Note that this does not decrypt the pages, so the code and data remain secure. Then, when a page that has not been decrypted is accessed, an EXCEPTION_GUARD_PAGE (0x80000001) exception is raised, and the debugging process catches the exception and decrypts the page as needed. There is, however, a weakness in Armadillo's implementation: once a page is decrypted, it stays decrypted in memory. By exploiting this weakness, a reverse engineer can force the process to touch every page needed by the program, leaving the entire program decrypted in memory and in perfect shape for a dump. In both implementations, the protector decrypts or decompresses only one page at a time; if an access spans more than one page, the protector simply lets the next exception occur and handles that page too. As a final note, if the protectors remembered the last accessed page and discarded or re-encrypted it before decrypting the next one, this technique would be extremely powerful.
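The weakness described above, and the suggested fix of re-encrypting the previously used page, can be modelled without any Windows APIs. In this hypothetical sketch, the OnDemandImage class, its single-byte XOR "cipher", and the keepSinglePageDecrypted switch are all illustrative; the first read from an encrypted page plays the role of the EXCEPTION_GUARD_PAGE a real protector would handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model of guard-page based on-demand decryption (hypothetical).
class OnDemandImage {
public:
    OnDemandImage(std::vector<std::uint8_t> encrypted, std::size_t pageSize,
                  std::uint8_t key, bool keepSinglePageDecrypted)
        : data_(std::move(encrypted)), pageSize_(pageSize), key_(key),
          keepSingle_(keepSinglePageDecrypted)
    {
        assert(pageSize_ > 0);
        decrypted_.assign((data_.size() + pageSize_ - 1) / pageSize_, false);
    }

    // Returns the plaintext byte at `offset`, decrypting its page on first touch.
    std::uint8_t Read(std::size_t offset)
    {
        const std::size_t page = offset / pageSize_;
        if (!decrypted_.at(page)) {      // stands in for EXCEPTION_GUARD_PAGE
            ++faults_;
            if (keepSingle_ && hasLast_)
                Toggle(lastPage_);       // re-encrypt the previously used page
            Toggle(page);                // decrypt the page being accessed
            lastPage_ = page;
            hasLast_ = true;
        }
        return data_.at(offset);
    }

    std::size_t DecryptedPages() const
    {
        return static_cast<std::size_t>(
            std::count(decrypted_.begin(), decrypted_.end(), true));
    }

    std::size_t Faults() const { return faults_; }

private:
    // XOR is its own inverse, so the same routine encrypts and decrypts.
    void Toggle(std::size_t page)
    {
        const std::size_t begin = page * pageSize_;
        const std::size_t end = std::min(begin + pageSize_, data_.size());
        for (std::size_t i = begin; i < end; ++i)
            data_[i] ^= key_;
        decrypted_[page] = !decrypted_[page];
    }

    std::vector<std::uint8_t> data_;
    std::size_t pageSize_;
    std::uint8_t key_;
    bool keepSingle_;
    std::vector<bool> decrypted_;
    std::size_t faults_ = 0;
    std::size_t lastPage_ = 0;
    bool hasLast_ = false;
};
```

With keepSinglePageDecrypted set to false, reading one byte from every page leaves the whole image decrypted, which is exactly what a dumper would exploit; with it set to true, only the most recently used page is ever in plaintext, at the cost of an extra fault every time execution moves back to an earlier page.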

Removing the Portable Executable Header

This is a simple anti-dumping technique that removes an executable's portable executable (PE) header from memory at runtime. A dumped image would then be missing important information such as the RVAs (Relative Virtual Addresses) of important tables (relocations, imports, exports, etc.), the entry point, and other information that the Windows loader needs when loading an image. Be careful when using this technique, because the Windows API or legitimate external programs may need access to the information that has been removed.

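```cpp
inline void ErasePEHeaderFromMemory()
{
    DWORD OldProtect = 0;

    // Base address of the main executable; the headers start here.
    char* pBaseAddr = (char*)GetModuleHandle(NULL);

    // Assumes the headers fit in the first 4096-byte page, as is typical.
    if (VirtualProtect(pBaseAddr, 4096, PAGE_READWRITE, &OldProtect))
        ZeroMemory(pBaseAddr, 4096);
}
```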
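To see concretely what a dumper loses, here is a portable sketch of how a tool might locate the entry point from the in-memory headers: e_lfanew at offset 0x3C of the DOS header points to the NT headers, and AddressOfEntryPoint sits 0x28 bytes after the "PE\0\0" signature (4 bytes of signature, the 20-byte IMAGE_FILE_HEADER, then offset 0x10 of the optional header). ReadU32 and GetEntryPointRva are hypothetical helper names. Once ErasePEHeaderFromMemory has zeroed the header page, the signature checks fail and the tool has to recover this information some other way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a little-endian 32-bit value from a possibly unaligned position.
std::uint32_t ReadU32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Extracts AddressOfEntryPoint from an in-memory PE image. Returns false if
// the DOS or NT signature is missing (for example, after the header is wiped).
bool GetEntryPointRva(const std::uint8_t* image, std::size_t size, std::uint32_t& rva)
{
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return false;                                    // IMAGE_DOS_HEADER.e_magic
    const std::uint32_t ntOffset = ReadU32(image + 0x3C); // e_lfanew
    if (ntOffset > size || size - ntOffset < 0x2C)
        return false;
    if (std::memcmp(image + ntOffset, "PE\0\0", 4) != 0)
        return false;                                    // IMAGE_NT_HEADERS.Signature
    rva = ReadU32(image + ntOffset + 0x28);              // OptionalHeader.AddressOfEntryPoint
    return true;
}
```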

IA-32 Instruction Exploits

The following techniques take advantage of the problems debuggers have dealing with certain IA-32 instructions. Most of these methods are low-level techniques that aren't used very often.

Interrupt 2D

The INT 2D instruction can be used as a general-purpose debugger detection method: when the instruction executes and no debugger is present, an exception occurs. If a debugger is present, however, no exception reaches the program, and things get interesting depending on the debugger you are using. OllyDBG, as shown in the diagram, will actually skip a byte in its disassembly, which causes the analysis to go wrong. Visual Studio 2008's debugger handles the instruction without issues; other debuggers would have to be tested individually.

```cpp
inline bool Int2DCheck()
{
    __try
    {
        __asm
        {
            int 0x2d
            xor eax, eax
            add eax, 2
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        return false;   // the exception reached us: no debugger
    }
    return true;
}
```

Stack Segment

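By manipulating the stack segment register with push ss and pop ss, we can make the debugger execute an instruction without stopping on it. Loading SS (with pop ss or mov ss) makes the processor hold off interrupts and debug exceptions, including the single-step trap, until the next instruction has completed, so that the stack pointer can be updated safely. In the following function, when stepping over the code with any debugger, the mov eax, 9 line will execute, but the debugger will not stop on it.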

```cpp
inline void PushPopSS()
{
    __asm
    {
        push ss
        pop ss
        mov eax, 9      // executes without the debugger stopping here
        xor edx, edx
    }
}
```

Instruction Prefixes

The following technique takes advantage of the way debuggers handle instruction prefixes. The emitted bytes are 0xF3 (a REP prefix) and 0x64 (an FS segment-override prefix), followed by 0xF1, the INT1/ICEBP instruction. When stepping over this code in OllyDBG or in Visual Studio 2008, we reach the first emit and are immediately taken to the end of the __try block: the debugger essentially skips over the prefixes and handles the INT 1 itself. When the code runs without a debugger, an exception occurs that SEH catches, and the program continues along.

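inline bool IsDbgPresentPrefixCheck()
{
    __try
    {
        __asm __emit 0xF3   // REP prefix
        __asm __emit 0x64   // FS segment override prefix
        __asm __emit 0xF1   // INT 1 (ICEBP)
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        return false;       // the exception reached us: no debugger
    }
    return true;            // the debugger consumed the INT 1
}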

OllyDBG Specific

The following techniques specifically target OllyDBG, which is probably the most widely used debugger on Windows at the moment. There are more ways to detect OllyDBG than the ones I show here, and for probably every technique discussed in this article, there is an OllyDBG plug-in that defeats it. However, these tricks can still work against inexperienced reverse engineers.

FindWindow

Using the Win32 API function FindWindow, we can check whether a window with OllyDBG's window class, OLLYDBG, exists. If it does, OllyDBG is most likely open, either waiting to attach to a process or actively debugging the current process or another one.

// FindWindow returns an HWND, or NULL if no matching window exists
HWND hOlly = FindWindow(TEXT("OLLYDBG"), NULL);
if (hOlly)
    ExitProcess(0);

OutputDebugString Exploit

In the world of exploits, there are many ways to attack a program's security measures, or its lack thereof, and OllyDBG has one such weakness: a format string vulnerability. It has been patched in various custom versions of OllyDBG, but it still exists in the normal, unmodified version, which is the prevalent one. The following code crashes OllyDBG if it is attached to the process, which makes it a very powerful technique.

// Adjacent string literals are concatenated into one long string of %s specifiers
OutputDebugString(
    TEXT("%s%s%s%s%s%s%s%s%s%s%s")
    TEXT("%s%s%s%s%s%s%s%s%s%s%s%s%s")
    TEXT("%s%s%s%s%s%s%s%s%s%s%s%s%s")
    TEXT("%s%s%s%s%s%s%s%s%s%s%s%s%s")
);
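The crash presumably comes from the vulnerability class itself. In a format string bug, text from an untrusted source is passed to a printf-style function as the format, so each %s makes the function read a pointer argument that was never supplied. The following sketch shows the general fix for this class of bug. FormatDebugMessage is a hypothetical debugger-side helper, not OllyDBG's actual code. The idea is that received text is only ever passed as data:

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugger-side helper (not OllyDBG's real code): renders a
// string received from the debuggee so it can be shown in a log window.
std::string FormatDebugMessage(const char *msg)
{
    char buf[512];
    // Vulnerable pattern (do NOT do this): std::snprintf(buf, sizeof buf, msg);
    // every "%s" in msg would make snprintf fetch an argument that does not exist.
    // Safe pattern: msg is data, never the format string.
    // Messages longer than 511 characters are truncated.
    std::snprintf(buf, sizeof buf, "%s", msg);
    return std::string(buf);
}
```

With the safe pattern, the %s specifiers in the string above are simply displayed as literal text.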

WinDBG Specific

The following technique can be used to detect if WinDBG is running on the host machine. There hasn't been much research done on detecting WinDBG, because reverse engineers tend to favor other debuggers and analysis tools over WinDBG.

FindWindow

This is the same technique as the OllyDBG FindWindow check above, using WinDBG's window class, WinDbgFrameClass, instead.

// FindWindow returns an HWND, or NULL if no matching window exists
HWND hWinDbg = FindWindow(TEXT("WinDbgFrameClass"), NULL);
if (hWinDbg)
    ExitProcess(0);
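The OllyDBG and WinDBG checks differ only in the window class name, so they can be driven from a single table. The following is a minimal sketch. The function name AnyDebuggerWindowOpen is hypothetical, and the window lookup is passed in as a parameter so the logic can be tested away from Windows. Only the two class names from this article are listed, and any additional entries would be assumptions to verify.

```cpp
#include <functional>
#include <string>
#include <vector>

// Window class names taken from the checks above.
const std::vector<std::string> kDebuggerWindowClasses = {
    "OLLYDBG",           // OllyDBG main window
    "WinDbgFrameClass"   // WinDBG frame window
};

// Returns true if any known debugger window class is found.
// windowExists is injected (hypothetical design choice); on Windows it
// would wrap a FindWindow call on the given class name.
bool AnyDebuggerWindowOpen(
    const std::function<bool(const std::string &)> &windowExists)
{
    for (const std::string &cls : kDebuggerWindowClasses)
        if (windowExists(cls))
            return true;
    return false;
}
```

On Windows, windowExists could be a lambda that returns FindWindowA(cls.c_str(), NULL) != NULL, with ExitProcess(0) called when AnyDebuggerWindowOpen returns true.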

Other Techniques

The following techniques don't fit into any of the categories covered so far, so they get a section of their own.

Junk Code

Junk code is an aptly named code obfuscation technique. As its name suggests, it inserts code that is junk, or not needed, to confuse a reverse engineer about what the surrounding code is actually trying to accomplish. When the inserted junk is convincing and does manage to confuse the reverse engineer, the technique can be rather effective. However, it carries a performance penalty, because the more instructions a routine contains, the longer it takes to complete. Another issue is that memory and stack manipulation instructions such as push, pop, and mov ptr [] carry a real risk of stack or memory corruption, so they are either placed very carefully or not used at all. Here is an example of a routine that adds two numbers and subtracts one, with junk code added in.

// Junk instructions: EAX and EDX are saved and restored, and ECX is only read,
// so the net register state is unchanged (the flags are modified, though).
#define JUNK_CODE_ONE \
    __asm {push eax} \
    __asm {xor eax, eax} \
    __asm {setpo al} \
    __asm {push edx} \
    __asm {xor edx, eax} \
    __asm {sal edx, 2} \
    __asm {xchg eax, edx} \
    __asm {pop edx} \
    __asm {or eax, ecx} \
    __asm {pop eax}

inline int AddSubOne(int One, int Two)
{
    JUNK_CODE_ONE
    return ((One + Two) - 1);
}

As we can see, the junk block does a lot of nothing. It saves and restores EAX and EDX, so only the addition and subtraction in the return statement actually accomplish the function's goal.
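The same idea can be expressed without inline assembly, which the 64-bit MSVC compiler does not support. Below is a minimal portable sketch. AddSubOneJunk and g_junkSink are hypothetical names, and the sketch assumes that writing the junk results to a volatile variable is enough to stop an optimizing compiler from deleting them. The junk values never feed into the result, so the function returns the same value as AddSubOne.

```cpp
// Portable C++ sketch of junk code (no inline assembly).
// Writes to a volatile object count as observable behavior, so the
// compiler must keep the junk computations that feed these writes.
static volatile unsigned g_junkSink = 0;

inline int AddSubOneJunk(int One, int Two)
{
    // --- junk: plausible-looking arithmetic whose results are discarded ---
    // (unsigned arithmetic avoids signed-overflow undefined behavior)
    unsigned a = static_cast<unsigned>(One) ^ static_cast<unsigned>(Two);
    unsigned b = (a << 2) | (static_cast<unsigned>(Two) & 0x0Fu);
    g_junkSink = b;
    a ^= b;
    g_junkSink = a - b;

    // --- the only work that matters ---
    return (One + Two) - 1;
}
```

As with the assembly version, the junk costs extra instructions on every call, which is the performance penalty described above.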

Native Code Permutations

Permutation is defined as "often major or fundamental change (as in character or condition) based primarily on rearrangement of existent elements" ("permutation"). In the world of code, it means different ways of accomplishing the same goal or task. For those unfamiliar with the concept, I'm going to explain it first with numbers, and then we'll explore it with code.

Permutations of the set {1,5,9} would be:

159 195 519 591 915 951
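The list above can be generated mechanically. The following short C++ sketch uses the standard library's std::next_permutation. AllPermutations is a hypothetical helper name, and the sketch assumes the digits are distinct, since repeated digits would yield fewer than n! orderings.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lists every ordering of the given digits. Sorting first means
// std::next_permutation walks through all arrangements in lexicographic
// order exactly once, stopping after the last (descending) one.
std::vector<std::string> AllPermutations(std::string digits)
{
    std::sort(digits.begin(), digits.end());
    std::vector<std::string> result;
    do {
        result.push_back(digits);
    } while (std::next_permutation(digits.begin(), digits.end()));
    return result;
}
```

AllPermutations("159") returns the six orderings 159, 195, 519, 591, 915, and 951, in that order.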

When we permute an item or object, we're simply trying to represent the same information or actions in a different way. Now let's permute the mov m32, imm instruction:

Original:       mov dword ptr [mem addr], 7    (mov m32, imm)

Permutation 1:  push 7
                pop dword ptr [mem addr]

Permutation 2:  mov eax, mem addr
                mov dword ptr [eax], 7

Permutation 3:  mov edi, mem addr
                mov eax, 7
                stosd

Permutation 4:  push mem addr
                pop edi
                push 7
                pop eax
                stosd

And so on...
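The following toy C++ model checks that the original instruction and Permutations 1 to 4 really store the same value. It runs each sequence and reads back the stored dword. The model is a sketch and not an emulator: the Machine struct, the kMemAddr value standing in for "mem addr", and the function names are all hypothetical, and only the handful of operations used above are modeled.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using std::uint32_t;

// Toy model: two registers, a dword-addressed memory map, and a stack.
struct Machine {
    uint32_t eax = 0;
    uint32_t edi = 0;
    std::map<uint32_t, uint32_t> mem;   // address -> dword value
    std::vector<uint32_t> stack;

    void push(uint32_t v) { stack.push_back(v); }
    uint32_t pop()
    {
        uint32_t v = stack.back();
        stack.pop_back();
        return v;
    }
    // stosd: store EAX at [EDI], then advance EDI by 4 (direction flag clear)
    void stosd()
    {
        mem[edi] = eax;
        edi += 4;
    }
};

const uint32_t kMemAddr = 0x1000;   // hypothetical value for "mem addr"

// mov dword ptr [mem addr], 7
void OriginalMov(Machine &m) { m.mem[kMemAddr] = 7; }

// push 7 / pop dword ptr [mem addr]
void Permutation1(Machine &m) { m.push(7); m.mem[kMemAddr] = m.pop(); }

// mov eax, mem addr / mov dword ptr [eax], 7
void Permutation2(Machine &m) { m.eax = kMemAddr; m.mem[m.eax] = 7; }

// mov edi, mem addr / mov eax, 7 / stosd
void Permutation3(Machine &m) { m.edi = kMemAddr; m.eax = 7; m.stosd(); }

// push mem addr / pop edi / push 7 / pop eax / stosd
void Permutation4(Machine &m)
{
    m.push(kMemAddr);
    m.edi = m.pop();
    m.push(7);
    m.eax = m.pop();
    m.stosd();
}

// Runs one variant on a fresh machine and returns the dword at mem addr.
uint32_t StoredValue(void (*variant)(Machine &))
{
    Machine m;
    variant(m);
    return m.mem[kMemAddr];
}
```

The model also makes the side effects visible. Permutations 2 to 4 leave modified values in EAX and EDI, and stosd advances EDI by 4. In real code, these variants are therefore only interchangeable where those registers are free.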

As we can imagine, this can be a very powerful and flexible method of obfuscation, especially since many people simply follow tutorials when reversing, and even small changes can stop them from reversing your application. When this concept is applied to an executable at runtime, so that the program is permuted and morphed on every run, we achieve metamorphic code. Along with virtual machines, metamorphic code is the cream of the crop of anti-reverse engineering techniques. However, it is very hard to implement correctly because it requires an excellent disassembly engine, and anyone familiar with the hell of creating an accurate disassembler knows that this is quite a task.
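The "different code on every run" idea can be illustrated, very loosely, at the text level. The following sketch picks one of the equivalent sequences for mov dword ptr [addr], imm at random. EmitMovMemImm and its seed parameter are hypothetical, and the emitted text mirrors the schematic syntax of the listing above. A real metamorphic engine would instead have to disassemble and rewrite machine code, which is exactly the hard part described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Toy "permutation engine" working on assembly TEXT, not machine code.
// Returns one of several equivalent sequences for "mov dword ptr [addr], imm";
// the seed stands in for whatever varies between builds or runs.
// Variants 3 to 5 clobber EAX and/or EDI.
std::string EmitMovMemImm(const std::string &addr, std::uint32_t imm, unsigned seed)
{
    const std::string v = std::to_string(imm);
    const std::vector<std::string> variants = {
        "mov dword ptr [" + addr + "], " + v + "\n",
        "push " + v + "\npop dword ptr [" + addr + "]\n",
        "mov eax, " + addr + "\nmov dword ptr [eax], " + v + "\n",
        "mov edi, " + addr + "\nmov eax, " + v + "\nstosd\n",
        "push " + addr + "\npop edi\npush " + v + "\npop eax\nstosd\n",
    };
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, variants.size() - 1);
    return variants[pick(rng)];
}
```

Because the choice depends only on the seed, the same seed always reproduces the same output, while different seeds can produce different but equivalent code.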

Citations