Overview

To continue learning important topics within the OS and architecture, and before diving into the deep end of the application, we’re going to cover a topic that is relevant to reverse engineering and development in general: exceptions and interrupts. In this article, you’ll learn about exceptions and interrupts from the ground up: what they are, the different types of exceptions, how interrupts are delivered, how they’re used for debugging, and how we can leverage a variety of exceptions when reverse engineering. As usual, it is assumed that the reader has a background in a compiled programming language like C, C++, or Rust. However, if you have experience in Java or another object-oriented language and are familiar with the concept of handling software-based exceptions, you should be able to pick this up as well. We’ll be referencing the Intel and AMD software development manuals often. It’s important to remember that this series serves as a guide to reverse engineering on a Windows OS, and how to think about reverse engineering. All skills learned can be taken and applied to other systems.

All demos are performed on Windows 10 Version 2004; Build 19035. (This build is not required. Having Windows 10 will be sufficient.)

Disclaimer All projects are written with Visual Studio 2019 and compiled using the Intel C++ Compiler. All optimizations are turned off so that compiler optimizations don’t obscure the assembly listings and complicate comprehension. The software exception handling mechanisms (SEH, VEH) researched and documented in later sections are only present on Windows. If you’re on Linux, there will be related links about exception handling in the recommended reading section; you will be able to apply the same logic to other operating systems.

All that being said, let’s get into it…

Exceptions

— What is an exception?

An exception is defined as an event generated by the processor when one or more errors are encountered while executing a program. There are exceptions predefined in both hardware and software, and both the delivery and handling of these exceptions vary based on the level at which they’re encountered. Speaking from a high level, we know that when an exception is generated the software stops execution of the application and signals that an error condition has been hit. If you’ve programmed in C/C++ you may be familiar with __try/__except or try/catch blocks. These are the language constructs that utilize the underlying exception handling mechanisms provided by the OS and hardware to handle errors. The code checks for an error condition, throws an exception if the condition exists, and then does a few things to process the exception. However, that handler can be used in a variety of ways, and it’s useful to know how to reverse engineer them since many malicious actors take advantage of their obscurity in disassembly to perform “covert” operations in the handler. To be able to work backward from a piece of code we have no prior knowledge of, it’s best to understand the mechanisms and how they operate. We’re going to look at the different types of exceptions, as well as interrupts, their delivery, and their processing at both a high and low level. Then we’ll break down examples in the OS-specific portion of this article and reverse engineer them to hijack exception handlers and disable anti-debugging mechanisms nested within them. Along the way, we’ll explore how both exception handling mechanisms work, their differences, and their usefulness for various types of indirect operations.

We first need to address some technical details before learning about the higher-level constructs.

— Interrupts

In most modern architectures there are two different methods of interrupting a program during runtime. One of which is an interrupt, the other is an exception. An interrupt is, at a high-level, an asynchronous event that’s usually generated by some external device. On Windows, there will be various interrupts that occur during the execution of a program. Typically, you shouldn’t encounter many exceptions during runtime. However, the interrupt and exception are handled relatively the same in that the current processor stops executing the program and begins execution in the specific event handler. Things begin to get a little hairy when you discuss interrupt and exception handlers, and it’s important to differentiate between high-level exception handlers used in SEH/VEH versus the OS handlers in the Interrupt Descriptor Table (IDT). I find it’s easier to start with the mind-melting stuff first and then move to the higher-level functionality. Let’s speak generally about interrupts and exceptions and their delivery mechanism – all abstractions aside.

Imagine you’re running a text editor on your computer. You’re taking notes for a meeting, and as you hit each individual key new characters appear in the file. I’m sure you’re aware of how a keyboard works, and how it writes to the actual editor, but have you thought about how the computer knows what key you pressed, that a key was pressed, and how it communicates with the keyboard to say that X was pressed – or being held down in combination with a key? We’re going to take this standard everyday task and break down what is going on under the hood to better understand interrupts.

On each keypress, your keyboard controller – which is just a device that links the computer and keyboard – generates an interrupt. This interrupt is commonly called the keyboard interrupt and is used to signal to the processor that a key has been pressed. The processor stops execution of the task, accesses an entry in the Interrupt Descriptor Table (IDT) and executes the handler associated with that entry in the IDT. Once the execution is complete and the interrupt has been handled properly control is restored to the interrupted task. Remember, that interrupts are typically driven by an external device hence the keyboard example. Before we continue with this example we need to address what the IDT is, how it’s structured, and how the OS leverages it to service these types of interrupts. To do that we have to discuss a few types of tables. This is going to be a little bit of a headache, but if you can manage you’ll come out on the other side a wiser engineer.

— Descriptor Tables

We’re only going to cover the necessary tables, but if you’re interested in learning the ins and outs of every type of descriptor and table associated with the architecture you can refer to the recommended reading. In the Intel/AMD architecture, there are two kinds of descriptor tables. The first is the Global Descriptor Table, and the second is the Local Descriptor Table. We’re going to focus on the first, since LDTs are not typically used within the scope of this discussion. So what is the Global Descriptor Table (GDT)? The GDT is a table that every system must define, and it is used by all programs on the system. It’s simply an array of system descriptor entries, and these entries can vary in type: there is the call-gate descriptor, the IDT-gate descriptor, and some other types like the LDT and TSS descriptors. The IDT-gate descriptor is the one we’re interested in. To help visualize the GDT’s structure I refactored a diagram from the Intel SDM. Recall that a GDT is an array/table of system descriptors, one of which is the IDT descriptor.

As mentioned above, we have the GDT displayed as a table of system descriptor entries. Each of those descriptor entries is 16 bytes in size while operating in 64-bit mode. To be technically correct I included the unused entry in the GDT, which is the first entry of every GDT you’ll encounter. It’s often referred to as the null descriptor, but its purpose is a lesson for a different post. If we continue analyzing the diagram you’ll notice the box with GDTR pointing to the GDT. This is the GDT register, which contains the base address and limit of the GDT on your system. The limit of the GDT is a multiple of the size of the descriptors inside it; in 64-bit operation these descriptors are expanded to 16 bytes, so the limit is 16(N)-1 where N is the number of entries in the GDT. There are differences in the structure and size of descriptors in different processor modes, but that is for reading on your own. The most important thing to note here is the segment descriptor at +48. The interrupt descriptor table that is created for the system has a field in its definition called the segment selector that points to the segment descriptor in the GDT (think of it as an index into the array). The interrupt gate present in the IDT (not pictured) provides an offset into the linear address space where the interrupt service routine exists. This interrupt service routine is the procedure that executes to properly handle the generated interrupt. The IDT is similar to the GDT in that it’s a system descriptor table, and it is likewise an array of 16-byte descriptors in 64-bit mode.

There are some minor things to note when we’re thinking of 64-bit processor operation, and they’ll be mentioned below the next diagram.

This diagram shows how the GDT and IDT are used to locate the proper interrupt service routine. You’ll notice there is an interrupt vector referencing the interrupt gate in the IDT. This is because the IDT associates each exception/interrupt with a number referred to as the interrupt vector. This vector is associated with a gate descriptor for the interrupt service routine for that specific interrupt. If you recall, I mentioned the GDT and IDT are similar in structure; however, the IDT uses the exception/interrupt number to index into the table and access the gate descriptor. If we look at this diagram we want to read left to right. We start with an interrupt vector from a generated interrupt being delivered to the IDT, which is then scaled by 16 to index into the array and get the gate descriptor. After getting the interrupt gate descriptor we use the segment selector to index into the GDT to get the segment descriptor and pull the base address out. In 64-bit mode, segments are all based from 0 since segmentation is essentially disabled to create a flat linear address space. This makes sense, as each interrupt gate in the IDT is 16 bytes in size and holds a 64-bit offset to the interrupt service routine. This offset is placed in the instruction pointer (RIP), which then leads execution to start in the interrupt handler.

Terms to recognize! I’m using IDT-gate, gate descriptor, and system descriptor interchangeably. Interrupt service routine (ISR) and interrupt handler are the same as well.

The IDT has a limit, and if you’ve read any form of operating system book or looked at interrupt vectors for various service routines for hardware (like a mouse or keyboard) you’ll notice that the IDT only supports 256 interrupt vectors. There can be fewer than 256 vectors, but no more. This is a design decision that has some history and will be in the recommended reading. Now, if all this gave you a slight headache don’t sweat it – we’re gonna go back to our example and walk through the process.

— Interrupt Example Continued

So we’re back, typing away in our text editor, and each time a key is pressed the keyboard controller generates an interrupt to inform the processor that a key has been pressed. Let’s say the interrupt vector associated with the keyboard is 34. We chose 34 because interrupt vectors 0-31 are reserved for Intel/AMD, and this is hypothetical. Take a look at the diagram below to understand the routing of an interrupt from when it’s generated to when it’s serviced.

Let’s walk through this. On a key press, a keyboard interrupt is generated and is delivered via interrupt vector #34 (it’s just a number). It’s scaled by 16 because that’s the size of each interrupt gate in the IDT (34*16), then used to index into the IDT to get the proper gate descriptor. The interrupt gate descriptor has a segment selector associated with it, which is used to index into the GDT to find the segment descriptor for this IDT-gate. The base address is 0 because we have a flat linear address space and no segmentation during 64-bit operation, so we take 0 + the 64-bit offset maintained in the interrupt gate in the IDT and set RIP to that result. RIP will then be pointing to the proper interrupt service routine (in this case #34), which executes. Once execution completes the processor restores the context of the interrupted task and resumes execution of that task. This all happens asynchronously and without loss of program/task flow unless some sort of error was encountered in the service routine. As a general flow of execution, that’s all there is to it.

You’ve gone through the hardest part of this and that’s understanding the architectural layout of and function of the IDT, interrupt gate, and GDT. The rest will build on top of this knowledge, and the great thing is that exceptions operate through the same mechanism. In the next few subsections we’re going to cover the different types of exception classifications, the architecturally defined interrupts, and identify a few that you may already be familiar with but not know it. We’ll continue on by differentiating between sources and then address the OS facilities for exceptions like structured-exception handling and vectored-exception handling. After that, you’ll learn how to modify and access the different records for these facilities and use them to your advantage.

Did you know?

The top 4 bits of the IDT index are the current IRQL. The IRQL is the interrupt request level, and the processor will raise its IRQL if required to properly handle the interrupt.

— Exception Classifications

In an earlier section, we described what an exception is and then detailed how exceptions and interrupts are delivered to the proper procedure, but we didn’t talk about the different types of exceptions. As we know, exceptions are events generated when the processor determines some error condition has been met while executing instructions. There are three types of exceptions, and their reporting and restoration mechanisms vary based on this type. We’re only concerned with two of the three types, and will only be describing those below. If you’re interested in the third, and for more details, please see Intel SDM Chapter 6.5.

The first type of exception we’re interested in is a trap. A trap is simply an exception that is reported following the execution of a trapping instruction. As an example, let’s consider cpuid and pretend on normal hardware it is a trapping instruction. This means that once cpuid is executed it will trap into the handler, execute the code in the trap handler, and resume execution at the instruction following the cpuid. Trapping can be somewhat difficult to think about, so think of it quite literally: you’re walking down a sidewalk (the instruction stream) and encounter a hole (the trapping instruction), you fall in and have to climb out on the other side of the hole (the trap handler), and now you’re on the other side of the hole (on the next instruction after the trapping instruction).

For your viewing pleasure I went ahead and illustrated how to think about trapping instructions, in case the description didn’t help. I don’t think the diagram does the topic justice, but writing is hard work. The next type of exception is a fault. A fault is an exception that requires correction to properly restore control flow. This type is much different from a trap in that when a fault is reported, the state of execution is restored to the state prior to the faulting instruction’s execution. That’s kind of backward to think about for some, so think of it as a game of hopscotch. You have a pattern you have to jump in order, and if you mess up you stop and go back to the start, as opposed to the next instruction like how a trap works. To give a realistic example, consider the following assembly:

mov rax, [rbx]
dec rax
lea rbx, [r9]
mov rax, [rbx]   <--- fault

############## FAULT HANDLED ###############

mov rax, [rbx]
dec rax
lea rbx, [r9]
mov rax, [rbx]   <--- resumes execution and restores state from beginning of this instruction

We have a series of instructions of no particular importance, but you can see the instruction mov rax, [rbx] is causing a faulting exception. This means that after running the fault handler, execution will resume at the beginning of the faulting instruction, with the processor state restored to what it was before the instruction began, allowing the instruction to execute again. This is a very common type of exception due to a frequently occurring exception called a page fault, which we’ll cover in a bit. An interesting note about faulting instructions is that the return address for the fault handler points to the faulting instruction, and this is how control is restored to the erroring task.

Alright, so now that you know the two common exception classifications we can move on to the architecturally defined hardware exceptions and talk about a few that you’ve likely encountered while reverse engineering, debugging, or just running a modern OS.

— Architecturally Defined Exceptions and You

If you recall from earlier, there are a number of interrupt vectors predefined by the architecture; in particular, 0-31 are reserved for Intel/AMD. There is an excerpt below from a project of mine that lists out the various exception and interrupt vectors that are architecturally defined. We’ll only be discussing 3 of these exceptions in detail; details on the rest can be found in the Intel SDM Chapter 6.2 Vol. 3A.

{ VEC_0,  DE,  "Divide Error",             Fault,     NO_EC   },
{ VEC_1,  DB,  "Debug Exception",          FaultTrap, NO_EC   },
{ VEC_2,  NMI, "NMI Interrupt",            Interrupt, NO_EC   },
{ VEC_3,  BP,  "Breakpoint",               Trap,      NO_EC   },
{ VEC_4,  OF,  "Overflow",                 Trap,      NO_EC   },
{ VEC_5,  BR,  "Bound Range Exceeded",     Fault,     NO_EC   },
{ VEC_6,  UD,  "Invalid Opcode",           Fault,     NO_EC   },
{ VEC_7,  NM,  "No Math Coprocessor",      Fault,     NO_EC   },
{ VEC_8,  DF,  "Double Fault",             Abort,     EC_ZERO },
{ VEC_9,  NA,  "Segment Overrun",          Fault,     NO_EC   },
{ VEC_10, TS,  "Invalid TSS",              Fault,     EC      },
{ VEC_11, NP,  "Segment Not Present",      Fault,     EC      },
{ VEC_12, SS,  "Stack Segment Fault",      Fault,     EC      },
{ VEC_13, GP,  "General Protection",       Fault,     EC      },
{ VEC_14, PF,  "Page Fault",               Fault,     EC      },
{ VEC_15, NA,  "Intel Reserved",           None,      NO_EC   },
{ VEC_16, MF,  "Math Fault",               Fault,     NO_EC   },
{ VEC_17, AC,  "Alignment Check",          Fault,     EC_ZERO },
{ VEC_18, MC,  "Machine Check",            Abort,     NO_EC   },
{ VEC_19, XM,  "SIMD FP Exception",        Fault,     NO_EC   },
{ VEC_20, VE,  "Virtualization Exception", Fault,     NO_EC   },
{ VEC_21, CP,  "CP Exception",             Fault,     EC      },

The first member of these structure definitions is the vector number. Remember that 0 to 31 are reserved for Intel/AMD definition. The DE/DB/NMI/etc. designations are the mnemonics for the exceptions, a shorthand way of identifying them. You’ll sometimes see #GP(0), which is a general-protection fault with error code (0). As you can see, that’s delivered via interrupt vector 13. If you’ve been reading through the list you might notice a few exceptions that sound familiar. Most notably the Debug Exception (#DB), Breakpoint Exception (#BP), and possibly the Page-Fault Exception (#PF). We’ll talk about these in this order. If you’re unfamiliar with reverse engineering terminology or the concept of software/hardware breakpoints it may be helpful to read this anyway, but you can skip it and come back later after we introduce the tools and their usage in the next article.

Interrupt Descriptor Table Usage The IDT is used when a hardware interrupt, software interrupt, or processor exception is generated. All of these are noted as interrupts. Software exceptions (excluding INT N instructions) are handled by a high-level facility like SEH/VEH and do not use the IDT.

— Debug Exception (#DB)

This exception behaves differently based on the condition specified in one of the architectural debug registers (DR6). It can act as a fault or trap exception. This type of exception is typically the kind used for hardware debugging, or when enabling a hardware breakpoint on some condition. The conditions could be on data read or write, instruction fetch, or the typical single step.

Debug Register Conditions There are other conditions that can be used to generate this type of exception.

It’s interesting to note that only two of the conditions result in fault-like behavior, and all the others behave in a trapping manner. The only two faulting conditions are breakpoint on instruction fetch and the general-detect condition. Recall that a fault means state is reverted to the point when the faulting instruction was executing, while a trap sets the state to the instruction after the trapping instruction. You will encounter hardware breakpoints as we dive deeper into RE targets, and this knowledge will come in handy.

How could knowledge of interrupt delivery be helpful in a defensive/offensive system?

An interesting behavior in some open-source hypervisors is that they don’t deliver the #DB exception on the proper instruction boundary when CPUID is executed with the trap flag set in the EFLAGS register. The interrupt will be delivered on the instruction following the instruction after CPUID thus giving a system the ability to detect a virtualized environment.

We’ll go over this exception again later, a brief overview is sufficient for now.

— Breakpoint Exception (#BP)

The breakpoint exception is very common, and if you’ve been programming for some time you’ve likely encountered it when debugging a misbehaving program. This exception has trap-like behavior and is used when a debugger sets a breakpoint. This breakpoint is enabled by replacing the first byte of an instruction with the int 3 instruction. This works because the int 3 instruction is one byte long, which makes replacing and restoring trivial. You’re well aware by now how a trapping exception behaves, but if you’re interested in visualizing this behavior, create a simple hello world application, place a breakpoint on the print statement, and observe how the debugger behaves when you break and resume. Look at the registers during the break and you’ll see how RIP points to the next instruction, unlike with a faulting instruction.

— Page-Fault Exception (#PF)

Modern operating systems take advantage of a mechanism for memory management called a page fault. This is an interesting exception because it occurs frequently without the user being aware. If you are unfamiliar with paging or virtual memory in a modern operating system I strongly suggest reading about them using the recommended reading links before continuing with this subsection. If you’re familiar with paging and virtual memory, but maybe not how page faults work, then read on! A page fault is easily classified since the behavior is part of its description – it’s a fault type of exception. When a page-fault exception occurs it delivers an error code along with it. This error code is placed on the stack and encodes specific information, such as whether the fault occurred because of a permission bit being cleared or the present bit being 0, among other things. The most typical reason for a page-fault exception is when the processor detects that a memory access was performed on a page that is not present in physical memory. This could be either because the data was paged out to disk by the memory management facilities (typical), or because the page no longer contains data and was freed by the operating system and marked as not present (not typical).

The first scenario mentioned above occurs quite often during normal system operation. Some data was paged out to disk in an effort to free up physical memory for an active task; the task switches and attempts to access the paged-out memory; the address translation mechanism checks the P (present) bit, and if it is 0 the processor generates a #PF. Once the #PF is generated the processor performs the steps detailed further up when discussing the IDT and exception/interrupt delivery, calls the page-fault (#PF) handler, and brings that memory back into physical memory so that the task attempting to access it may read the data properly. If the fault cannot be handled, your system will typically blue-screen and provide information about the error. A common issue is to encounter PAGE_FAULT_IN_NONPAGED_AREA, which means that there was an attempt to read memory in a region that is exempt from paging, but where the memory is no longer resident. This results in a page fault that can’t be handled, so the system saves what it can and performs a bug check (blue-screen). This will most likely not happen with the software we’ll be looking at in this series, but it can (and often does) happen with poorly designed device drivers. We’ll look more into drivers toward the end of this series, and will debug a #PF error in a hardware monitoring driver.

Windows and Exceptions

So far we’ve covered a lot of information, some relevant and some useful for future articles. In this final section, we’re going to discover how Windows implements SEH and VEH to do exception handling in software. This software could be a driver or a user-mode application like Skype. The next section will start off with SEH, the design choices, implementation, and some details from under the hood. We’ll cover VEH in the same way, and then see how they both link with exception internals to handle exceptions. Once we understand how these facilities are used we’re going to look at a few ways to abuse them in an effort to hijack control flow, but not before we tie back to the IDT discussion from earlier with some software interrupt examples. The end of the article will also add some interesting ways to mask behavior through interrupt gate abuse.

— Structured Exception Handling

To start off with structured exception handling, we need to address that SEH is used primarily to release resources if the program experiences a loss of continuity. If you’ve done any sort of C/C++ programming you’ve likely used it to handle problems like potential access violations, bad allocations, or determining whether an object was found. We’re looking at this as an extension of the C language since it’s specifically designed for C. It can be used in C++ as well, though various sources recommend the ISO-standard C++ exception handling facilities instead. MSDN is a great source for learning more in-depth information about SEH. I’m going to assume you have used SEH to some extent in projects and know the use of the mechanisms __try/__except and __try/__finally. If you’re unfamiliar with what happens when an exception is encountered, we’ll walk through that below.

Let’s take a look at an example and then some disassembly of what’s underneath. Don’t worry if you don’t have the tools or gadgets, this is just for a walkthrough and to get your gears turning. I’ll make a brief example using SEH, we’ll walk through the example then toss it in IDA Pro.

__declspec( noinline ) void ThrowNullPointerDereferenceException( void )
{
    volatile int* ptr = 0x0;
    *ptr = 0x1337;
}

int main( int argc, char** argv )
{
    __try
    {
        ThrowNullPointerDereferenceException();
    }
    __except ( EXCEPTION_EXECUTE_HANDLER )
    {
        printf( "Caught Null Dereference.\n" );
    }

    return 0;
}

In this example, you can see we wrap our potentially exceptional function in a __try block and set our __except handler to catch all types of exceptions. In the ThrowNullPointerDereferenceException we create a pointer and point it at nothing, then dereference it and attempt to write 1337h to the null location. This will clearly generate an exception and the __except block will execute.

That’s typical behavior, and not very interesting. Underneath this high-level abstraction, there are complex processes at work. The most well-known term when thinking about exception handling is the process of stack unwinding. Let’s take a look at the example application in IDA Pro, and see if we can figure out what’s happening.

This is the main function pulled from our disassembler. Let’s go through and first identify everything so that we can begin to understand the assembly somewhat. We have our function start where main proc near is declared. Following that, we have our stack allocation. If you hark back to the previous article on the stack you’ll immediately recognize that sub rsp, 28h is allocating stack space for the shadow store and the return address. This is done to ensure proper stack alignment. The next instruction is a call to sub_1070. Since we have prior knowledge of the application, we know this is our ThrowNullPointerDereferenceException procedure. We’re going to follow it anyway to take a look.

This is our function that throws the exception. You can see that mov dword ptr ds:0, 1337h is the instruction that dereferences memory location ds:0 and attempts to store 1337h in that location. This line will immediately signal that there is an issue and raise an exception. But how does it know to call the proper handler? At this point, things get complicated as software exceptions take advantage of OS facilities provided via ntdll. The program raises an exception, and the SEH facilities go to work calling the appropriate API to resolve the error and locate the correct handler (if any). We’re going to use the IDA Pro Local Windows Debugger to trace the path of execution and figure out how this works.

We start the debugger and set a breakpoint on our first instruction in main so we can control things from the start. I’m going to step over until I execute the first call instruction, and we’ll see where we end up.

Immediately upon attempting to write to an invalid memory location our program generates an exception and our OS facilities go to work. The function we land in following the execution of the problem instruction is KiUserExceptionDispatcher. This function is responsible for invoking the user-mode SEH dispatcher. When an exception occurs, the kernel takes control briefly to determine whether the exception occurred in user-mode or not. If it occurred during user execution, the kernel modifies a trap frame that is pushed onto the stack so that when it returns from the interrupt/exception it winds up at KiUserExceptionDispatcher. In this way, software exceptions are trapping exceptions. If you’re not sure why, recall that faulting instructions attempt correction and then re-attempt the problem instruction. When software exceptions occur the kernel modifies the trap frame so that the program resumes execution in the user-mode SEH handling facilities.

Trap Frame Data

A trap frame is a structure passed to the kernel that contains state information of the currently executing program (registers, eflags, etc.) at the time of an exception, so that control can be returned after the exception or interrupt has been serviced.

The kernel also places a CONTEXT parameter and an EXCEPTION_RECORD parameter that describe the state of the application when the exception was generated. This allows the handler to do things like read the error code, determine state information like what general-purpose register values were, and so forth. Once continuity is restored in KiUserExceptionDispatcher the exception is processed in RtlDispatchException. You can see a call to that function in the above image. This function is the internal implementation of the user-mode SEH dispatcher. The internals of RtlDispatchException are quite complex but to simplify things it uses the context and exception record parameters to locate what’s called a “function table entry” in a dynamic function table. A dynamic function table is used to store unwind information and the functions they’re associated with in an effort to help the OS properly unwind the call stack. A call stack is a list of functions that have been invoked in the current program. In this example, the call stack currently looks like this after entering RtlDispatchException .

The function that performs the lookup of this unwind information is RtlLookupFunctionEntry. It takes the instruction pointer from the context (the address where the exception occurred) and the image base of the application, and creates what's called a history table. The details of the history table are a whole other post in themselves; the main takeaway is that this table is used in the subsequent call to RtlUnwindEx, which begins the unwinding of function call frames.

Let me break down some of these terms that may be unfamiliar. A call frame is the region of the program stack that represents a call to a function and any parameter data supplied to it. From the previous article on the stack, we know that the return address is pushed onto the stack, followed by the shadow store, and then there may be allocations for local variables as well. Together these make up the frame. The registers rbp and rsp are typically used to delimit the call frame, where rbp is the base of the frame and rsp is the top.
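The arithmetic of a frame can be sketched in a few lines. This is illustrative only, assuming the Microsoft x64 convention (8-byte return address, 32 bytes of shadow store for the four register parameters, locals kept 16-byte aligned); real compilers also spill nonvolatile registers and may add padding:

```c
#include <stdint.h>

/* Rough x64 call-frame size, under the Microsoft x64 convention:
   return address + shadow store + 16-byte-aligned locals.
   A simplification -- saved nonvolatile registers are ignored. */
static uint64_t frame_size(uint64_t locals_bytes)
{
    const uint64_t ret_addr = 8;   /* pushed by the call instruction */
    const uint64_t shadow   = 32;  /* home space for rcx, rdx, r8, r9 */
    uint64_t body = (locals_bytes + 15) & ~15ULL; /* round up to 16 */
    return ret_addr + shadow + body;
}
```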

Continuing our introspection of the SEH dispatcher, I want to keep things as simple as possible. Following the execution of RtlUnwindEx, a series of calls is made to calculate the virtual address of the unwind information structure associated with the function. The dispatcher then calls the registered language handler for the call frame, most often __C_specific_handler, which internally traverses the exception handling records (the __try/__except constructs). It uses RtlUnwindEx to walk each frame recorded in the exception records associated with your application and, long story short, unwinds until it encounters an end frame that identifies itself as the final point of the unwind operation. It then restores the execution context of the faulting task via RtlRestoreContext and transfers control to the exception handler's address by pushing it onto the stack along with EFLAGS and the current segment selector. It performs the jump by executing an iretq, which pops the handler address into rip, restores EFLAGS, and pops the selector into its respective register.

This is all very useful to know, and there is much more on the internals of SEH in the recommended reading. The most important takeaway is that these exception handling records are located in your application's address space. The OS facilities have to get these records from somewhere, and they aren't stored in any magical location. Knowing where these functions look inside your application will help lead you to their location (if obscured). In our case, there is a .pdata section, otherwise known as the runtime function table, that stores all the unwind information for a specific application. If we take a look at our application again in IDA (where I previously removed helpful comments), you'll recognize some of the things mentioned in the explanation above.

IDA does a great job of providing useful information to us, and in this case it helps us identify where the unwind information is located. If we follow the references to __C_specific_handler, we wind up in the .pdata section at the UNWIND_INFO structure for this specific function.

The above image displays the unwind information construct associated with our main function. There is a header, a structure containing the offset where the __try block begins and the frame register, an RVA to the exception handler, and a scope table structure. The scope table defines the beginning and end of the __try block, a handler value (which is 1 for EXCEPTION_EXECUTE_HANDLER), and the target address where the __except block begins and should be executed. So how does this help us?
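The scope table can be restated in C. This is a hedged restatement of the x64 C scope table that __C_specific_handler walks (the SDK-internal names differ slightly); every field holds an RVA, i.e. an image-relative offset, not an absolute address:

```c
#include <stdint.h>

#define EXCEPTION_EXECUTE_HANDLER 1

/* Restatement of the per-function x64 C scope table. All fields are
   RVAs (offsets from the image base), not absolute addresses. */
typedef struct _SCOPE_RECORD_SKETCH {
    uint32_t BeginAddress;   /* RVA: start of the __try body */
    uint32_t EndAddress;     /* RVA: end of the __try body */
    uint32_t HandlerAddress; /* RVA of a filter, or the constant 1 */
    uint32_t JumpTarget;     /* RVA where the __except block begins */
} SCOPE_RECORD_SKETCH;

typedef struct _SCOPE_TABLE_SKETCH {
    uint32_t Count;
    SCOPE_RECORD_SKETCH ScopeRecord[1]; /* Count entries in practice */
} SCOPE_TABLE_SKETCH;

/* Find the scope record covering a faulting RVA, as the language
   handler conceptually does; returns NULL if no __try covers it. */
static const SCOPE_RECORD_SKETCH *
find_scope(const SCOPE_TABLE_SKETCH *t, uint32_t fault_rva)
{
    for (uint32_t i = 0; i < t->Count; ++i) {
        const SCOPE_RECORD_SKETCH *r = &t->ScopeRecord[i];
        if (fault_rva >= r->BeginAddress && fault_rva < r->EndAddress)
            return r;
    }
    return 0;
}
```

If the faulting RVA lands inside a record's [BeginAddress, EndAddress) range, execution is transferred to image base + JumpTarget, which is the __except block.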

— Taking Advantage of Exception Records

If a target application utilizes SEH in this manner, and you can locate an area where an exception occurs, you can use what you know to locate the handler(s) and potentially hijack execution. It's nothing fancy, but it can be used. SEH exploits have been used for ages, and this is one way to modify their targets or redirect execution to code an attacker wants to run. The method mentioned above would be useful for static modification, where an attacker appends code in an executable section of the application, calculates its RVA, and overwrites the target address in the scope table. I don't want to introduce too many new topics in this post, but in our examples when reversing some low-level anti-cheats we will employ more advanced SEH exploits that involve the TEB and abuse of VEH.
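The overwrite itself is only pointer arithmetic. Here is a simulated sketch under stated assumptions: the scope record lives in a writable byte buffer standing in for the mapped image (in a real binary you would first have to make the page writable, e.g. via VirtualProtect, or patch the file on disk), and the struct/function names are my own:

```c
#include <stdint.h>

/* Simulated scope record; mirrors the four-RVA layout discussed above. */
typedef struct {
    uint32_t BeginAddress, EndAddress, HandlerValue, JumpTarget;
} SCOPE_RECORD_SIM;

/* Convert the virtual address of attacker-appended code into the RVA
   the scope table expects. */
static uint32_t va_to_rva(uint64_t image_base, uint64_t va)
{
    return (uint32_t)(va - image_base);
}

/* The "hijack": the __except block now begins at our code. */
static void hijack_jump_target(SCOPE_RECORD_SIM *rec, uint32_t new_rva)
{
    rec->JumpTarget = new_rva;
}
```

After this patch, the next exception dispatched through that __try region resumes at the attacker-chosen RVA instead of the original __except block.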

Content Removed I’ve since removed some information to maintain focus in this article to just the basics. We will cover more in future sections.

— Software Interrupts

At this point we've covered a lot, but there is something interesting to know about interrupts – software interrupts (aka "traps") more specifically. When an interrupt is encountered, we know that the CPU halts execution, saves state, and jumps to a predefined location where a handler routine is located. When handling is complete, it resumes execution at the next instruction. So what happens when we perform a special software interrupt like int 3? This is commonly known as the debug breakpoint, or debug trap, instruction. Pulling from the Intel SDM we note:

The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the debug exception handler. (This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code).
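This single-byte property is exactly what debuggers exploit. A minimal sketch of planting and removing a software breakpoint, operating on a plain byte buffer standing in for code the debugger can write to (a real debugger would use WriteProcessMemory or ptrace and then flush/resync):

```c
#include <stddef.h>
#include <stdint.h>

/* Plant a software breakpoint: save the original first byte of the
   target instruction and replace it with 0xCC (int 3). Returns the
   saved byte so the instruction can later be restored. */
static uint8_t set_breakpoint(uint8_t *code, size_t offset)
{
    uint8_t original = code[offset];
    code[offset] = 0xCC;   /* the one-byte int 3 opcode */
    return original;
}

/* Restore the original byte, e.g. before single-stepping past the
   breakpoint so the real instruction executes. */
static void clear_breakpoint(uint8_t *code, size_t offset, uint8_t original)
{
    code[offset] = original;
}
```

Because 0xCC is a single byte, it can overwrite the first byte of any instruction without clobbering the bytes of its neighbors, which is precisely the property the SDM excerpt calls out.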

The implementation of breakpoints is quite simple now that we know how the IDT works. If an application encounters an int 3 instruction, it issues an interrupt signal on vector 3 (the breakpoint exception). The int instruction allows a user-mode process to issue signals on a few different vectors. The IDT and its interrupt gates must be constructed properly to prevent potentially problematic interrupts and exceptions from being signaled by a user-mode process – we don't really want unprivileged code to be able to signal a #DF exception, for instance. There are specific fields in the interrupt gates that prevent this sort of behavior. The one to note is the descriptor privilege level (DPL) field. The processor checks this field against the current privilege level (CPL) to determine whether the interrupt is allowed, and if not, it raises a general protection fault (#GP). This is done by setting the DPL of most interrupt gates to 0, signifying that only CPL 0 (kernel mode) can invoke them. I bet you've guessed it, but for specific software interrupts like int 3, int 2E, or int 1, the DPL of the gate is equal to the DPL of user mode: 3. When a user-mode process executes int 3, delivery proceeds just as described in the earlier sections of this article.
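The processor's check can be modeled in one line. A hedged sketch (the real check on INT n also involves conforming-segment rules and gate types, which are omitted here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the privilege check applied to a software
   interrupt (INT n): the gate's DPL must be numerically >= the
   caller's CPL, otherwise the processor raises #GP. Kernel-only
   gates get DPL 0; gates like int 1, int 3, and int 2E get DPL 3
   so user mode (CPL 3) may invoke them. */
static bool software_int_allowed(uint8_t gate_dpl, uint8_t cpl)
{
    return cpl <= gate_dpl;
}
```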

You can read more about IDT implementation and the various mechanisms available to preserve system integrity in the recommended reading. For now, we’re done with our rundown of exceptions and interrupts.

Conclusion

In this article, we went over the architectural details of how interrupts and processor exceptions are handled, as well as a brief overview of the differences between software exceptions and processor exceptions. You should be comfortable with the idea of the IDT, the classifications of exceptions on Intel and AMD, and the usage of software interrupts in debugging software. You've also learned how some software interrupts are delivered and how the IDT utilizes specific fields in its descriptors to prevent unprivileged execution of certain interrupt vectors. The next article will be an accelerated introduction to assembly so that we can get going with the targeted reverse engineering projects. I plan to cover the most common instruction sequences you'll encounter, demystify some of the obscure instructions, and provide many examples of their usage. A lot of the terminology used in the next article will come from the first article of the series, so be sure to brush up on the architecture fundamentals prior to digging into x64 assembly. I hope that this post taught you something new and interesting, and maybe gave you some ideas of your own to investigate. I highly suggest going through the recommended reading and absorbing as much detail as you can.

Thanks for reading, and as always, if any part was confusing, needs clarification, or I missed something in the slew of words, please don't hesitate to reach out. Best of luck!

Twitter: @daax_rynd

Recommended Reading