I have recently come across (well, not entirely by myself… cheers Nahuel!) a fairly (un)common problem related to performing ring0-to-ring3 transitions after successfully exploiting a kernel vulnerability. As I have managed to come up with a bunch of possible solutions, and even write example code for some of them, today I would like to present my thoughts together with some brief explanations.

Introduction

Before looking for a reliable solution, the problem should be clearly stated. We are considering a 32-bit Windows NT-family version (one of the supported ones) suffering from a stack-based buffer overflow inside one of the system call handler functions. The attacker is able to overwrite memory placed after a fixed-size buffer, including the stack frame, return address, syscall arguments and anything else reachable from that point. Contrary to reality, we assume that no stack protection (i.e. a cookie) is implemented, so the security flaw leads straight to malicious code execution and system compromise. Furthermore, the overflow is triggered directly inside the syscall handler, not in a nested function of any kind.

The following ASCII picture, presenting the stack layout at the time of the overflow, should give you better insight into the described scenario:

```
+-----------------------+
|                       |
|  local variables (1)  |
|                       |
+-----------------------+
|     CHAR buf[32]      | -+
+-----------------------+  |
|                       |  |
|  local variables (2)  |  | overflow
|                       |  | direction
+-----------------------+  |
|      stack frame      |  |
+-----------------------+  v
|    return address     |
+-----------------------+
|                       |
|  syscall parameters   |
|                       |
+-----------------------+
|                       |
| KiFastCallEntry stack |
|                       |
|         (...)         |
```

So, here we are: able to control roughly any value, which could lead us to code execution… a perfect dream for every vulnerability researcher. There is one more requirement, however: we must, by any means, return to user mode in order to exit the exploit process in a legitimate way (such as using ExitProcess). So how do we achieve that, given that the original return address, and possibly some of the syscall arguments, are lost (overwritten by attacker-supplied data)? Let’s find out what the options are.

KiFastCallEntry and KiServiceExit

Under normal system execution (i.e. when its stability and security don’t collapse), each system call handler, such as NtOpenFile, returns to its original caller, the KiFastCallEntry function. This routine, in turn, is the dispatcher most often reached when ring-3 code executes the sysenter instruction (it is, however, also used by kernel modules calling system services). After calling the appropriate handler from KeServiceDescriptorTable, the dispatcher is supposed to lower the processor privilege level by returning to the place where the system call was issued.
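To make the dispatch step concrete, here is a small user-mode model of it in C. This is only a sketch under stated assumptions: SERVICE_TABLE_MODEL, dispatch_service, the toy services and the -1 "invalid service" status are hypothetical names made up for illustration, not the real kernel structures. It shows the idea that a service number simply indexes an array of handler addresses, and that nothing in the handler itself performs the return to ring-3.

```c
#include <stdint.h>

/* Hypothetical model of a system service table: an array of handler
   pointers plus a count, indexed by the service number. */
typedef int32_t (*SERVICE_ROUTINE)(const uint32_t *args);

typedef struct {
    const SERVICE_ROUTINE *ServiceTableBase; /* handler addresses */
    uint32_t NumberOfServices;               /* table limit */
} SERVICE_TABLE_MODEL;

/* Stand-in status for an out-of-range service number. */
#define MODEL_STATUS_INVALID_SERVICE (-1)

/* Two toy "services" so the model has something to dispatch to. */
static int32_t model_service_add(const uint32_t *args)   { return (int32_t)(args[0] + args[1]); }
static int32_t model_service_first(const uint32_t *args) { return (int32_t)args[0]; }

static const SERVICE_ROUTINE g_model_handlers[] = { model_service_add, model_service_first };
static const SERVICE_TABLE_MODEL g_model_table = { g_model_handlers, 2 };

/* Roughly what the dispatcher does: validate the number, then call the
   handler (the CALL EBX in the next figure). */
static int32_t dispatch_service(const SERVICE_TABLE_MODEL *table,
                                uint32_t number, const uint32_t *args)
{
    if (number >= table->NumberOfServices)
        return MODEL_STATUS_INVALID_SERVICE;
    int32_t status = table->ServiceTableBase[number](args);
    /* In the real kernel, the handler's RET lands back in KiFastCallEntry,
       and execution then falls through into KiServiceExit. */
    return status;
}
```

In the model, a handler returning normally is what keeps the exit path reachable; an overwritten return address is exactly what breaks that assumption.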

The latter part of the job is implemented by the KiServiceExit routine, responsible for returning to the service caller, whatever it is. Interestingly enough, KiFastCallEntry doesn’t need to call the exit function, thanks to a specific assembly code layout designed by the system developers:

```
+-----------------------+
| nt!KiFastCallEntry    |
|                       | --+
|   /* code */          |   |
|                       |   |
|   CALL EBX            | <-|-- EBX = syscall handler address
|                       |   |
|-----------------------|   |
| nt!KiServiceExit      |   |
|                       |   | code execution direction
|   /* code */          |   v
|   SYSEXIT             |
|                       |
```

As the KiServiceExit implementation directly follows the “end” of KiFastCallEntry, execution simply falls through from one routine into the other. This way, no actual call instruction is required, as the layout causes KiServiceExit to always execute after the syscall handler returns. However, once the original return address is replaced with one pointing at our shellcode, we no longer land inside KiServiceExit automatically. To make the situation even worse, the exit routine is an internal symbol, not exported to other ring-0 modules.

Considering the above conditions, finding a reliable way of returning to user mode might appear somewhat problematic. The next few sections aim to show the bright and dark sides of the possible solutions I have been able to think of. If I have missed something, please let me know; I will be glad to extend the article with additional material ;)

Obtaining internal kernel symbols

The first, and probably most straightforward, solution one could think of requires the attacker to recognize the precise version of the kernel image in use and take advantage of the symbol packages publicly available on Microsoft servers. An adequate package could either be downloaded at run-time (provided that the attacked machine is connected to the internet at the time of exploitation), or distributed together with the malicious application. A lighter variant of the latter option is to hard-code the KiServiceExit address for every possible kernel image version.
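The hard-coded variant boils down to a lookup keyed by something that identifies the exact kernel image. A minimal sketch in C, assuming the key is the TimeDateStamp and SizeOfImage values from the kernel's PE headers (these normally differ between kernel builds, while the OS build number may stay the same across patches). KERNEL_OFFSET_ENTRY and resolve_ki_service_exit are hypothetical names, and the table contents are left to the reader: no real offsets are given here.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical hard-coded table, keyed by values taken from
   the kernel image's PE headers. */
typedef struct {
    uint32_t time_date_stamp;      /* IMAGE_FILE_HEADER.TimeDateStamp */
    uint32_t size_of_image;        /* IMAGE_OPTIONAL_HEADER.SizeOfImage */
    uint32_t ki_service_exit_rva;  /* offset of KiServiceExit from the image base */
} KERNEL_OFFSET_ENTRY;

/* Returns the absolute KiServiceExit address for a known kernel, or 0 if
   the running kernel is not in the table; the exploit should then give up
   rather than guess. */
static uint32_t resolve_ki_service_exit(const KERNEL_OFFSET_ENTRY *table, size_t count,
                                        uint32_t time_date_stamp, uint32_t size_of_image,
                                        uint32_t kernel_base)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].time_date_stamp == time_date_stamp &&
            table[i].size_of_image == size_of_image)
            return kernel_base + table[i].ki_service_exit_rva;
    }
    return 0;
}
```

Storing offsets rather than absolute addresses keeps the table valid even if the kernel is loaded at a different base; every new kernel build still needs a new row, which is exactly the maintenance burden described below.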

Advantages: If the exploit used legitimate, Microsoft-supplied symbols, or a static table of supported Windows editions together with the desired kernel addresses, it could achieve a decent level of reliability. Once the KiServiceExit address is known, there isn’t much left to be done: just align the stack as it would be upon a normal syscall return, and jump to the routine after the payload completes.

Disadvantages: If the attacker decided to download a complete ntoskrnl.exe symbol file from the web, he could put the entire operation at risk, as the .pdb file being retrieved can be 5MB in size (or more). The exploit could obviously employ various DKOM-style techniques in order to hide the connection; this would only work on the local machine, though. What about other computers in the network, and/or devices along the way to the global net? The attacker could either be caught in the first place, or leave significant amounts of evidence for forensic investigators.

If, instead, the attacker chose to use hard-coded values, he would be forced to keep the exploit up to date as new system patches are released.

Problems of the above nature are, obviously, not an issue if the attacker has a relatively small number of targets and is able to figure out the computers’ kernel versions by other means (e.g. having a local account on a given machine would usually help a lot).

Signature scan

Another well-known way of retrieving the address-of-whatever relies on performing a quick & dirty signature scan of memory. In this particular case, one would have to scan the entire ntoskrnl.exe image memory area in search of a previously extracted signature unique to the KiServiceExit routine. The signature could (or probably: should) be constructed so that it works for every operating system out there, or be kept in a hard-coded table of supported kernel versions (as mentioned in the previous section).

Advantages: The exploit doesn’t have to establish any outgoing connections; in fact, it doesn’t use the internet at all. Depending on the length and quality of the signature, as well as the number of kernel modifications applied by Microsoft, this technique can turn out to be either reliable or the very opposite. In my view, it is usually best to consider signature scanning unreliable, regardless of the conditions. If, however, the attacker showed that the KiServiceExit address can be easily obtained using a signature that is valid for all existing systems and unlikely to change, I would consider such a solution a relatively good one.

Disadvantages: As far as my experience goes, using constant signatures is rarely a good idea, especially if there are other options to pick from. The exploit developer can never be certain that Microsoft won’t unexpectedly change the kernel code, the stack layout, or anything else affecting the assembly being relied on. What is worse, the problem is not only about changes to the KiServiceExit contents itself: it is enough for a new byte sequence matching the existing pattern to appear anywhere in the kernel image, and the exploit is fooled. In conclusion, this is not a technique I would recommend.
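For illustration, a minimal signature scan in C, assuming the kernel image bytes are already available in a buffer (how they are obtained is out of scope here). The name find_unique_signature and the 'x'/'?' mask convention are illustrative choices, not taken from any particular tool. Instead of accepting the first hit, the sketch rejects the result when the pattern matches more than once; this only partially mitigates the problem described above, since a unique match in a changed kernel can still be the wrong code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scans `image` for `pattern`. In `mask`, 'x' means the byte must match and
   '?' means "any byte". Returns the offset of the match only if it is
   unique; returns -1 if there is no match or more than one. */
static long find_unique_signature(const uint8_t *image, size_t image_size,
                                  const uint8_t *pattern, const char *mask,
                                  size_t pattern_len)
{
    long found = -1;

    if (pattern_len == 0 || pattern_len > image_size)
        return -1;

    for (size_t i = 0; i + pattern_len <= image_size; i++) {
        size_t j;
        for (j = 0; j < pattern_len; j++) {
            if (mask[j] == 'x' && image[i + j] != pattern[j])
                break;
        }
        if (j == pattern_len) {
            if (found != -1)
                return -1;          /* second hit: the signature is ambiguous */
            found = (long)i;
        }
    }
    return found;
}
```

Wildcards let the signature skip bytes that are known to vary between builds (for example embedded addresses), at the cost of making accidental matches elsewhere in the image more likely.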

Own KiServiceExit implementation

The next solution to be considered requires the exploit developer to create their own implementation of the exit routine, rather than trying to (non-deterministically) find its virtual address in memory. This is possible because we’re executing with the same rights as the kernel itself, and are able to use any privileged instruction it uses. The only potential problem is the complexity of the function; fortunately, this is not an issue in the case of KiServiceExit.

Advantages: The major upside of this method is that we are not dependent on virtual addresses of any kind (apart from the actual payload, which might require them). In other words, it is possible to implement one payload epilogue and use it across numerous system versions, as long as the stack layout (most importantly, the trap frame) doesn’t change. According to my observations, the KiServiceExit routine either doesn’t change at all, or changes only in minor parts (e.g. single instructions). Even though there might be a few differences between Windows 2000 and Windows Vista, such low-level parts of the system aren’t modified overnight. And so, carefully preparing one separate implementation of the function for each Windows NT-family release (2000, XP, Vista, 7) should be sufficient to keep reliability at a very high level.

Disadvantages: One drawback that could be pointed out is that the solution is still not as elegant as it could possibly be. That’s because the kernel-to-user transition is performed using highly undocumented system behavior and internal offsets (the only description being the \ntos\ke\i386\trap.asm file, present in the Windows Research Kernel package). As a consequence, even though it is very likely that one’s implementation of the exit routine will work on any build of a specific Windows version, there is no certainty about it, especially in the context of future Windows versions.

The KeUserModeCallback technique

Last, but not least: the technique that was my first thought when I started reflecting on the problem. Since the mechanism used by this method has already been described numerous times (such as in the “KeUserModeCallback utilization” section of mxatone’s article, or in Nynaeve’s post), I will only give a brief explanation of its concept.

Under normal conditions, ring-3 code can only interact with kernel modules via system calls (regular interrupts are mostly deprecated, while call gates are not used at all). This basic scheme relies on the fact that user applications send specific requests, asking the kernel either to perform operations that require higher processor privileges, or to supply necessary information. A request is made (via the INT 2E or sysenter instruction), the kernel dispatches it and possibly returns some information, then comes back to user mode (via either iretd or sysexit). Following this scheme, one could consider system calls a specific type of callback: whenever an application wants to interact with the system, it calls back an adequate function in the kernel.

As it turns out, the kernel might want to call back into user mode as well! More precisely, the standard graphical driver (win32k.sys) needs to use ring-3 routines in numerous situations, in order to send notifications about graphical events or to request some information. To meet these requirements, a special interface called user-mode callbacks was developed inside the NT kernel. The interface consists of one public and a few internal kernel routines:

```c
NTSTATUS KeUserModeCallback(
    IN  ULONG   ApiNumber,
    IN  PVOID   InputBuffer,
    IN  ULONG   InputLength,
    OUT PVOID  *OutputBuffer,
    OUT PULONG  OutputLength
);
```

By using the above function, exported by ntoskrnl.exe, the graphical module is able to perform a legitimate ring-0 to ring-3 transition. Next, some basic information regarding the execution state is stored on the kernel stack, and execution is passed to the user-mode ntdll!KiUserCallbackDispatcher function, which has the following prototype:

```c
VOID KiUserCallbackDispatcher(
    IN ULONG ApiNumber,
    IN PVOID InputBuffer,
    IN ULONG InputLength
);
```

The dispatcher is then responsible for forwarding the execution into one of the callback routines (the EDX register contains the ApiNumber parameter):

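```asm
mov  eax, large fs:18h        ; TEB (NtTib.Self)
mov  eax, [eax+30h]           ; TEB->ProcessEnvironmentBlock
mov  eax, [eax+2Ch]           ; PEB->KernelCallbackTable
call dword ptr [eax+edx*4]    ; KernelCallbackTable[ApiNumber]
```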

As can be seen, the user-side dispatch table is pointed to by one of the PEB (Process Environment Block) fields, namely KernelCallbackTable. After the given callback completes its task, it resumes execution inside win32k.sys either by using a dedicated interrupt (INT 2B, handled internally by KiCallbackReturn) or by invoking the NtCallbackReturn system call. The question is: how does the above information help us achieve the desired exploitation effect?
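To make the dispatch and the table-redirection idea concrete, here is a small user-mode model in C. PEB_MODEL, dispatch_user_callback, redirect_callback_table and the toy callbacks are hypothetical names made up for this sketch; nothing here touches the real PEB. It only mirrors the indexing performed by the dispatcher listing above, and what swapping the table pointer does to every subsequent callback.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*USER_CALLBACK)(void *input, uint32_t input_length);

/* Model of the single PEB field that matters here. On 32-bit Windows the
   real KernelCallbackTable pointer lives at PEB+0x2C, as in the listing. */
typedef struct {
    USER_CALLBACK *KernelCallbackTable;
} PEB_MODEL;

/* Equivalent of "call dword ptr [eax+edx*4]": index the table with
   ApiNumber. Like the listing, there is no bounds check. */
static uint32_t dispatch_user_callback(const PEB_MODEL *peb, uint32_t api_number,
                                       void *input, uint32_t input_length)
{
    return peb->KernelCallbackTable[api_number](input, input_length);
}

/* Fills `table` with `handler` and points the PEB at it, so every ApiNumber
   in range ends up in the same function. Returns the previous table. */
static USER_CALLBACK *redirect_callback_table(PEB_MODEL *peb, USER_CALLBACK *table,
                                              size_t entries, USER_CALLBACK handler)
{
    USER_CALLBACK *old = peb->KernelCallbackTable;
    for (size_t i = 0; i < entries; i++)
        table[i] = handler;
    peb->KernelCallbackTable = table;
    return old;
}

/* Toy callbacks and tables for demonstration only. */
static uint32_t original_callback_a(void *in, uint32_t len) { (void)in; return len; }
static uint32_t original_callback_b(void *in, uint32_t len) { (void)in; return len * 2; }
static uint32_t catch_all_handler(void *in, uint32_t len)   { (void)in; (void)len; return 0xC0DE; }

static USER_CALLBACK g_original_table[] = { original_callback_a, original_callback_b };
static USER_CALLBACK g_replacement_table[2];
static PEB_MODEL g_peb = { g_original_table };
```

Filling every slot with the same handler means the exploit does not need to know which ApiNumber will arrive; the real table, however, has many more entries than this toy one, so the replacement must be at least as large as any index the kernel may use.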

Since KeUserModeCallback is a public symbol, any module running in kernel mode can call it in a fully reliable manner. What is more, we can also hook the KiUserCallbackDispatcher function, or better yet, redirect the dispatch table pointer residing inside the PEB. Having done so, we are able to trigger our own, fully controlled kernel-to-user transitions. Thanks to the clever NT kernel, we don’t really have to care about what is left on the kernel stack, as it will be gracefully cleaned up upon process termination. Below, you can find example code snippets responsible for each stage of the safe kernel-to-user transition:

1. Load the graphical library. Before touching any of the win32-related PEB fields, we should make sure that the user32.dll library has been loaded. This way, we are guaranteed that both the user- and kernel-mode parts of the system graphics are correctly initialized for our process.

```c
LoadLibraryA("user32.dll");
```

2. Replace the original dispatch table pointer with one controlled by us.

```c
#include <windows.h>

/* Returns the current FS selector (32-bit MSVC inline assembly). */
static WORD GetFS(void)
{
  WORD sel;
  __asm {
    mov ax, fs
    mov sel, ax
  }
  return sel;
}

/* Returns the base of the FS segment, i.e. the current TEB. */
LPVOID GetFSBase(void)
{
  LDT_ENTRY ldt;
  GetThreadSelectorEntry(GetCurrentThread(), GetFS(), &ldt);
  return (LPVOID)(ldt.BaseLow |
                  ((DWORD)ldt.HighWord.Bytes.BaseMid << 16) |
                  ((DWORD)ldt.HighWord.Bytes.BaseHi << 24));
}

(...)

for (i = 0; i < DISPATCH_TABLE_SIZE; i++)
  DispatchTable[i] = CallbackHandler;

BYTE *Teb = (BYTE *)GetFSBase();
Teb = (BYTE *)*(DWORD *)(Teb + 0x18);           /* TEB->NtTib.Self */
BYTE *Peb = (BYTE *)*(DWORD *)(Teb + 0x30);     /* TEB->ProcessEnvironmentBlock */
*(DWORD *)(Peb + 0x2C) = (DWORD)DispatchTable;  /* PEB->KernelCallbackTable */
```

3. Retrieve the nt!KeUserModeCallback address. This step can be achieved by taking advantage of the PSAPI interface (to retrieve the ImageBase of the kernel image; EnumDeviceDrivers and GetDeviceDriverBaseNameA are of much use), loading the very same image in the context of our application, and performing some simple maths. I have made use of my personal GetKernelProcAddress function this time; implementing it is left as an exercise to the reader ;)

```c
typedef LONG (NTAPI *KE_USER_MODE_CALLBACK)(ULONG, PVOID, ULONG, PVOID *, PULONG);
KE_USER_MODE_CALLBACK KeUserModeCallback;

KeUserModeCallback = (KE_USER_MODE_CALLBACK)GetKernelProcAddress("ntoskrnl.exe", "KeUserModeCallback");
```

4. Trigger the buffer overflow, leading to the Payload() function being executed. Shellcode represents the actual code for elevating user privileges, starting up a reverse shell, or whatever else you can think of.

```c
VOID Payload(void)
{
  ((VOID (*)(void))Shellcode)();      /* the actual payload */
  KeUserModeCallback(0, 0, 0, 0, 0);  /* transition to ring-3; caught in step 5 */
}
```

5. Catch the user-mode callback inside CallbackHandler(), and gracefully terminate the process.

```c
/* Resolved from ntdll.dll, e.g. via GetProcAddress. */
typedef LONG (NTAPI *NT_CALLBACK_RETURN)(PVOID, ULONG, LONG);
NT_CALLBACK_RETURN NtCallbackReturn;

DWORD CallbackHandler(void)
{
  if (b0f_triggered)                      /* our own transition: leave */
    ExitProcess(0);
  NtCallbackReturn(0, 0, ERROR_SUCCESS);  /* ordinary callback: back to win32k.sys */
  return ERROR_SUCCESS;
}
```

That’s it, we’re done!
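The GetKernelProcAddress helper is left as an exercise above, but its "simple maths" part can be shown in isolation. A minimal sketch, assuming three values have already been obtained elsewhere: the address at which our private copy of the kernel image was loaded, the export's address inside that copy, and the real kernel base reported by EnumDeviceDrivers. The function name rebase_kernel_export is hypothetical. The idea is that an export's offset from the image base (its RVA) is the same in both mappings.

```c
#include <stdint.h>

/* Translates an export address found in a user-mode copy of the kernel
   image into the corresponding kernel-mode address (32-bit addresses). */
static uint32_t rebase_kernel_export(uint32_t user_mapping_base,
                                     uint32_t user_export_address,
                                     uint32_t kernel_image_base)
{
    uint32_t rva = user_export_address - user_mapping_base;  /* offset inside the image */
    return kernel_image_base + rva;                          /* same offset in the kernel */
}
```

For instance, an export found at 0x10012340 in a copy loaded at 0x10000000 would sit at 0x804E9340 in a kernel loaded at 0x804D7000 (all three numbers are made up for the example).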

Finally, it should be noted that KeUserModeCallback also leads to the KiServiceExit function in the end, as the following call chain shows:

```
|  nt!KeUserModeCallback
|  nt!KiCallUserMode
v  nt!KiServiceExit
```

Let’s take a closer look at the actual pros and cons of the presented technique.

Advantages: The entire solution basically relies on two steps: calling the public nt!KeUserModeCallback routine after successful exploitation, and “catching” the execution flow at the public ntdll!KiUserCallbackDispatcher function, or at one of the callback handlers pointed to by the PEB. Both steps can be accomplished in a fully reliable way, unless Microsoft decides to either completely remove one of the utilized functions or make it an internal symbol. Since such a scenario is highly unlikely, we can safely assume that the technique is, and will remain, perfectly suited for returning to user code from difficult situations (such as a seriously damaged stack).

Disadvantages: One possible disadvantage that comes to mind is that replacing the PEB pointer to the dispatch table might not be as easy as one might suppose. Because high PEB offsets are likely to change between Windows versions, the attacker should take this into consideration when planning a world-wide, cross-version attack. This downside is not critical, though, as it is possible to intercept the execution earlier, inside the exported KiUserCallbackDispatcher, as mentioned before. If you know about any other drawbacks I am not aware of, please let me know.

Why so serious (about ring-3)?

Looking at the above text, one might wonder why the problem is stated so that the kernel-to-user transition must take place, when it doesn’t have to under normal circumstances. The answer is: because. When it comes to kernel mode, there are countless possible scenarios, machine states and other factors, which sometimes can be predicted and sometimes cannot; returning to user mode might be the best choice at times. One should keep in mind, however, that there are ways to terminate the current process from within ring-0 (such as nt!ZwTerminateProcess). Or better yet, once code execution is achieved, the process could simply load a regular rootkit driver (hiding the existence of the process) and remain idle until the machine reboots, by calling nt!ZwYieldExecution in an infinite loop.

Conclusion

In this post, I aimed to present yet another interesting scenario from the kernel exploitation field, together with a couple of possible solutions. Even though situations of this nature don’t happen very often, they do occur. Besides that, all four techniques aim at universality, so they can be used not only when a stack-based buffer overflow takes place, but in any situation where it is hard, or impossible, to resume the original kernel execution path. So, that’s it… comments are welcome, as always! ;)

Have fun.