Background

This post takes a different approach from the others and delves into the world of the Windows kernel. Specifically, it will cover how to access the undocumented APIs that are present within the kernel (ntoskrnl). If you trace a Windows API call from usermode to the kernel, you will find the endpoint to be something similar to what is shown below (Win 8 x64):

    public NtOpenFile
    NtOpenFile proc near
    4C 8B D1          mov     r10, rcx
    B8 31 00 00 00    mov     eax, 31h
    0F 05             syscall
    C3                retn
    NtOpenFile endp

where the r10 register holds the first argument and eax holds the index into the internal Windows syscall table. Note that this is specific to an x64 operating system running a native x64 application. x86 systems go through KiFastSystemCall in ntdll to invoke a syscall, and WOW64 emulation relies on making transitions from x64 to x86 and back and setting up an appropriate stack in between. When the syscall instruction executes, the flow of code will eventually reach NtOpenFile in ntoskrnl. This is actually a wrapper around IopCreateFile (shown below):

    public NtOpenFile
    NtOpenFile proc near
    4C 8B DC                  mov     r11, rsp
    48 81 EC 88 00 00 00      sub     rsp, 88h
    8B 84 24 B8 00 00 00      mov     eax, [rsp+88h+arg_28]
    45 33 D2                  xor     r10d, r10d
    4D 89 53 F0               mov     [r11-10h], r10
    C7 44 24 70 20 00 00 00   mov     [rsp+88h+var_18], 20h
    45 89 53 E0               mov     [r11-20h], r10d
    4D 89 53 D8               mov     [r11-28h], r10
    45 89 53 D0               mov     [r11-30h], r10d
    45 89 53 C8               mov     [r11-38h], r10d
    4D 89 53 C0               mov     [r11-40h], r10
    89 44 24 40               mov     [rsp+88h+var_48], eax
    8B 84 24 B0 00 00 00      mov     eax, [rsp+88h+arg_20]
    C7 44 24 38 01 00 00 00   mov     [rsp+88h+var_50], 1
    89 44 24 30               mov     [rsp+88h+var_58], eax
    45 89 53 A0               mov     [r11-60h], r10d
    4D 89 53 98               mov     [r11-68h], r10
    E8 48 E2 FC FF            call    IopCreateFile
    48 81 C4 88 00 00 00      add     rsp, 88h
    C3                        retn
    NtOpenFile endp

Again it should be noted that there was a lot of hand-waving going on here, and that the syscall instruction does not simply invoke the native kernel API, but goes through several routines responsible for setting up trap frames and performing access checks before arriving at the native API implementation.
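Returning to the usermode stub for a moment, the syscall index can be read straight out of its bytes. The following is a minimal sketch that assumes the exact stub layout from the Win 8 x64 listing above; the function name ExtractSyscallIndex is made up for illustration and other Windows versions may lay the stub out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the syscall index encoded in a stub of the form
   mov r10, rcx / mov eax, imm32 / syscall / retn, or -1 if the bytes do not
   match that layout. */
int64_t ExtractSyscallIndex(const uint8_t *stub, size_t length)
{
    static const uint8_t prologue[] = { 0x4C, 0x8B, 0xD1, 0xB8 }; /* mov r10, rcx ; mov eax, ... */
    static const uint8_t epilogue[] = { 0x0F, 0x05, 0xC3 };       /* syscall ; retn */
    const size_t stubSize = sizeof(prologue) + 4 + sizeof(epilogue);
    uint32_t index;

    assert(stubSize == 11);

    if (stub == NULL || length < stubSize)
        return -1;
    if (memcmp(stub, prologue, sizeof(prologue)) != 0)
        return -1;
    if (memcmp(stub + sizeof(prologue) + 4, epilogue, sizeof(epilogue)) != 0)
        return -1;

    /* The imm32 operand of mov eax is stored little-endian. */
    index = (uint32_t)stub[4]
          | ((uint32_t)stub[5] << 8)
          | ((uint32_t)stub[6] << 16)
          | ((uint32_t)stub[7] << 24);

    return (int64_t)index;
}
```

Fed the eleven bytes of the NtOpenFile stub, this yields 31h, the same value that eax carries into the kernel.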

Native kernel APIs exported for use in drivers follow a similar, though far less complex, mechanism. Every Zw* function in the kernel is a thin wrapper around a call to the Nt* version (example shown below):

    NTSTATUS __stdcall ZwOpenFile(PHANDLE FileHandle, ACCESS_MASK DesiredAccess,
        POBJECT_ATTRIBUTES ObjectAttributes, PIO_STATUS_BLOCK IoStatusBlock,
        ULONG ShareAccess, ULONG OpenOptions)
    ZwOpenFile proc near
    48 8B C4                mov     rax, rsp
    FA                      cli
    48 83 EC 10             sub     rsp, 10h
    50                      push    rax
    9C                      pushfq
    6A 10                   push    10h
    48 8D 05 BD 2F 00 00    lea     rax, KiServiceLinkage
    50                      push    rax
    B8 31 00 00 00          mov     eax, 31h
    E9 C2 DA FF FF          jmp     KiServiceInternal
    ZwOpenFile endp

This wrapper does basic things such as setting up the stack, disabling interrupts (cli), and preserving flags. The KiServiceLinkage function is just a small stub that executes the ret instruction immediately. I have not had a chance to reverse it to see what purpose it serves; it was never even invoked when a breakpoint was set on it. Lastly, the syscall number (0x31) is put into eax and a jump to the KiServiceInternal routine is made. This routine, among other things, is responsible for setting the correct PreviousMode, traversing the Windows syscall table (commonly referred to as the System Service Dispatch Table, or SSDT), and invoking the native Nt* version of the API.
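The jmp at the end of the wrapper is a relative jump (E9 followed by a signed 32-bit displacement), so the address of KiServiceInternal can in principle be recovered from the stub bytes. A rough sketch of that arithmetic is shown here; the function name and the offset constant are assumptions made for illustration, derived from the ZwOpenFile listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the E9 opcode inside the ZwOpenFile stub listed above (assumed
   layout; other builds may differ). */
#define ZW_STUB_JMP_OFFSET 25

/* Hypothetical helper: computes the destination of a jmp rel32 located at
   code[offset], where code[0] lives at virtual address codeAddress. */
uint64_t ResolveJmpRel32(const uint8_t *code, size_t offset, uint64_t codeAddress)
{
    uint32_t raw;
    uint64_t displacement;

    assert(code != NULL);
    assert(code[offset] == 0xE9);

    /* The displacement is stored little-endian right after the opcode. */
    raw = (uint32_t)code[offset + 1]
        | ((uint32_t)code[offset + 2] << 8)
        | ((uint32_t)code[offset + 3] << 16)
        | ((uint32_t)code[offset + 4] << 24);

    /* Sign-extend to 64 bits using well-defined unsigned arithmetic. */
    displacement = raw;
    if (raw & 0x80000000u)
        displacement |= 0xFFFFFFFF00000000ull;

    /* The displacement is relative to the instruction after the 5-byte jmp. */
    return codeAddress + offset + 5 + displacement;
}
```

For the listing above the displacement is FFFFDAC2h, which is -253Eh, so the target sits 253Eh bytes before the end of the stub.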

Getting Access to the APIs

So what is the relevance of all of this? The answer is that even though the kernel exports a ton of APIs for kernel/driver developers, there are still plenty of others that provide some pretty cool functionality but are not exported, such as ZwSuspendProcess/ZwResumeProcess and ZwReadVirtualMemory/ZwWriteVirtualMemory. Getting access to those APIs is really where this post begins. Before starting, there are several clear issues that need to be resolved:

The base address and image size in memory of the kernel (ntoskrnl) need to be found. This is because the APIs lie somewhere within that memory region.

The syscalls need to be identified, and a generic way of invoking them needs to be developed.

Other issues related to using the APIs should be addressed, for example process enumeration in the kernel in order to get a valid process handle for the target process in a ZwSuspend/ZwResume call.

Addressing these in order, the first point is relatively simple, but also relies on undocumented features. Getting the address of the kernel in memory is as simple as calling ZwQuerySystemInformation with the undocumented SystemModuleInformation value of SYSTEM_INFORMATION_CLASS. The call fills the supplied buffer with a SYSTEM_MODULE_INFORMATION structure containing a count of loaded modules followed by a variable-length array of SYSTEM_MODULE entries. A quick note to add is that the NtInternals documentation on the structure is a bit outdated, and that the first two fields are of type ULONG_PTR instead of always a 32-bit ULONG. Finding the kernel base address and image size is simply a traversal of the SYSTEM_MODULE array and a substring search for the kernel name. The code is shown below:

    PSYSTEM_MODULE GetKernelModuleInfo(VOID)
    {
        PSYSTEM_MODULE SystemModule = NULL;
        PSYSTEM_MODULE FoundModule = NULL;
        ULONG_PTR SystemInfoLength = 0;
        PVOID Buffer = NULL;
        ULONG Count = 0;
        ULONG i = 0;
        ULONG j = 0;
        //For names for WinXP
        CONST CHAR *KernelNames[] =
        {
            "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe"
        };

        //Perform error checking on the calls in actual code
        (VOID)ZwQuerySystemInformation(SystemModuleInformation, &SystemInfoLength, 0, &SystemInfoLength);
        Buffer = ExAllocatePool(NonPagedPool, SystemInfoLength);
        (VOID)ZwQuerySystemInformation(SystemModuleInformation, Buffer, SystemInfoLength, NULL);

        Count = ((PSYSTEM_MODULE_INFORMATION)Buffer)->ModulesCount;
        for(i = 0; i < Count; ++i)
        {
            SystemModule = &((PSYSTEM_MODULE_INFORMATION)Buffer)->Modules[i];
            for(j = 0; j < sizeof(KernelNames) / sizeof(KernelNames[0]); ++j)
            {
                if(strstr((LPCSTR)SystemModule->Name, KernelNames[j]) != NULL)
                {
                    FoundModule = (PSYSTEM_MODULE)ExAllocatePool(NonPagedPool, sizeof(SYSTEM_MODULE));
                    RtlCopyMemory(FoundModule, SystemModule, sizeof(SYSTEM_MODULE));
                    ExFreePool(Buffer);
                    return FoundModule;
                }
            }
        }
        DbgPrint("Could not find the kernel in module list\n");
        return NULL;
    }

The above function will return the PSYSTEM_MODULE corresponding to information about the kernel (or NULL in the failure case). Now that the base address and image size of the kernel are known, it is possible to begin coming up with a way to invoke the undocumented syscalls.
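The matching logic itself can be exercised outside of a driver. The sketch below is a usermode stand-in, not the kernel routine: MockModule is a hypothetical structure that keeps only the fields the search touches, and the real SYSTEM_MODULE layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SYSTEM_MODULE: only what the search needs. */
typedef struct MockModule {
    void *ImageBaseAddress;
    unsigned long ImageSize;
    char Name[256];
} MockModule;

/* Returns the first entry whose path contains one of the known kernel image
   names, or NULL if none matches. */
const MockModule *FindKernelModule(const MockModule *modules, size_t count)
{
    static const char *const kernelNames[] = {
        "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe"
    };
    size_t i, j;

    assert(count == 0 || modules != NULL);

    for (i = 0; i < count; ++i) {
        for (j = 0; j < sizeof(kernelNames) / sizeof(kernelNames[0]); ++j) {
            /* Substring search, because the module name is a full path. */
            if (strstr(modules[i].Name, kernelNames[j]) != NULL)
                return &modules[i];
        }
    }
    return NULL;
}
```

A substring search is used rather than an exact comparison because the name field holds a path rather than a bare file name.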

Since all of the undocumented Zw* calls are nearly identical wrappers (with the exception of the syscall number) invoking KiSystemService, I present a generic way of invoking these calls: create a functionally equivalent template of the wrapper in kernel memory and execute from that. The general idea is to create a blank template such as the one shown below:

    BYTE NullStub = 0xC3;

    BYTE SyscallTemplate[] =
    {
        0x48, 0x8B, 0xC4,                                           /*mov rax, rsp*/
        0xFA,                                                       /*cli*/
        0x48, 0x83, 0xEC, 0x10,                                     /*sub rsp, 0x10*/
        0x50,                                                       /*push rax*/
        0x9C,                                                       /*pushfq*/
        0x6A, 0x10,                                                 /*push 0x10*/
        0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rax, NullStubAddress*/
        0x50,                                                       /*push rax*/
        0xB8, 0xBB, 0xBB, 0xBB, 0xBB,                               /*mov eax, Syscall*/
        0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push LowBytes*/
        0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,             /*mov [rsp+0x4], HighBytes*/
        0xC3                                                        /*ret*/
    };

The template is copied into non-paged memory and patched with the correct addresses (NullStub replacing KiServiceLinkage) and the syscall number. It ends by transferring control to KiSystemService, done here by placing the 64-bit absolute address on the stack and returning to it. Once fully patched at runtime, this data can simply be cast to the appropriate function pointer and invoked like normal. Here is the allocation and patching routine:

    PVOID CreateSyscallWrapper(IN LONG Index)
    {
        PVOID Buffer = ExAllocatePool(NonPagedPool, sizeof(SyscallTemplate));
        BYTE *NullStubAddress = &NullStub;
        //Offsets of the placeholder operands inside SyscallTemplate
        BYTE *NullStubAddressIndex = ((BYTE *)Buffer) + (14 * sizeof(BYTE));
        BYTE *SyscallIndex = ((BYTE *)Buffer) + (24 * sizeof(BYTE));
        BYTE *LowBytesIndex = ((BYTE *)Buffer) + (29 * sizeof(BYTE));
        BYTE *HighBytesIndex = ((BYTE *)Buffer) + (37 * sizeof(BYTE));
        //The 64-bit target address is written as two 32-bit halves
        ULONG LowAddressBytes = ((ULONG_PTR)KiSystemService) & 0xFFFFFFFF;
        ULONG HighAddressBytes = ((ULONG_PTR)KiSystemService >> 32);

        //Do not patch if the allocation failed
        if(Buffer == NULL)
        {
            return NULL;
        }

        RtlCopyMemory(Buffer, SyscallTemplate, sizeof(SyscallTemplate));
        RtlCopyMemory(NullStubAddressIndex, (PVOID)&NullStubAddress, sizeof(BYTE *));
        RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
        RtlCopyMemory(LowBytesIndex, &LowAddressBytes, sizeof(ULONG));
        RtlCopyMemory(HighBytesIndex, &HighAddressBytes, sizeof(ULONG));

        return Buffer;
    }
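The hard-coded offsets (14, 24, 29, 37) are easy to get wrong, so it can be worth checking them in an ordinary usermode program before loading anything into the kernel. The following is a sketch of that idea under stated assumptions: PatchSyscallTemplate and StoreLe are names invented here, the addresses passed in are arbitrary test values, and nothing is ever executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    NULL_STUB_OFFSET  = 14, /* imm64 of mov rax, NullStubAddress */
    SYSCALL_OFFSET    = 24, /* imm32 of mov eax, Syscall */
    LOW_BYTES_OFFSET  = 29, /* imm32 of push LowBytes */
    HIGH_BYTES_OFFSET = 37, /* imm32 of mov [rsp+0x4], HighBytes */
    TEMPLATE_SIZE     = 42
};

static const uint8_t SyscallTemplate[TEMPLATE_SIZE] = {
    0x48, 0x8B, 0xC4, 0xFA, 0x48, 0x83, 0xEC, 0x10, 0x50, 0x9C, 0x6A, 0x10,
    0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA,
    0x50,
    0xB8, 0xBB, 0xBB, 0xBB, 0xBB,
    0x68, 0xCC, 0xCC, 0xCC, 0xCC,
    0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,
    0xC3
};

/* Writes the low byteCount bytes of value in little-endian order. */
void StoreLe(uint8_t *dst, uint64_t value, size_t byteCount)
{
    size_t i;
    assert(dst != NULL && byteCount <= 8);
    for (i = 0; i < byteCount; ++i)
        dst[i] = (uint8_t)(value >> (8 * i));
}

/* Fills out (at least TEMPLATE_SIZE bytes) with a patched copy of the template. */
void PatchSyscallTemplate(uint8_t *out, uint64_t nullStubAddress,
                          uint32_t syscallIndex, uint64_t dispatcherAddress)
{
    assert(out != NULL);
    memcpy(out, SyscallTemplate, TEMPLATE_SIZE);
    StoreLe(out + NULL_STUB_OFFSET, nullStubAddress, 8);
    StoreLe(out + SYSCALL_OFFSET, syscallIndex, 4);
    /* push imm32 sign-extends to 64 bits; the following mov overwrites the
       upper half on the stack with the real high 32 bits. */
    StoreLe(out + LOW_BYTES_OFFSET, dispatcherAddress & 0xFFFFFFFFu, 4);
    StoreLe(out + HIGH_BYTES_OFFSET, dispatcherAddress >> 32, 4);
}
```

Comparing the patched buffer byte for byte against the expected encoding confirms that each operand landed directly after its opcode and that no instruction bytes were overwritten.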

Example usage of CreateSyscallWrapper is shown below:

    typedef NTSTATUS (NTAPI *pZwSuspendProcess)(IN HANDLE ProcessHandle);

    //Note: the x64 routine listed above takes only the syscall index;
    //the second argument here does not appear in that listing.
    pZwSuspendProcess ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x017A, 1);
    //This can then be invoked as normal, e.g., ZwSuspendProcess(x);

However, before doing that, the address of KiServiceInternal (which the code refers to as KiSystemService) needs to be found so it can be properly patched in. This is, after all, partially why finding the base address of the kernel was important. This is done by scanning for the function signature through the entirety of ntoskrnl's memory. The signature must be long enough to be unique, but preferably not so long that comparisons take a lot of time. The signature that I used for this example is shown below:

    typedef VOID (*pKiSystemService)(VOID);
    pKiSystemService KiSystemService;

    NTSTATUS ResolveFunctions(IN PSYSTEM_MODULE KernelInfo)
    {
        CONST BYTE KiSystemServiceSignature[] =
        {
            0x48, 0x83, 0xEC, 0x08, 0x55, 0x48, 0x81, 0xEC, 0x58, 0x01, 0x00, 0x00,
            0x48, 0x8D, 0xAC, 0x24, 0x80, 0x00, 0x00, 0x00, 0x48, 0x89, 0x9D, 0xC0,
            0x00, 0x00, 0x00, 0x48, 0x89, 0xBD, 0xC8, 0x00, 0x00, 0x00, 0x48, 0x89,
            0xB5, 0xD0, 0x00, 0x00, 0x00, 0xFB, 0x65, 0x48, 0x8B, 0x1C, 0x25, 0x88,
            0x01, 0x00, 0x00
        };

        KiSystemService = (pKiSystemService)FindFunctionInModule(KiSystemServiceSignature,
            sizeof(KiSystemServiceSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
        if(KiSystemService == NULL)
        {
            DbgPrint("- Could not find KiSystemService\n");
            return STATUS_UNSUCCESSFUL;
        }
        DbgPrint("+ Found KiSystemService at %p\n", KiSystemService);
        //....
    }

    ...
    ...

    PVOID FindFunctionInModule(IN CONST BYTE *Signature, IN ULONG SignatureSize,
        IN PVOID KernelBaseAddress, IN ULONG ImageSize)
    {
        BYTE *CurrentAddress = 0;
        ULONG i = 0;

        DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress,
            (PVOID)((ULONG_PTR)KernelBaseAddress + ImageSize));

        //A signature larger than the image can never match
        if(SignatureSize > ImageSize)
        {
            return NULL;
        }

        CurrentAddress = (BYTE *)KernelBaseAddress;
        //Stop early enough that the comparison never reads past the image
        for(i = 0; i <= ImageSize - SignatureSize; ++i)
        {
            if(RtlCompareMemory(CurrentAddress, Signature, SignatureSize) == SignatureSize)
            {
                DbgPrint("+ Found function at %p\n", CurrentAddress);
                return (PVOID)CurrentAddress;
            }
            ++CurrentAddress;
        }
        return NULL;
    }

Once the ResolveFunctions() function executes, the CreateSyscallWrapper function is ready to be used as shown above. This will now resolve any syscall that you wish to call.
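The scan itself is an ordinary byte-pattern search, so its boundary behavior can be tested in usermode against a plain buffer. This is a minimal sketch of the same technique using memcmp in place of RtlCompareMemory; FindSignature is a name chosen for this illustration and is not part of the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a pointer to the first occurrence of signature inside the region
   [base, base + imageSize), or NULL if it does not occur. */
const uint8_t *FindSignature(const uint8_t *base, size_t imageSize,
                             const uint8_t *signature, size_t signatureSize)
{
    size_t i;

    assert(base != NULL && signature != NULL);

    if (signatureSize == 0 || signatureSize > imageSize)
        return NULL;

    /* The last valid start position is imageSize - signatureSize, so a
       comparison never reads beyond the end of the region. */
    for (i = 0; i + signatureSize <= imageSize; ++i) {
        if (memcmp(base + i, signature, signatureSize) == 0)
            return base + i;
    }
    return NULL;
}
```

A match that ends exactly on the last byte of the region is still found, while a signature that would run past the end is rejected without touching memory outside the buffer.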

An Example

The code below is an example I wrote up showing how to write into the virtual address space of a target process. This process is given by name to the OpenProcess function, which retrieves the appropriate EPROCESS block corresponding to the process and opens a handle to it. This handle is then used in conjunction with the undocumented APIs associated with process manipulation (ZwSuspendProcess/ZwResumeProcess) and virtual memory manipulation (ZwProtectVirtualMemory/ZwWriteVirtualMemory). An internal undocumented function (PsGetNextProcess) is also scanned for and retrieved in order to help facilitate process enumeration. The code was written for and tested on an x86 version of Windows XP SP3 and x64 Windows 7 SP1.