Siguza, 17. Aug 2018

KTRR

Allegedly “Kernel Text Readonly Region”.

Introduction

This post tries to detail the mechanism used in Apple’s A10 chips and later to prevent modification of an iOS kernel at runtime.

Older chips attempt to do this via a monitor program loaded in EL3, which is inherently flawed and bypassable though as detailed in “Tick Tock” by xerub.

Notes

On some sites you’ll see the term “memory-mapped registers” being thrown around. To avoid confusion I will not use that term, but instead:

“Register” to mean actual registers such as those accessed by the arm64 msr instruction. Each CPU core has its own copy of such registers, and the values held by them will be lost when the core goes to sleep.

instruction. Each CPU core has its own copy of such registers, and the values held by them will be lost when the core goes to sleep. “MMIO” (memory-mapped I/O) to mean a portion of the physical address that can be used to interact with a device on the SoC. Like regular RAM, this is global to the entire SoC and does not lose its values when any core goes to sleep, as it is a separate device.

Also note that code shown here will often have been shortened, expanding or removing macros that are assumed to be defined ( KERNEL_INTEGRITY_KTRR , MACH_BSD , __arm64__ ) or not ( DEVELOPMENT , DEBUG ).

Primitives

Besides abandoning EL3 altogether (leaving only EL1 and EL0), A10 chips introduce two new hardware features that serve as the corner stones for KTRR:

“RoRgn”

A piece of MMIO consisting of a “start”, “end” and “lock” fields. The “lock” field, when written to, will lock down all three fields, i.e. prevent any modification of them. The start and end fields hold numbers of pages in DRAM, spanning up the RoRgn. Inside this region, the memory controller will reject any and all attempted writes.

XNU has three macros to refer to them: #define rRORGNBASEADDR (*(volatile uint32_t *) (amcc_base + 0x7e4)) #define rRORGNENDADDR (*(volatile uint32_t *) (amcc_base + 0x7e8)) #define rRORGNLOCK (*(volatile uint32_t *) (amcc_base + 0x7ec))

A piece of MMIO consisting of a “start”, “end” and “lock” fields. The “lock” field, when written to, will lock down all three fields, i.e. prevent any modification of them. The start and end fields hold numbers of pages in DRAM, spanning up the RoRgn. Inside this region, the memory controller will reject any and all attempted writes. XNU has three macros to refer to them: “KTRR registers”

A set of three registers added to every CPU core, holding “low”, “high” and “lock” values respectively. The lock register, again, locks all three down. The low and high registers hold physical addresses, spanning up the executable range. If the CPU is at EL1 and the MMU is turned on, then any instruction fetch (i.e. trying to execute memory) outside of that range will fail, even if it is marked as executable in the page table. If the CPU is at EL0 or if the MMU is turned off, this range has no effect.

For those too, XNU has macros: #define ARM64_REG_KTRR_LOWER_EL1 S3_4_c15_c2_3 #define ARM64_REG_KTRR_UPPER_EL1 S3_4_c15_c2_4 #define ARM64_REG_KTRR_LOCK_EL1 S3_4_c15_c2_2

Also it is worth mentioning another feature that was present in past chips already, but is set up differently now:

“IORVBAR”

A piece of MMIO holding one field for every CPU, designating the physical address at which it will start executing on “reset” (basically when waking from sleep). This too has a locking mechanism, which is activated by writing to it a value that as its least significant bit set to 1.

On A9 CPUs and earlier this was set to a physical address inside TrustZone, where WatchTower (KPP) resides. Since A10, this is set to LowResetVectorBase as found in XNU, which iBoot calculates from the kernel’s entry point as outlined by this comment: /* * __start trampoline is located at a position relative to LowResetVectorBase * so that iBoot can compute the reset vector position to set IORVBAR using * only the kernel entry point. Reset vector = (__start & ~0xfff) */

In theory

The above primitives are virtually unbreakable due to their locking mechanism, but do by themselves not protect a running kernel. In order to look at what else is needed, we have to look a potential attacks:

(Asserted:) The primitives need to actually uphold what they promise. It must be impossible for an attacker to disable any protections or choose to not enable them, thus they must be set up and locked down before the attacker can gain code execution. It must be impossible for an attacker to patch any data you consider critical, so all of it needs to be inside the RoRgn. This sounds obvious in theory, but in practise a bunch of critical things are easily overlooked. It must be impossible for an attacker to patch page tables that are used to map memory from the RoRgn, otherwise they can copy said memory to a writeable location and patch page tables to point there instead. Such page tables must thus be inside the RoRgn themselves. It must be impossible for an attacker to employ a custom page table hierarchy, so the translation table base register that maps the kernel’s half of the address space ( ttbr1_el1 in this case) must be unchangeable. It must be impossible for an attacker to patch or inject any executable memory, otherwise they could add the instruction msr ttbr1_el1, x0 and gain the ability to change ttbr1_el1 . Therefore the “executable range” must be a strict subset of the RoRgn. It must be impossible for an attacker to turn off the MMU, since when the MMU is off all pages are considered executable and an attacker could again inject msr ttbr1_el1, x0 . The MMU status is controlled by the least significant bit of the register sctlr_el1 , so said register must to be unchangeable as well during normal operation. It must be impossible for an attacker to gain code execution before the MMU is turned on (for afore mentioned reasons). Since a CPU starts out with the MMU turned off when it wakes from sleep, this means IORVBAR must point to memory within the RoRgn, and any code executed before the MMU is turned on must not have its control flow controlled by data outside the RoRgn.

In practice

So let’s examine how iOS implements all of the above:

IORVBAR is the first primitive to be fully set up, done so by iBoot before it jumps to the kernel’s entry point. Loaded and locked down all in one go.

RoRgn gets its start and end values set by iBoot as well, but is not locked down yet. This is necessary because a lot of the data that is later readonly actually needs one-time initialisation, which is done by the kernel itself. I’ll skip over most kernel bootstrapping here, but after setting up virtual memory and all const data including kexts, we arrive at kernel_bootstrap_thread , part of which reads: machine_lockdown_preflight (); /* * Finalize protections on statically mapped pages now that comm page mapping is established. */ arm_vm_prot_finalize ( PE_state . bootArgs ); kernel_bootstrap_log ( "sfi_init" ); sfi_init (); /* * Initialize the globals used for permuting kernel * addresses that may be exported to userland as tokens * using VM_KERNEL_ADDRPERM()/VM_KERNEL_ADDRPERM_EXTERNAL(). * Force the random number to be odd to avoid mapping a non-zero * word-aligned address to zero via addition. * Note: at this stage we can use the cryptographically secure PRNG * rather than early_random(). */ read_random ( & vm_kernel_addrperm , sizeof ( vm_kernel_addrperm )); vm_kernel_addrperm |= 1 ; read_random ( & buf_kernel_addrperm , sizeof ( buf_kernel_addrperm )); buf_kernel_addrperm |= 1 ; read_random ( & vm_kernel_addrperm_ext , sizeof ( vm_kernel_addrperm_ext )); vm_kernel_addrperm_ext |= 1 ; read_random ( & vm_kernel_addrhash_salt , sizeof ( vm_kernel_addrhash_salt )); read_random ( & vm_kernel_addrhash_salt_ext , sizeof ( vm_kernel_addrhash_salt_ext )); vm_set_restrictions (); /* * Start the user bootstrap. */ bsd_init (); The first thing we’re interested in is machine_lockdown_preflight , which is just a wrapper around rorgn_stash_range that grabs the values computed by iBoot, translates them to physical addresses and stahes them in RoRgn memory for later: void rorgn_stash_range ( void ) { /* Get the AMC values, and stash them into rorgn_begin, rorgn_end. */ uint64_t soc_base = 0 ; DTEntry entryP = NULL ; uintptr_t * reg_prop = NULL ; uint32_t prop_size = 0 ; int rc ; soc_base = pe_arm_get_soc_base_phys (); rc = DTFindEntry ( "name" , "mcc" , & entryP ); assert ( rc == kSuccess ); rc = DTGetProperty ( entryP , "reg" , ( void ** ) & reg_prop , & prop_size ); assert ( rc == kSuccess ); amcc_base = ml_io_map ( soc_base + * reg_prop , * ( reg_prop + 1 )); assert ( rRORGNENDADDR > rRORGNBASEADDR ); rorgn_begin = ( rRORGNBASEADDR << ARM_PGSHIFT ) + gPhysBase ; rorgn_end = ( rRORGNENDADDR << ARM_PGSHIFT ) + gPhysBase ; } Next, arm_vm_prot_finalize patches page tables for the main kernel binary one last time before they become readonly, removing the writeable flag from all const and code regions.

And then, right before bsd_init there is a call to machine_lockdown , which is a wrapper around rorgn_lockdown . This call seems to have been #ifdef ‘ed out of public sources, but if we compare a few functions to some disassembly, it’s evident a call to machine_lockdown was inlined there: void rorgn_lockdown ( void ) { vm_offset_t ktrr_begin , ktrr_end ; unsigned long plt_segsz , last_segsz ; assert_unlocked (); /* [x] - Use final method of determining all kernel text range or expect crashes */ ktrr_begin = ( uint64_t ) getsegdatafromheader ( & _mh_execute_header , "__PRELINK_TEXT" , & plt_segsz ); ktrr_begin = kvtophys ( ktrr_begin ); /* __LAST is not part of the MMU KTRR region (it is however part of the AMCC KTRR region) */ ktrr_end = ( uint64_t ) getsegdatafromheader ( & _mh_execute_header , "__LAST" , & last_segsz ); ktrr_end = ( kvtophys ( ktrr_end ) - 1 ) & ~ PAGE_MASK ; /* [x] - ensure all in flight writes are flushed to AMCC before enabling RO Region Lock */ assert_amcc_cache_disabled (); CleanPoC_DcacheRegion_Force ( phystokv ( ktrr_begin ), ( unsigned )(( ktrr_end + last_segsz ) - ktrr_begin + PAGE_MASK )); lock_amcc (); lock_mmu ( ktrr_begin , ktrr_end ); /* now we can run lockdown handler */ ml_lockdown_run_handler (); } static void lock_amcc () { rRORGNLOCK = 1 ; __builtin_arm_isb ( ISB_SY ); } static void lock_mmu ( uint64_t begin , uint64_t end ) { __builtin_arm_wsr64 ( ARM64_REG_KTRR_LOWER_EL1 , begin ); __builtin_arm_wsr64 ( ARM64_REG_KTRR_UPPER_EL1 , end ); __builtin_arm_wsr64 ( ARM64_REG_KTRR_LOCK_EL1 , 1ULL ); /* flush TLB */ __builtin_arm_isb ( ISB_SY ); flush_mmu_tlb (); } 0xfffffff0071322f4 8802134b sub w8, w20, w19 0xfffffff0071322f8 0801150b add w8, w8, w21 0xfffffff0071322fc e9370032 orr w9, wzr, 0x3fff // PAGE_MASK 0xfffffff007132300 0101090b add w1, w8, w9 0xfffffff007132304 7c8dfe97 bl sym.func.fffffff0070d58f4 // CleanPoC_DcacheRegion_Force 0xfffffff007132308 e8f641f9 ldr x8, [x23, 0x3e8] 0xfffffff00713230c 1aed07b9 str w26, [x8, 0x7ec] // rRORGNLOCK = 1; 0xfffffff007132310 df3f03d5 isb 0xfffffff007132314 73f21cd5 msr s3_4_c15_c2_3, x19 // ARM64_REG_KTRR_LOWER_EL1 0xfffffff007132318 95f21cd5 msr s3_4_c15_c2_4, x21 // ARM64_REG_KTRR_UPPER_EL1 0xfffffff00713231c 5af21cd5 msr s3_4_c15_c2_2, x26 // ARM64_REG_KTRR_LOCK_EL1 0xfffffff007132320 df3f03d5 isb So within 5 instructions the kernel locks down the values iBoot preprogrammed for RoRgn, and initialised and locks down the KTRR registers as well.

Then it goes on to bootstrap the BSD subsystem, eventually leading to the creation of userland and the launchd process. This means that any exploit based on an app, WebKit, or even an untether binary will be much too late to do anything about KTRR. You’d need either a bootchain exploit, or one that runs very early during kernel bootstrap - which sounds rather infeasible in the presence of KASLR. There isn’t much to discuss here, XNU simply had its segments rearranged in iOS 10 to fit this memory layout: Mem: 0xfffffff0057fc000-0xfffffff005f5c000 File: 0x06e0000-0x0e40000 r--/r-- __PRELINK_TEXT Mem: 0xfffffff005f5c000-0xfffffff006dd0000 File: 0x0e40000-0x1cb4000 r-x/r-x __PLK_TEXT_EXEC Mem: 0xfffffff006dd0000-0xfffffff007004000 File: 0x1cb4000-0x1ee8000 r--/r-- __PLK_DATA_CONST Mem: 0xfffffff007004000-0xfffffff007078000 File: 0x0000000-0x0074000 r-x/r-x __TEXT Mem: 0xfffffff007078000-0xfffffff0070d4000 File: 0x0074000-0x00d0000 rw-/rw- __DATA_CONST Mem: 0xfffffff0070d4000-0xfffffff00762c000 File: 0x00d0000-0x0628000 r-x/r-x __TEXT_EXEC Mem: 0xfffffff00762c000-0xfffffff007630000 File: 0x0628000-0x062c000 rw-/rw- __LAST Mem: 0xfffffff007630000-0xfffffff007634000 File: 0x062c000-0x0630000 rw-/rw- __KLD Mem: 0xfffffff007634000-0xfffffff0076dc000 File: 0x0630000-0x0664000 rw-/rw- __DATA Mem: 0xfffffff0076dc000-0xfffffff0076f4000 File: 0x0664000-0x067c000 rw-/rw- __BOOTDATA Mem: 0xfffffff0076f4000-0xfffffff007756dc0 File: 0x067c000-0x06dedc0 r--/r-- __LINKEDIT Mem: 0xfffffff007758000-0xfffffff0078c8000 File: 0x1ee8000-0x2058000 rw-/rw- __PRELINK_DATA Mem: 0xfffffff0078c8000-0xfffffff007b04000 File: 0x2058000-0x2294000 rw-/rw- __PRELINK_INFO RoRgn protects from __PRELINK_TEXT to __LAST , the executable range spans from __PRELINK_TEXT to __TEXT_EXEC . Yep, page tables used for the main kernel binary are in __DATA_CONST and appropriately named “ropagetable”: /* reserve space for read only page tables */ .align 14 LEXT(ropagetable_begin) .space 16*16*1024,0 In order for ttbr1_el1 to be unchangeable, there must exist no instruction msr ttbr1_el1, xN in executable memory that an attacker could ROP into. It needs to exist somewhere though because it is required for CPU reinitialisation after waking from sleep. But this isn’t a problem, since at that time the MMU is still disabled and all memory is executable. So Apple created a new segment/section __LAST.__pinst (presumably “protected instructions”) and moved there all instructions they consider critical, such as e.g. msr ttbr1_el1, x0 . Since the __LAST segment is in the RoRgn but not in the executable range, it is only executable when the MMU is off. The executable range is a strict subrange of the RoRgn, so… check. Same story as for ttbr1_el1 , there exists one instance of msr sctlr_el1, x0 and that is in __LAST.__pinst . IORVBAR points to LowResetVectorBase , which is in __TEXT_EXEC and thus part of the RoRgn, so all CPUs start out in readonly memory after waking from sleep. The kernel isn’t on the safe side yet as control flow could in theory still be redirected before the MMU is enabled (in common_start by means of MSR_SCTLR_EL1_X0 ), but in practice there seems to exist nothing that lets you redirect control flow. And even if you managed that somehow, you would be able to change ttbr1_el1 and “remap” const data and whatnot, but you’d still need to turn on the MMU on eventually, and in doing so you would again lose the ability to change either ttbr1_el1 and sctlr_el1 , as well as execute any injected code. This is because the absolute first thing the CPU does after waking from sleep is locking down the KTRR registers again: .text .align 12 .globl EXT(LowResetVectorBase) LEXT(LowResetVectorBase) // Preserve x0 for start_first_cpu, if called // Unlock the core for debugging msr OSLAR_EL1, xzr /* * Set KTRR registers immediately after wake/resume * * During power on reset, XNU stashed the kernel text region range values * into __DATA,__const which should be protected by AMCC RoRgn at this point. * Read this data and program/lock KTRR registers accordingly. * If either values are zero, we're debugging kernel so skip programming KTRR. */ // load stashed rorgn_begin adrp x17, EXT(rorgn_begin)@page add x17, x17, EXT(rorgn_begin)@pageoff ldr x17, [x17] // if rorgn_begin is zero, we're debugging. skip enabling ktrr cbz x17, 1f // load stashed rorgn_end adrp x19, EXT(rorgn_end)@page add x19, x19, EXT(rorgn_end)@pageoff ldr x19, [x19] cbz x19, 1f // program and lock down KTRR // subtract one page from rorgn_end to make pinst insns NX msr ARM64_REG_KTRR_LOWER_EL1, x17 sub x19, x19, #(1 << (ARM_PTE_SHIFT-12)), lsl #12 msr ARM64_REG_KTRR_UPPER_EL1, x19 mov x17, #1 msr ARM64_REG_KTRR_LOCK_EL1, x17 There’s not even as much as a conditional branch here, nor any access to memory outside the RoRgn.

Meltdown/Spectre mitigations (>=11.2)

For the uninitiated, Meltdown is one variant of Spectre, which is the name of a whole class of vulnerabilities found in virtually all modern processors. These vulnerabilities allow attackers to leak any data they like from any software running on that CPU if it doesn’t take special countermeasures against that.

For the iOS kernel, this means that it basically has to unmap its entire address space before dropping to EL0, and restoring that mapping once it returns to EL1. Given the care taken to make kernel page tables readonly and removing the ability to change ttbr1_el1 , it seemed like mitigating Spectre would not be possible without breaking KTRR. But with a remarkably clever move, Apple did find a way. We’re gonna need a bit of technical background for this though:

In ARMv8, translating virtual addresses to physical at EL0 and EL1 works as follows:

ttbr0_el1 provides the page table hierarchy for addresses from 0x0 on upwards to some point.

provides the page table hierarchy for addresses from on upwards to some point. ttbr1_el1 provides the page table hierarchy for addresses from 0xffffffffffffffff on downwards to some point.

provides the page table hierarchy for addresses from on downwards to some point. All addresses in between are invalid/unmapped.

Where exactly these two “certain points” are is configured via the tcr_el1 register, specifically its T0SZ and T1SZ fields (ARMv8 Reference Manual, p. D10-2685 onwards). Specifically, the size of each range is 2^(64-T?SZ) bytes (i.e. the larger T?SZ , the smaller the range). Since we’re dealing in powers of two, adding or subtracting 1 to/from that field doubles or halves the size of the address range. So what Apple have done is rather simple:

They split the kernel’s address space into two ranges. The first conatining only the bare minimum to switch between EL0 and EL1, the second containing the entire rest of the kernel.

At boot, T1SZ is set to 25 , thus mapping the first range at 0xffffff8000000000 and the second one at 0xffffffc000000000 (for comparison, the unslid kernel base is 0xfffffff007004000 ).

is set to , thus mapping the first range at and the second one at (for comparison, the unslid kernel base is ). When transitioning to EL0, T1SZ is increased to 26 , thus putting the first range at 0xffffffc000000000 and not mapping the second one at all anymore, and when coming back the value is restored to 25 .

is increased to , thus putting the first range at and not mapping the second one at all anymore, and when coming back the value is restored to . The exception vector is mapped into both ranges, to that vbar_el1 is valid under either.

Fun fact: tcr_el1 used to exist only in __LAST.__pinst , but has subsequently been brought back to normal executable memory since apparently it isn’t that critical after all.

Possible attacks

Going through our list of 0-7 again, let’s reason about what could be done on what level:

(Hardware guarantees)

It might be possible to mount a rowhammer attack against protected memory and exploit the resulting bitflips. Similarly, given kernel memory access it might be possible to induce errors in all of CPU registers, DRAM and MMIO by changing their operating voltage if accessible, or by power glitching. (Please make sure to let me know about any results if you try that!) (Disable protections)

Anyone gaining code execution in iBoot or earlier can trivially just not enable KTRR, and iBoot vulnerabilities probably do exist. (Critical data in RoRgn)

The only public KTRR bypass so far, by Luca Todesco, was based on precisely this. XNU has a BootArgs struct which, among other things, has fields that hold the physical and virtual base addresses of the end kernel, which are used when transitioning from physical to virtual memory (i.e. turning on the MMU). Prior to iOS 10.2, this struct was not in readonly memory, so it was possibly to hijak control flow. This was coupled with the fact that the code that ran on reset did not account for __LAST.__pinst and accidentally included it in the executable range. (Page tables for RoRgn inside RoRgn)

To the best of my knowledge this holds true, so there’s no attack against this. ( ttbr1_el1 unchangeable)

The instruction msr ttbr1_el1, x0 has been uniqued and exists only in __LAST.__pinst anymore, so I don’t see how you would attack that either. (Shellcode injection)

Executable range is a subrange of RoRgn, so… that’s a nope from me. (Turning off the MMU)

Same story as with ttbr1_el1 . (Gaining code exec on reset)

Similarly to point 1, anyone gaining code execution after the CPU wakes from sleep but before the KTRR registers are locked down is off the hook. However, given the fact that this is pretty much the first thing any core does, the attack surface for this seems nonexistant.

Gaining code execution before the MMU is turned on would still be enough for most, but doesn’t seem terribly likely either.

If none of these work, you’ll simply have to make do with memory that needs to be writeable at runtime and which Apple cannot protect. ¯\_(ツ)_/¯