Intel Processor Trace for System Management Mode (ring-2) code tracing

Disclaimer

This work is part of the Summer Of Hack event organized by the Digital Security company. The project was led by @ttbr0.

Most of the tools used were made by other people (among them @d_olex and @aionescu).

The result itself is just a combination of existing tools, put together to get an execution trace from inside SMM on one specific motherboard.

However, this article could be interesting for people who are trying to recreate this result on their own platform or just want to get basic knowledge of how SMM works.

.text

This note is about testing the possibility of using Intel PT to record a trace inside SMM (ring-2).

According to the Intel Developer Manual [8], it’s not possible to set up SMM tracing from outside SMM: if Intel PT is enabled, the processor disables it before it starts executing the #SMI handler.

The only way to trace inside SMM is to enable tracing manually somewhere inside the #SMI initialization code.

This can be done by hooking the #SMI entry with a special payload that checks whether tracing was enabled before entering SMM and, if so, reenables it. After that we need to call the original entry handler and later turn Intel PT off again, letting the processor reenable it by itself while executing the RSM instruction.

First of all, we need to get R/W access to SMRAM somehow. Since it is a protected region of RAM, we cannot access it from the OS directly. There are a few options available:

1. Exploit an existing SMM vulnerability and get an R/W primitive from it. This could be either a software bug (a flaw in SMM code) or hardware abuse (unlocking/remapping SMRAM).

2. Patch the UEFI image so that it exposes an interface for accessing SMRAM externally. This option assumes that we either have a platform with BootGuard disabled or can somehow bypass it.

Even though some vulnerabilities have already been found in SMM ([1], [12]), it would be better not to rely on them. We want to be able to trace code on the newest firmware and to find some vulnerabilities ourselves.

So we chose the second option. As our platform had BootGuard disabled, the only thing left was to backdoor the SMM code.

Figure 1. [photo of platform]

Luckily for us, there is an existing SMM infector/backdoor available at [2]. Using UEFITool [10], we extracted an arbitrary DXE driver from the firmware and infected it. We then replaced the original module inside the UEFI image with the backdoored one and flashed the image into SPI memory (we had the chip unsoldered from the platform for convenience).

Figure 2. [photo of SPI]

The system booted successfully and we had full access to SMM memory from Python code (SmmBackdoor.py from [2]). As the backdoor interface is based on Chipsec [4], be ready to give it access to the kernel (we used the RWE driver, but it’s better to use Chipsec’s open-source one with Test Signing enabled for x64).

We started by making a full SMRAM dump.

$ SmmBackdoor.py -d

This command creates the file SMRAM_dump_cb000000_cb7fffff.bin containing everything located inside SMRAM. 0xcb000000 is the base address and 0xcb7fffff is the last address belonging to SMRAM.
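The address range is encoded in the dump's file name, so a script can recover the SMRAM base and size from it. A minimal sketch, assuming the `SMRAM_dump_<base>_<last>.bin` naming shown above; the helper names are hypothetical and are not part of SmmBackdoor.py:

```python
import re

def parse_dump_name(name):
    """Return (base, size) encoded in a name like SMRAM_dump_cb000000_cb7fffff.bin."""
    m = re.fullmatch(r"SMRAM_dump_([0-9a-fA-F]+)_([0-9a-fA-F]+)\.bin", name)
    if m is None:
        raise ValueError("unexpected dump file name: %r" % name)
    base, last = int(m.group(1), 16), int(m.group(2), 16)
    return base, last - base + 1  # the last address is inclusive

def to_file_offset(addr, base, size):
    """Translate a physical address inside SMRAM into an offset in the dump file."""
    if not base <= addr < base + size:
        raise ValueError("address %#x is outside SMRAM" % addr)
    return addr - base

base, size = parse_dump_name("SMRAM_dump_cb000000_cb7fffff.bin")
print(hex(base), hex(size))  # 0xcb000000 0x800000
```

With this mapping, any physical address seen in a trace or in IDA can be looked up directly in the dump file.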

This dump can be loaded into IDA Pro for analysis or supplied as an argument to a special script, smram_parse.py [3], which can extract some information useful to us.

The main thing we are interested in is the #SMI EntryPoints. This is the first code that gets executed each time a processor switches to SMM. There is one for each CPU.

Note: Even though our platform has only 4 cores, smram_parse showed 5 entries. The last one is not an entry point but a template that was used to create the real ones during the initialization stage.

Figure 3. [screenshot of smram_parse output]

Let’s look at the EntryPoint’s code. As SMM starts its execution in 16-bit real mode (with the first 4 GB of RAM mapped directly into its address space), the first thing the EntryPoint does is switch to 64-bit long mode. All SMRAM memory is writable and executable by default, as only one segment is present.

As we do not want to write 16-bit code and prepare everything ourselves, the best place for a hook is right before the call to the SMI dispatcher (the function which determines the exact SMM module being called).

Figure 4. [IDA screenshot showing the place we want to hook]

We will simply replace the call destination with the address of our hook.

All EntryPoints have the same code, so we patch each of them in the same way.

Note: The place for the hook. As the layout of SMRAM is not completely known, we picked an arbitrary zeroed memory region inside SMRAM and put our hook code there. A better way would probably be to insert one more SMM module into the firmware, so that the hook is legitimately placed inside SMRAM by UEFI itself.
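Candidate regions for the hook can be listed from the SMRAM dump by searching for long runs of zero bytes. A minimal sketch with a hypothetical helper name; note that zeroed memory is not necessarily unused, so any candidate still has to be verified on the live system:

```python
import re

def find_zero_runs(data, min_len):
    """Return (offset, length) for every run of at least min_len zero bytes."""
    pattern = re.compile(b"\\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]

# Synthetic buffer standing in for a real SMRAM dump.
sample = b"\x90" * 16 + bytes(64) + b"\xcc" * 8 + bytes(8)
print(find_zero_runs(sample, 32))  # [(16, 64)]
```

For a real dump the buffer would come from reading the dump file, and each offset would be added to the SMRAM base address to get the physical address of the region.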

Let’s discuss what exactly we want to do inside our hook. The first thing is to determine whether Intel PT was enabled before entering SMM.

As we know from Intel documents, each processor has its own SMBASE (MSR 0x9e) and SMM Save State area.

Figure 5. [image of SMBASE layout]

Unfortunately, the Intel manual does not tell us where the IA32_RTIT_CTL.TraceEn bit is stored on SMM entry. But we can try to determine it ourselves by simply dumping the SMM Save State area once with Intel PT active and once with it disabled.

We used the WinIPT tool from [5] to enable it on a Python interpreter process (pid 1337) with 2^12 bytes for the trace buffer and ran SmmBackdoor.py inside it. (The 0 argument is flags; it’s not important, since we will reconfigure everything inside SMM anyway.)

$ ipttool.exe --start 1337 12 0

By comparing snapshots of SMRAM we were able to identify the exact offset of the IA32_RTIT_CTL.TraceEn bit inside the SMM Save State.

It’s stored in the byte at offset SMBASE + 0xFE3C, bit 0. This will be our condition for reenabling tracing in SMM. This field is marked as reserved in the Intel Developer Manual.
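The snapshot comparison can be sketched as a byte-wise diff of two Save State dumps, followed by a check of the bit that was found. This is a minimal illustration on synthetic data; the function names are hypothetical, and real snapshots will also differ in saved registers, so several pairs usually have to be compared to filter out the noise:

```python
TRACE_EN_OFFSET = 0xFE3C  # offset from SMBASE reported in this article

def diff_bytes(before, after):
    """Return (offset, old, new) for every byte that differs between two snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same size")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]

def trace_was_enabled(region):
    """region: bytes read starting at SMBASE, long enough to cover 0xFE3C."""
    return bool(region[TRACE_EN_OFFSET] & 0x1)

# Synthetic 64 KiB regions standing in for real dumps taken at SMBASE.
pt_off = bytes(0x10000)
pt_on = bytearray(pt_off)
pt_on[TRACE_EN_OFFSET] |= 0x1

for offset, old, new in diff_bytes(pt_off, pt_on):
    print("SMBASE+%#x: %#x -> %#x" % (offset, old, new))
print(trace_was_enabled(pt_on))
```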

Figure 6. [screenshot from Intel Developer Manual about fields being reserved]

We did not want to configure Intel PT ourselves inside SMM since it would complicate things (allocating contiguous pages, reporting their address to userspace, etc.). Using already configured tracing is much more convenient (and it already implements saving the trace).

The WinIPT tool we used currently doesn’t support tracing kernel code (CPL=0), so even if we enable Intel PT inside SMM, the trace will not be saved, as SMM code executes at CPL=0. We had to modify some filters so that the tracer works the whole time SMM code is executing. Let’s list everything we need to check and force:

1. CPL=0 trace must be enabled
2. CPL>0 trace must be enabled
3. IP ranges must be disabled
4. IA32_RTIT_STATUS.PacketByteCnt must be cleared
5. CR3 filtering must be disabled

Now a few words about PacketByteCnt. This counter determines when synchronization packets (PSB packet sequences) are inserted into the trace. We need to clear it because later, while parsing the trace, the PSB sequence will let the parser start from the first SMM code and not from some random place where a PSB happened to be generated naturally.
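The checklist above can be expressed as plain bit arithmetic on the two MSR values. The sketch below models the intended register values in Python; it is not the shellcode itself. The bit positions are taken from the IA32_RTIT_CTL and IA32_RTIT_STATUS descriptions in the Intel SDM, and the function names are hypothetical:

```python
# IA32_RTIT_CTL (MSR 0x570) bits, as listed in the Intel SDM
TRACE_EN = 1 << 0
OS = 1 << 2            # trace CPL=0
USER = 1 << 3          # trace CPL>0
CR3_FILTER = 1 << 7
ADDR_CFG_MASK = 0xFFFF << 32   # ADDR0_CFG..ADDR3_CFG, bits 47:32

# IA32_RTIT_STATUS (MSR 0x571)
PACKET_BYTE_CNT_MASK = 0x1FFFF << 32   # PacketByteCnt, bits 48:32

def ctl_for_smm(ctl):
    """Value to write to IA32_RTIT_CTL so that all SMM code is traced."""
    ctl &= ~(CR3_FILTER | ADDR_CFG_MASK)   # no CR3 filtering, no IP ranges
    return ctl | TRACE_EN | OS | USER      # trace both privilege levels

def status_for_smm(status):
    """Value to write to IA32_RTIT_STATUS while TraceEn is still 0."""
    return status & ~PACKET_BYTE_CNT_MASK

# Example: a user-only configuration with CR3 filtering and one IP range.
saved_ctl = USER | CR3_FILTER | (1 << 8) | (1 << 13) | (1 << 32)
print(hex(ctl_for_smm(saved_ctl)))              # 0x210d
print(hex(status_for_smm((0x1234 << 32) | 0x2)))  # 0x2
```

IA32_RTIT_STATUS can only be modified while tracing is disabled, which is why the status value has to be written before TraceEn is set.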

The shellcode we used:

sub rsp, 0x18                          ; this will align stack at 16 byte boundary (in case SMM
                                       ; code uses align dependent instructions)
mov qword ptr ss:[rsp+0x10], rcx       ; need to save rcx for SMI_Dispatcher
mov ecx, 0x9E                          ; MSR_IA32_SMBASE
rdmsr
test byte ptr ds:[rax+0xFE3C], 0x1     ; Save State area contains saved
                                       ; IA32_RTIT_CTL.TraceEn
je short @NoTrace
call @Trace_Enable
mov rcx, qword ptr ss:[rsp+0x10]       ; SMI_Dispatcher is __fastcall
                                       ; (first argument in rcx)
mov eax, 0xCB7DDAA4                    ; original SMI_Dispatcher (hardcoded, firmware-specific)
call rax
call @Trace_Disable
add rsp, 0x18
ret

@NoTrace:
mov rcx, qword ptr ss:[rsp+0x10]       ; SMI_Dispatcher is __fastcall
mov eax, 0xCB7DDAA4                    ; original SMI_Dispatcher (hardcoded, firmware-specific)
call rax
add rsp, 0x18
ret

@Trace_Disable:
mov ecx, 0x570                         ; IA32_RTIT_CTL
rdmsr
mov rax, qword ptr ss:[rsp+0x10]       ; restore IA32_RTIT_CTL (saved by @Trace_Enable)
wrmsr
mov ecx, 0x571                         ; IA32_RTIT_STATUS
rdmsr
mov rax, qword ptr ss:[rsp+0x8]        ; restore IA32_RTIT_STATUS (saved by @Trace_Enable)
wrmsr
ret

@Trace_Enable:
mov ecx, 0x571                         ; IA32_RTIT_STATUS
rdmsr
mov qword ptr ss:[rsp+0x8], rax        ; save IA32_RTIT_STATUS
and edx, 0xFFFF0000                    ; IA32_RTIT_STATUS.PacketByteCnt = 0
wrmsr
mov ecx, 0x570                         ; IA32_RTIT_CTL
rdmsr
mov qword ptr ss:[rsp+0x10], rax       ; save IA32_RTIT_CTL
and eax, 0xFFFFFFBF                    ; meant as IA32_RTIT_CTL.CR3Filter = 0; note that this mask
                                       ; clears bit 6, while the SDM lists CR3Filter as bit 7
                                       ; (mask 0xFFFFFF7F)
or eax, 0x5                            ; sets bits 0 and 2: IA32_RTIT_CTL.TraceEn = 1;
                                       ; IA32_RTIT_CTL.OS = 1 (User is bit 3 in the SDM and is
                                       ; left as configured)
and edx, 0xFFFF0000                    ; IA32_RTIT_CTL.ADDRx_CFG = 0
wrmsr
ret

This allows us to make the first SMM code trace. WinIPT can save the trace to a file with this command:

$ ipttool.exe --trace 1337 trace_file_name

Disabling trace:

$ ipttool.exe --stop 1337

Trace log use cases

We can try to decode the trace file using the ptdump tool from libipt [6].

$ ptdump.exe --no-pad ./examples/trace_smm_handler_33 > ./examples/trace_smm_handler_33_pt_dump.txt

Sample output:

Figure 7. [screenshot of first traced instructions inside SMM]

We can see some addresses, but this information is hard to use, as the records are extremely low-level.

For better output, libipt has the ptxed tool, which can convert a PT log into a more readable trace file with assembler instructions. Of course, we have to provide our SMRAM dump to make it work (a PT log doesn’t contain any register or memory values, only control-flow information such as the target addresses of indirect transfers).

Figure 8. [screenshot of assembler output for PT log]

This looks much better, but if the code contains a loop, the output still gets spammed with the same instructions over and over again.
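One simple way to tame the repetition is to collapse the instruction trace into per-address hit counts. A minimal sketch, assuming every instruction line of the ptxed output starts with a hexadecimal address followed by whitespace; the sample lines are synthetic and the helper name is hypothetical:

```python
import re
from collections import Counter

# Assumed line shape: "<hex address>  <instruction text>"
ADDR_RE = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s")

def hit_counts(lines):
    """Count how many times each instruction address appears in a trace listing."""
    counts = Counter()
    for line in lines:
        m = ADDR_RE.match(line)
        if m:  # status lines such as "[enabled]" are skipped
            counts[int(m.group(1), 16)] += 1
    return counts

sample = [
    "[enabled]",
    "00000000cb7ddaa4  push rbx",
    "00000000cb7ddaa5  dec ecx",
    "00000000cb7ddaa7  jnz 0xcb7ddaa5",
    "00000000cb7ddaa5  dec ecx",
    "00000000cb7ddaa7  jnz 0xcb7ddaa5",
    "[disabled]",
]
for addr, hits in sorted(hit_counts(sample).items()):
    print("%#x executed %d time(s)" % (addr, hits))
```

The set of keys is already a crude form of coverage; the hit counts additionally show which loops dominate the trace.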

Let’s get some coverage visualization. We chose Lighthouse [7], an IDA Pro plugin that uses the drcov format.

There is no tool available for easily converting a PT log into the drcov format, so we modified ptxed to also output a coverage file during its execution. The patched ptxed is available at [11]. Look at the commit history to see what was modified compared to the original [13].

$ ptxed.exe --pt tracesmm_12 --raw SMRAM_dump_cb000000_cb7fffff.bin:0xcb000000 > tracesmm_12_ptasm

After execution finishes, SMRAM_dump_cb000000_cb7fffff.bin.log will be created, containing coverage data in drcov format.
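For readers who want to produce such a file from their own tooling, the sketch below builds a drcov log for a single module. It is not the code of the patched ptxed. The layout (text header, one module table row, then binary basic-block records of 32-bit offset, 16-bit size and 16-bit module id) follows the version 2 drcov logs produced by DynamoRIO as far as we know them, so treat the header fields as an assumption and verify them against the parser you use:

```python
import struct

def build_drcov(module_path, base, size, blocks):
    """Build drcov data for one module.

    blocks: iterable of (absolute_start_address, length_in_bytes).
    """
    entries = []
    for start, length in blocks:
        if not base <= start < base + size:
            raise ValueError("block %#x is outside the module" % start)
        # start is stored as an offset from the module base; module id is 0
        entries.append(struct.pack("<IHH", start - base, length, 0))
    header = (
        "DRCOV VERSION: 2\n"
        "DRCOV FLAVOR: drcov\n"
        "Module Table: version 2, count 1\n"
        "Columns: id, base, end, entry, checksum, timestamp, path\n"
        " 0, 0x%x, 0x%x, 0x0, 0x0, 0x0, %s\n"
        "BB Table: %d bbs\n" % (base, base + size, module_path, len(entries))
    )
    return header.encode("ascii") + b"".join(entries)

data = build_drcov("SMRAM_dump_cb000000_cb7fffff.bin",
                   0xCB000000, 0x800000, [(0xCB7DDAA4, 5)])
print(data[:16])
```

The module path is set to the dump file name here on the assumption that the coverage loader matches modules against the name of the loaded binary.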

Note: There is a small problem with syncing the disassembler with the first PSB in SMM. For some reason the PSB block precedes the PGE entry and ptxed is unable to synchronize the IP. We bypassed it by patching libipt (see figure below). I am not sure whether it is a bug in ptxed or I am doing something wrong when resetting the IA32_RTIT_STATUS.PacketByteCnt field.

Figure 9. [patch which would allow using PSB block located right before PGE]

Coverage files can be loaded into IDA Pro, and we get pretty highlighting and statistics about executed functions.

Note: The Lighthouse plugin behaves strangely if the binary is not completely analyzed (code is undefined, functions are not marked). I traced this problem down to get_instructions_slice in \lighthouse\metadata.py, as this function returns zero instructions even for an address which has a function defined for it. It looks like the plugin uses a cache and ignores newly defined code. This can be worked around by triggering reanalysis of the program and reloading the database. Only after that is the plugin able to see the new code and highlight it. As this problem became annoying in the case of the SMRAM dump (which consists almost completely of undefined code), I made a small patch for Lighthouse so I can quickly define all new code manually.

Figure 10. [log message was added, so we can help Lighthouse and define new code for it]

Figure 11. [screenshot of IDA Pro Lighthouse loaded with coverage data]

Summary

Since getting an SMM tracer working was a small task, I am stopping at posting this note. I hope this can help others to make use of Intel PT during vulnerability research of SMM code.

Some trace examples can be found inside the ./examples directory, alongside the SMRAM dump taken from our platform.

Quick start guide

Let’s assume you already have a backdoored platform ready.

The SmmTrace.py file contains sample code that interacts with the backdoor so that all handlers are patched to reenable Intel PT inside SMM. SmmTrace.py will try to make a trace of the backdoor’s execution inside SMM (it will trigger an #SMI with index 0xCC).

If you use the same platform we had, it will work just like that, with no modifications. But you most probably won’t…

Unfortunately, it’s not possible to make the patch universal, so every firmware needs its own hooking method. You have to reverse the entry point and decide how to hook it.

As our firmware’s #SMI entry can be hooked by replacing the call destination, SmmTrace.py has hardcoded addresses of the #SMI EntryPoints and the offset to this call destination (0x14e+2 in the code).
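The patch itself boils down to overwriting the destination operand at a fixed offset inside each EntryPoint. The sketch below applies it to a local copy of SMRAM rather than through the backdoor, which is what the real script has to do. The operand width (8 bytes by default), the helper name, and the entry point address in the demo are assumptions made for illustration; check the actual instruction encoding in your firmware first:

```python
import struct

CALL_OPERAND_OFFSET = 0x14E + 2  # offset of the destination operand, from the article

def patch_destination(smram, smram_base, entry_point, new_target, width=8):
    """Overwrite the call destination in one EntryPoint; return the old value.

    smram: bytearray holding a copy of SMRAM starting at smram_base.
    width: operand size in bytes (4 or 8). This is an assumption to verify.
    """
    fmt = {4: "<I", 8: "<Q"}[width]
    offset = entry_point - smram_base + CALL_OPERAND_OFFSET
    (old_target,) = struct.unpack_from(fmt, smram, offset)
    struct.pack_into(fmt, smram, offset, new_target)
    return old_target

# Demo on a synthetic buffer with a hypothetical entry point at base + 0x1000.
base = 0xCB000000
entry = base + 0x1000
smram = bytearray(0x2000)
struct.pack_into("<Q", smram, 0x1000 + CALL_OPERAND_OFFSET, 0xCB7DDAA4)

old = patch_destination(smram, base, entry, 0xCB4A0000)
print(hex(old))  # 0xcb7ddaa4
```

Returning the previous value makes it easy to confirm that every EntryPoint pointed at the same dispatcher before it was redirected to the hook.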

As we want to call the original handler, its address is also hardcoded (0xCB7DDAA4). There is no point in locating it dynamically, as it remains constant unless we flash another firmware.

And the last hardcoded value is the address of the memory we use for storing our hook’s code (the region at 0xcb000000 + 0x4a0000 was all zeroes and the system remained stable with it overwritten).

The function test_run_PT triggers an #SMI while tracing is enabled. Tracing is enabled using the external tool ipttool.exe, which should be placed in the script’s working directory. The function will also try to convert the freshly dumped trace into PT pseudo-code and an assembler trace. SMRAM_dump_cb000000_cb7fffff.bin should be present in the working directory for this to work.
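The external tool invocations used throughout this article can be assembled as argument lists and passed to `subprocess.run`. This is a sketch of the command lines only and does not reproduce test_run_PT; the function names are hypothetical, and the tools are assumed to be in the working directory:

```python
def ipttool_start(pid, size_log2=12, flags=0):
    """Start tracing a process; the buffer size is given as a power of two."""
    return ["ipttool.exe", "--start", str(pid), str(size_log2), str(flags)]

def ipttool_trace(pid, out_file):
    """Save the current trace of a process to a file."""
    return ["ipttool.exe", "--trace", str(pid), out_file]

def ipttool_stop(pid):
    """Stop tracing a process."""
    return ["ipttool.exe", "--stop", str(pid)]

def ptdump_cmd(trace_file):
    """Dump raw PT packets without padding."""
    return ["ptdump.exe", "--no-pad", trace_file]

def ptxed_cmd(trace_file, dump_file, base):
    """Reconstruct the instruction flow using the SMRAM dump as code memory."""
    return ["ptxed.exe", "--pt", trace_file, "--raw", "%s:0x%x" % (dump_file, base)]

steps = [
    ipttool_start(1337),
    ipttool_trace(1337, "tracesmm_12"),
    ipttool_stop(1337),
    ptxed_cmd("tracesmm_12", "SMRAM_dump_cb000000_cb7fffff.bin", 0xCB000000),
]
for cmd in steps:
    print(" ".join(cmd))
```

Each list can be executed with `subprocess.run(cmd, check=True)`; the #SMI has to be triggered between the start and trace steps.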

Bonus material

1. EFI variable for (supposedly) SMM tracing

We noticed that there is an EFI variable, CpuSmm-90d93e09-4e91-4b3d-8c77-c82ff10e3c81, with the following structure:

typedef struct {
    UINT8 CpuSmmMsrSaveStateEnable;
    UINT8 CpuSmmCodeAccessCheckEnable;
    UINT8 CpuSmmUseDelayIndication;
    UINT8 CpuSmmUseBlockIndication;
    UINT8 CpuSmmUseSmmEnableIndication;
    UINT8 CpuSmmProcTraceEnable;
} CPU_SMM;

It doesn’t seem that this variable is read anywhere in the code of our version of the UEFI firmware. Future researchers should probably look for accesses to this variable and try to enable some of its fields. Maybe this will do the same thing as our patch and allow the trace to continue inside SMM. If so, we would not need to acquire R/W access to SMRAM.
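For anyone experimenting with this variable, its contents can be decoded with a few lines of Python. A minimal sketch, assuming the variable data is exactly the six UINT8 fields of the structure above with no extra header; reading and writing the variable itself (for example through Chipsec) is not shown:

```python
import struct
from collections import namedtuple

CpuSmm = namedtuple("CpuSmm", [
    "CpuSmmMsrSaveStateEnable",
    "CpuSmmCodeAccessCheckEnable",
    "CpuSmmUseDelayIndication",
    "CpuSmmUseBlockIndication",
    "CpuSmmUseSmmEnableIndication",
    "CpuSmmProcTraceEnable",
])

def parse_cpu_smm(data):
    """Decode the raw bytes of the CpuSmm variable into named fields."""
    if len(data) < 6:
        raise ValueError("CpuSmm data is too short: %d bytes" % len(data))
    return CpuSmm(*struct.unpack_from("<6B", data))

def with_proc_trace(data, enabled=True):
    """Return a copy of the variable data with CpuSmmProcTraceEnable changed."""
    fields = parse_cpu_smm(data)._replace(CpuSmmProcTraceEnable=int(enabled))
    return struct.pack("<6B", *fields) + bytes(data[6:])

raw = bytes([0, 1, 0, 0, 0, 0])  # synthetic example value
print(parse_cpu_smm(raw))
print(with_proc_trace(raw).hex())  # 000100000001
```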

This is probably related to [9].

2. Linux support

As all our tests were done on the latest Windows 10 x64 (we needed ipt.sys from the Windows 10 October 2018 Update to be present in our system), let’s say something about Linux support.

There is the perf subsystem inside the Linux kernel, which can do basically the same thing as WinIPT (ipt.sys), including kernel mode tracing.

As the backdoor interface code is based on the Chipsec framework, which is cross-platform, our patch should work on a Linux system without any changes.

References