
In the first part of my series about Intel Processor Trace (PT) I showed the underlying mechanics of this HW feature and talked a bit about its main use cases.


In this article I will go into one of the areas where PT can provide additional value: debugging.

Postmortem debugging

Traditionally, in the embedded world, issues that happen in a production environment are debugged through logging. But sometimes that's not enough. Suppose we have a function like this, with a big switch inside:

    int foo(unsigned int arg) {
      int x = someComplexFunc(arg);
      log("Enter calculations");
      switch (arg) {
        case 7498536: {
          int y = 7498536 / x; // potential div by zero
          x = y - 42;
          break;
        }
        // <lots of other cases>
        default:
          x = 42;
          break;
      }
      log("result: x = %d\n", x);
      return x;
    }

A customer reports that once in a while some of their machines crash. The last lines in the logs they provided are:

    Time            message
    19:25:13:0124:  bar: calling foo
    19:25:13:0134:  foo: Enter calculations

Well, that's better than nothing, but it still doesn't give any clue about what exactly the problem is. If the switch is big enough, it will be hard to find where exactly the issue is.

Intel PT can provide a bit more insight. I used simple-pt to collect the traces. Please refer to the simple-pt documentation for how to build and use it.

Source code for this example is available on my github. I built the program like this:

$ gcc a.cpp -g -o app

The command below collects the traces. By default simple-pt saves the traces into a 2MB circular buffer, which means new traces overwrite the old ones. So even for long-running applications we can have a trace of what was happening just before the crash:

$ sudo sptcmd -K --cyc 1 taskset -c 0 ./app

After collecting the traces, we can decode them with:

$ sudo sptdecode -s ptout.sideband --pt ptout.0 -i -t | xed -F insn: -A -64 > dump.txt

If we now look into dump.txt, right at the bottom we will see something like:

    <...>
    560a8ef2870d 0 call  callq 0x560a8ef2866a
    560a8ef2866b 0 other mov %rsp, %rbp
    560a8ef2866e 0 other movl %edi, -0x14(%rbp)
    560a8ef28671 0 other movl $0x0, -0x8(%rbp)
    560a8ef28678 0 other movl -0x14(%rbp), %eax
    560a8ef2867b 0 other cmp $0x1d4a, %eax
    560a8ef28680 0 cjump jnz 0x560a8ef28699   // arg == 7498536
    560a8ef28682 0 other mov $0x1d4a, %eax    // entering the block
    560a8ef28687 0 other cdq                  // with div by zero

I was having some issues with emitting source code lines in the dumps. In sptdecode there is a -d option that is supposed to print source code intermixed with the associated assembly code. After I discussed the issue with Andi Kleen, it looks like the problem is somewhere on my side.

In this example you can see how Intel PT can tell us the last instructions that were executed. Intel PT can be used as an almost free addition to the logging capabilities of your applications and often can provide a big chunk of useful information for postmortem debugging.

Logs are still very useful because you can print values in them. Until the PTWRITE instruction came out, there was no way of dumping data into processor traces; traces were only useful for determining control flow. But recent CPUs have the PTWRITE instruction, which allows writing values into PT packets. According to the Intel Software Developer's Manual:

This instruction reads data in the source operand and sends it to the Intel Processor Trace hardware to be encoded in a PTW packet.

I haven’t used PTWRITE in practice, but I assume that every time you want to print something new you need to recompile the binary. Though you need to do the same when using logs.

My Intel Core i5-8259U doesn't have PTWRITE support. You can check whether your CPU supports PTWRITE with the ptfeature tool, which is part of simple-pt.

Debugging stack corruption issues

Now let me show another case where Intel PT can be useful.

Let me jump right into the example with the program where the call stack is being corrupted:

    // a.c
    void bar(); // implemented in assembly below

    void foo() {
      bar();
    }

    int main() {
      foo();
    }

    ; b.asm
    GLOBAL bar
    bar:
      pop rdx      ; remove return address from the stack
      xor rax, rax
      ret
      ud2

Source code for this example is available on my github. Let’s build the program and make sure it’s crashing:

    $ gcc a.c -c -g
    $ nasm -f elf64 b.asm -g
    $ gcc a.o b.o
    $ ./a.out
    Segmentation fault (core dumped)

Here is what I get when I run the usual (pre-installed) version of gdb:

    $ /usr/bin/gdb ./a.out
    GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
    (gdb) r
    Starting program: /path/to/a.out
    Program received signal SIGSEGV, Segmentation fault.
    0x00007fffffffe336 in ?? ()
    (gdb) bt
    #0  0x00007fffffffe336 in ?? ()
    #1  0x00007fffffffe320 in ?? ()
    #2  0x00007fffffffe320 in ?? ()
    #3  0x00007fffffffe320 in ?? ()
    #4  0x0000555555554619 in main () at a.c:10
    Backtrace stopped: frame did not save the PC

The stack is corrupted, so gdb is not able to unwind it. According to my experiments, the rr tool cannot provide any additional value either.

Now let's try to run the same example using a gdb built from sources and record traces. You can find the build instructions in the appendix of this article.

    $ /usr/local/bin/gdb ./a.out
    GNU gdb (GDB) 8.3.50.20190822-git
    (gdb) start
    Starting program: /path/to/a.out
    Temporary breakpoint 1, main () at a.c:10
    10        foo();
    (gdb) record btrace pt
    (gdb) c
    Continuing.
    Program received signal SIGSEGV, Segmentation fault.
    0x00007fffffffe336 in ?? ()

We have the crash, let’s now see the call history:

    (gdb) record function-call-history
    1  main
    2  foo
    3  bar
    4  ??

We can even pull the previously executed instructions:

    (gdb) record instruction-history /m -
    4      0x00005555555545fb <foo+1>:  mov    %rsp,%rbp
    a.c:5    bar();
    5      0x00005555555545fe <foo+4>:  mov    $0x0,%eax
    6      0x0000555555554603 <foo+9>:  callq  0x555555554620 <bar>
    7      0x0000555555554620 <bar+0>:  pop    %rdx
    8      0x0000555555554621 <bar+1>:  xor    %rax,%rax
    9      0x0000555555554624 <bar+4>:  retq
    10     0x00007fffffffe330:  xor    %al,0x55(%rsi)
    11     0x00007fffffffe333:  push   %rbp
    12     0x00007fffffffe334:  push   %rbp
    13     0x00007fffffffe335:  push   %rbp

Here the /m switch is used for intermixing source code with assembly instructions. Because bar is implemented in assembly, there is no source line for it. But for the foo function we see the corresponding source line (a.c:5).

Once we know the exact place where the issue happened, we can put a normal breakpoint there and restart the debugging session as usual.

As you can see, Intel PT helps when debugging programs with a corrupted stack.

Appendix: How to build gdb with Intel PT support

This page would probably be a good starting point. GDB uses libipt for decoding PT, so we need to build it first:

    git clone https://github.com/intel/libipt.git
    mkdir build && cd build
    cmake ../libipt
    make
    make install

Then we build gdb from sources as shown here: