Debugging ARM without a Debugger 3: Printing Stack Trace

This is the last post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. Mby teammate (Pavel Bakhilau) and I are the authors of the code.

A stack trace is the ultimate tool that can help you tell exactly where a problem is occurring when used in conjunction with asserts (e.g. in my code, an assert failure triggers the stack trace dump. I also wired the ESC key to an assert failure).

It is particularly useful when you have a complex applications with deep call stacks. For example, if an assert has failed in a utility function such as stack_push in a complex application, it is practically impossible to figure out what happened where without putting print statements everywhere.

With a stack trace, we can re-construct the run-time call hierarchy and find out what is happening. At the end of this article, I will present an example of sophisticated stack trace output that can help us diagnose complex concurrency issues.

Stack Frame Structure

We can deduce the exact stack frame structure from the assembly code generated by the compiler (GCC-arm in my case). Here is an example of a typical function header:

func: mov ip, sp stmfd sp!, {(other optional registers), sl, fp, ip, lr, pc} sub fp, ip, #4 ; function body continues...

The compiler will save the registers pc, lr, ip, fp, sl into the stack in that order. Additionally, the compiler may save any other scratch register used in the function. Important registers for printing a stack trace are pc, lr and fp.

Note that if any compiler optimization is turned on (e.g. -O flag), you need to pass the extra argument -fno-omit-frame-pointer. Otherwise, GCC will optimize out the code that saves the frame pointer.

pc (program counter)

lr (link register)

fp (frame pointer)

Stack Trace Crawler

// a poorly written macro for reading an int at the specified memory address x. #define VMEM(x) (*(unsigned int volatile * volatile)(x)) lr = 0; depth = 0; do { pc = VMEM(fp) - 16; // print here: the calling code is at lr, the current function addr is pc if (lr is invalid) break; lr = VMEM(fp - 4); fp = VMEM(fp - 12); if (fp is not a valid memory or depth too big) break; depth++; } while (the current function is not a top-level function && depth is < some threshold);

Reading the saved pc gives us the address of the entry point of the function plus 16 bytes. This is because pc is always 2 instructions ahead in ARM when we save it.The lr register is the address to return when the current function returns. An instruction before lr would give us the exact code address of the caller of the current function.This is the frame pointer of the previous function. We will need to read this in order to “crawl up” the call graph.Here is the pseudocode ( or the actual code ) for printing the stack trace:

Here's an example code for reading the frame pointer which is required to start printing the stack trace:

#define STRINGIFY(x) #x #define TOSTRING(x) STRINGIFY(x) // reads the register reg to the variable var #define READ_REGISTER(var,reg) __asm volatile("mov %[result], " TOSTRING(reg) "

\t" : [result] "=r" (var)) int fp; READ_REGISTER(fp, fp);

The most important thing here is that you want this code not to fail. Here are common things that can happen that you don't want:

Abort inside another abort (or, an abort inception; install a good abort handler to find out why)

Invalid pointer dereference (e.g. outside the physical memory, or outside .text region)

Stack overflow which will lead to another abort (by getting stuck in an infinite loop of crazy corrupt frame pointers)

Finding out the corresponding C code

objdump -SD executable | less

-ggdb

Printing the function name

Use the commandto figure out what the C code is at a given address. Passing the compiler flagenables objdump to print the C source code next to the disaseembled code. It may not always work with higher optimization level.The debugging process can be much faster if you can see the function names in a stack trace right away when the program crashed, instead of running objdump every time manually.

The proper way to do it is to read the debugging information from the .debug section of the memory. I did not have time to do that, so instead I built my own symbol table array using a shell script hooked up to the makefile.

This symbol table does not need to be sophisticated. A simple array of a function address and its name is good enough. This is because the performance is not a concern when you are printing the stack trace of a crashed system. Another reason is that we want this code to work all the time. It is pretty hard to mess up a linear search.

The symbol array is built off the exported symbols. The method I have used is simple. After compiling all the source code into assembly files, I run a shell script to search for the string “.global” in all the assembly files to generate the exported symbol table. Then I compile the generated code of exported symbols as well, and then link it all together at the end. The following is a sample code how to do it:

funcmap.h (funcmap provides the interface to find function names given an address)

/* call this before calling find_function_name */ void __init_funclist(); funcinfo *__getfunclist(); /* call this function to find the name of the function */ static inline char* find_function_name(uint pc) { funcinfo* fl = __getfunclist(); int i = 0; while (fl[i].fn != 0) { if (fl[i].fn == pc) return fl[i].name; i++; } return "[unknown function]"; }

typedef struct _tag_funcinfo { unsigned int fn; char *name; } funcinfo;

funcmap.c (generated by a shell script)

#include <./task.h> // include ALL the header files static funcinfo __funclist[1]; // the length of this array is also generated void __init_funclist() { int i = 0; __funclist[i].fn=(unsigned int)some_func; __funclist[i++].name="some_func"; // .. more __funclist[i].fn=0; // null terminated } funcinfo* __getfunclist() { return __funclist; }

Lastly, this is how I read all the function names from assembly files in the shell script (the actual script):

FUNCTION_COUNT=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | grep -v '^$' | grep -v '^__' | sort | uniq | wc -l` FUNCTIONS=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | egrep -v '(^$|^__|PLT|GOT|,)' | sort | uniq`

Putting it all together (Example)

Combining the stack trace with task information can be even more powerful than what basic C debuggers offer.

The following is an example of a stack trace output for multiple tasks. It prints two lines per task.

Task 0 {noname} (p:31, pc:0x2180b8, sp0x1edfe34, lr:0x2506d8, WAITING4SEND): nameserver @ 0x2505e8+0, Task 1 {noname} (p:0, pc:0x24c55c, sp0x1eafff0, lr:0x21809c, READY): kernel_idleserver @ 0x24c550+0, Task 3 TIMESERVER (p:31, pc:0x2180b8, sp0x1e4ff80, lr:0x21d1cc, WAITING4SEND): timeserver @ 0x21d074+0, Task 4 {noname} (p:31, pc:0x2180d0, sp0x1e1ffe0, lr:0x21f818, WAITING4EVENT): eventnotifier @ 0x21f7c4+0, Task 5 IOSERVER_COM1 (p:31, pc:0x2180b8, sp0x1deff04, lr:0x21eab8, WAITING4SEND): ioserver @ 0x21e82c+0, Task 6 {noname} (p:30, pc:0x2180d0, sp0x1dbffe0, lr:0x21f818, WAITING4EVENT): eventnotifier @ 0x21f7c4+0, Task 7 IOSERVER_COM2 (p:31, pc:0x2180e0, sp0x1d8fe6c, lr:0x21e104, RUNNING): [unknown function] @ 0x21df94+0, ioserver @ 0x21e82c+253, Task 8 {noname} (p:30, pc:0x2180b0, sp0x1d5ffe0, lr:0x21f830, WAITING4REPLY, last_receiver: 7): eventnotifier @ 0x21f7c4+0, Task 9 {noname} (p:2, pc:0x2180b0, sp0x1d2f878, lr:0x22006c, WAITING4RECEIVE): uiserver_move @ 0x220018+0, timedisplay_update @ 0x23bed4+49, dumbbus_dispatch @ 0x21a5a8+15, a0 @ 0x234c88+646,

Task status code:

WAITING4SEND means the task is waiting for another task to send a message.

means the task is waiting for another task to send a message. WAITING4RECEIVE means the task has sent a message but the receiver has not received the message yet.

means the task has sent a message but the receiver has not received the message yet. WAITING4REPLY means the task has sent a message and someone received it but has not replied yet.

means the task has sent a message and someone received it but has not replied yet. last_receiver tells us the last task that received the message from this task.

tells us the last task that received the message from this task. WAITING4EVENT means the task is waiting for a kernel event (e.g. IO).

means the task is waiting for a kernel event (e.g. IO). READY means the task is ready to run next as soon as this task becomes the top priority task.

means the task is ready to run next as soon as this task becomes the top priority task. RUNNING means the task is currently running.

The first line displays the task number, name, priority, registers, task status and the task synchronization information. The second line displays the stack trace with the offsets from the address of the function.

Why is this powerful? We can use this to solve really complex synchronization issues with wait chains & priorities that is otherwise nearly impossible to without this information. At the end, we had more than 40 tasks interacting with each other and my life would have been much harder without this information.

Limitations

The major limitation of this method is that it can't print the names of static functions. This is because the symbols for static functions are not exported globally. This is not a huge problem because you can still see the names from the output of objdump.