Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 30 Jan 2017

Debugging with the natives, part 2

Some aeons ago, I wrote a piece giving a rough outline of how a native debugger like gdb works, and promised a follow-up that looks at the same pile of techniques in a more principled way. I can't really excuse the delay, but if anyone's still listening, here goes.

Source-level debugging of native code is supported on (I make it) four different levels in a complete hardware–software stack.

Hardware support: the CPU and memory hardware usually provide some primitives that are either designed explicitly for debugging (single-step mode, hardware watchpoints, etc.) or just happen to prove useful (memory protection, illegal instructions, etc.). These are machine-level facilities, and obviously know nothing about source languages.

Operating system mechanisms: the operating system provides process-level debugging mechanisms, such as the ptrace() call I talked about last time. These also know nothing about source languages, but provide the essentials for interactive debugging of one process by another.

Compiler support: the compiler(s) which generated the debugged program output additional information to help the debugger in its task. This is where knowledge of the source language comes into the system. In this way, most of the awareness of the source language (and its implementation) is localised to the compiler rather than the runtime environment (a.k.a. the operating system) or the tools (debugger).

Debugger support: the debugger itself is a client of all these features, and also embodies knowledge of one or more source languages (and, unfortunately, sometimes, of specific implementations of them). In fact it contains a partial (often extensive) implementation of these source languages—most often used for printing expressions, but also for conditional breakpoints, watchpoints, etc. Unfortunately, but also usefully, these implementations are hand-rolled—they don't share much code with any compiler. They are interpreters rather than compilers, and interpreters of a very peculiar kind: they “back onto” a running process image containing a third-party implementation of the debugged language. They have no say in how this implementation works, but must cooperate with it. They do not generate any (significant) code of their own, nor take their own decisions about how the language is compiled. It's worth meditating for a second on how odd this is, and also on what this means. The debugger effectively exposes a “view” of the running process state, in some source language. You can even switch languages, say to jump from a “C view” to a “Pascal view” (or whatever) of the same process state.
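To see how odd this interpreter is, consider what evaluating "print x" involves: look the variable up in compiler-generated metadata, peek at the debuggee's memory, and decode the bytes according to the third-party implementation's layout decisions. Here is a toy Python sketch of that pipeline (all names, addresses and layouts invented; a real debugger reads DWARF and peeks via ptrace() or /proc/<pid>/mem):

```python
import struct

# Toy compiler-generated metadata: variable name -> (address, type).
# In reality this comes from DWARF; everything here is illustrative.
metadata = {
    "x": {"addr": 0x1000, "type": "int32"},
    "f": {"addr": 0x1004, "type": "float64"},
}

# Toy debuggee memory. A real debugger would read these bytes from the
# live process with ptrace(PTRACE_PEEKDATA, ...) or /proc/<pid>/mem.
memory = {0x1000: struct.pack("<i", -42),
          0x1004: struct.pack("<d", 2.5)}

decoders = {"int32": "<i", "float64": "<d"}

def debugger_print(name):
    """Evaluate a 'print <var>' command against the process image."""
    m = metadata[name]                  # source level -> binary level
    raw = memory[m["addr"]]             # peek at debuggee memory
    return struct.unpack(decoders[m["type"]], raw)[0]  # decode per type

assert debugger_print("x") == -42
assert debugger_print("f") == 2.5
```

Note that nothing here executes any debuggee code: the “interpreter” only reads and decodes state that some other implementation put there.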

We can split these four into two pairs. The first two implement machine-level primitives and their “upward mapping” (virtualisation) into the operating system's process abstraction. The second two implement the source-level view that the programmer usually prefers, again mapping upwards from binary- to source-level.

Let's take the machine-level pair to begin with. From the operating system's point of view, all debugging support is designed around the following principles. (Perhaps I shouldn't say “designed” since in reality they “grew”— and only became principles later.)

Debugging happens outside the debuggee, which remains oblivious. The debuggee includes no run-time support for debugging, and doesn't even know it's happening. Debuggers depend only on the generic services of the operating system, not on any functionality specific to the program.

Debugging doesn't need to be explicitly enabled. Debuggers may attach to any process, without the cooperation of that process. (This wasn't true in the earliest versions of Unix, but was fixed by the time of pi in AT&T Eighth Edition.)

No language knowledge: the machine- and OS-level debugging infrastructure has no knowledge of any source language syntax or semantics. All knowledge of the source language is left to the compiler that implemented the source language, and (to a lesser extent) to the debugger itself.

Provide a symbolic memory abstraction: OS-level infrastructure does abstract a little from the machine level, by providing a “symbolic” view of a process's structure, roughly at the assembler's level of abstraction. Symbol information is specified in the operating system's object file format, and may or may not be present in the executed binary; if it is, and if the debugger can find that binary (as opposed to the raw process image), it will make use of it.

Some surprisingly strong properties result from this design. Firstly, debugging can be done from a remote process, perhaps on a separate machine from the debuggee, perhaps even a machine of a different architecture. Secondly, debugging can be done post-mortem. Thirdly, the same infrastructure works for many source languages—albeit trivially so far, since we've only seen how to get an assembly-level view. There are some contrasts here with most language virtual machines (think JVM): these implement debugging using in-VM debug servers. These can work across the network, but don't support post-mortem debugging, and typically bake in concepts from source languages.

That's enough about the machine-level part. To go from machine- or assembly-level debugging to source-level debugging, we need help from the compiler. This is designed around the following principles.

Obliviousness (again): the debugger knows nothing about the compiler. The compiler need not be present when/where the debugger is run. There is no API—no API between debugger and compiler, and no API between debugger and debuggee. (The only API is between debugger and OS, namely the ptrace() that we've seen already.) Between compiler and debugger there is, however, an interface of another kind....

The compiler describes its own work. It is the compiler's job to output whatever information a debugger might need to debug the program, and in a compiler-independent form: metadata, a.k.a. debugging information, packaged within or alongside the output binary. This is what the -g option to a C compiler is turning on. It is distinct from the symbolic (assembly-level) metadata that the object file format prescribes (which is always output, but can optionally be removed with strip). Although debugging information is sometimes stored in the same sections of the object file as symbols are (as with STABS), nowadays it generally isn't. Calling it “symbols”, therefore, although still common, is a bit outdated and confusing.

This metadata describes mappings between source and binary levels, in both directions. Given a source-level entity, one can look up facts about how it is implemented at the binary level—such as which register currently holds a given local variable. There is also some mapping in the other direction: given a program counter value, say, one can look up which subprogram (function or method) the instructions at that location are from. For modern source languages and modern compilers, this metadata is, by necessity, extremely expressive and extremely detailed. The available mappings don't quite cover every conceivable requirement, and are a bit biased to typical debugger features. For example, DWARF makes it fairly easy to map from a source variable to its machine location (as required to evaluate commands like “print x”), but hard to do the inverse mapping from a machine location (as would be required if the debugger let the user ask “what's in register rax right now?”). Fully bidirectional mappings would be pretty useful in many debugging scenarios—this is something I've worked on a little, and plan to work on some more.
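The asymmetry between the two directions can be sketched in a few lines of Python. The forward map is a direct lookup; the inverse has to be computed by scanning every variable's location list. (A toy model: the variable names, registers and PC ranges below are invented, and real DWARF encodes the per-range locations in DW_AT_location attributes.)

```python
# Toy location lists: variable -> list of (pc_lo, pc_hi, location).
locations = {
    "x": [(0x400526, 0x40052a, "rdi"), (0x40052a, 0x40053b, "rax")],
    "y": [(0x400526, 0x40053b, "CFA-16")],
}

def locate(var, pc):
    """Forward mapping: where does 'var' live at this pc? (easy in DWARF)"""
    for lo, hi, loc in locations[var]:
        if lo <= pc < hi:
            return loc
    return None

def who_is_in(loc, pc):
    """Inverse mapping: which variables occupy 'loc' at this pc?
    DWARF provides no direct index for this; we must scan everything."""
    return sorted(v for v in locations if locate(v, pc) == loc)

assert locate("x", 0x400530) == "rax"
assert who_is_in("rax", 0x400530) == ["x"]
assert who_is_in("CFA-16", 0x400528) == ["y"]
```

The scan is fine for one function's locals, but answering “what's in rax?” across a whole program this way means walking every location list in the debug info, which is why debuggers don't offer it.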

No effect on codegen: whether or not the compiler generates debugging information typically has no effect on the code generated. “Debug builds” don't run any slower than other builds. This is true even when a debugger is attached, except for the overhead of the operating system mechanisms themselves, which tend to be low: they lengthen the path through system calls and signal handling, but not the mainline execution of the program.

Metadata is “best-effort”: in practice, the compiler does not commit to faithfully describing all implementation decisions it took during compilation. Consequently, the debugger might sometimes be unable to recover a source-level view of the program. For example, it might fail to find a particular local variable, or not know how many instructions it needs to step to reach the next source line. These happen if the metadata generated by the compiler is incomplete or absent.

Let's do a more concrete run-through of how it works. So far I've been fairly generic, but let's fix on GNU/Linux as our modern Unix—though all ELF-based systems are pretty similar—and a familiar architecture (x86-64) and specific metadata format (DWARF).

When I compile a program with -g it means “please generate metadata”. First, let's try without.

$ cc -o hello hello.c
$ readelf -S hello | grep debug    # no output! no debugging sections

You can still debug this program, at the assembly level, because the OS debugging mechanisms remain available. It's as if the compiler-generated assembly is code that you wrote manually by yourself. You can set breakpoints, watchpoints, single step, and so on.

$ gdb -q --args ./hello
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x40052a
(gdb) run
Starting program: /tmp/hello 
Breakpoint 1, 0x000000000040052a in main ()
(gdb) disas
Dump of assembler code for function main:
   0x0000000000400526 <+0>:   push   %rbp
   0x0000000000400527 <+1>:   mov    %rsp,%rbp
=> 0x000000000040052a <+4>:   mov    $0x4005c4,%edi
   0x000000000040052f <+9>:   callq  0x400400 <puts@plt>
   0x0000000000400534 <+14>:  mov    $0x0,%eax
   0x0000000000400539 <+19>:  pop    %rbp
   0x000000000040053a <+20>:  retq
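Under the hood, break main works by patching code: the debugger saves the original byte at the breakpoint address and writes the one-byte x86 trap instruction int3 (0xcc) in its place, via ptrace(); when the trap fires it restores the byte before letting execution continue past that point. Here's a toy Python sketch of that bookkeeping, operating on a copied bytearray rather than a live process, using the bytes and addresses from the disassembly above:

```python
# Toy "text segment": the first bytes of main from the disassembly.
text = bytearray.fromhex("554889e5bf")   # push %rbp; mov %rsp,%rbp; mov $...,%edi
BASE = 0x400526                          # address of the first byte

saved = {}  # breakpoint address -> displaced original byte

def set_breakpoint(addr):
    """Plant an int3 (0xcc), remembering the displaced byte.
    A real debugger writes it with ptrace(PTRACE_POKETEXT, ...)."""
    off = addr - BASE
    saved[addr] = text[off]
    text[off] = 0xCC

def clear_breakpoint(addr):
    """Restore the original instruction byte."""
    text[addr - BASE] = saved.pop(addr)

set_breakpoint(0x40052a)     # where 'break main' stopped us, post-prologue
assert text[4] == 0xCC       # trap opcode now in place
clear_breakpoint(0x40052a)
assert text[4] == 0xBF       # original 'mov $...,%edi' opcode restored
```

None of this needs compiler metadata, which is why breakpoints-by-address work on a stripped binary.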

Now let's compile with debug information.

$ cc -g -o hello hello.c
$ readelf -S hello | grep debug
  [28] .debug_aranges    PROGBITS         0000000000000000  00001081
  [29] .debug_info       PROGBITS         0000000000000000  000010b1
  [30] .debug_abbrev     PROGBITS         0000000000000000  00001142
  [31] .debug_line       PROGBITS         0000000000000000  00001186
  [32] .debug_frame      PROGBITS         0000000000000000  000011c8
  [33] .debug_str        PROGBITS         0000000000000000  00001210

What's in these debug sections? There are three main kinds of information. Firstly, there are files and line numbers (.debug_line). These encode a mapping from object code addresses to source coordinates (file, line, column). You can dump it fairly readably, as follows.

$ readelf -wL hello
Decoded dump of debug contents of section .debug_line:

CU: hello.c:
File name                    Line number    Starting address
hello.c                                4            0x400526
hello.c                                5            0x40052a
hello.c                                6            0x400534
hello.c                                7            0x400539
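Given the decoded table, mapping a program counter back to a source line is a straightforward search: each row covers the addresses from its starting address up to the next row's. A toy Python version of the lookup a debugger performs, using the rows above (real line tables also carry end-of-sequence markers, column numbers and is-statement flags, which this sketch ignores):

```python
import bisect

# The decoded .debug_line mapping from above: (starting address, line).
line_table = [(0x400526, 4), (0x40052a, 5), (0x400534, 6), (0x400539, 7)]
addrs = [a for a, _ in line_table]   # sorted start addresses

def pc_to_line(pc):
    """Find the last row whose starting address is <= pc."""
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None                  # pc is below the first mapped address
    return line_table[i][1]

assert pc_to_line(0x40052a) == 5     # exactly at a row start
assert pc_to_line(0x400530) == 5     # mid-instruction-sequence, same line
assert pc_to_line(0x400534) == 6
```

This is also the mapping gdb consults when deciding how far "step" must go to reach the next source line.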

Secondly, there is frame information (often this comes out in a section called .eh_frame, so I cheated a bit above; to get exactly the above with a current gcc, you should add the -fno-dwarf2-cfi-asm switch). This tells the debugger how to walk the stack. What does walking the stack mean? Roughly, it means getting a sequence of stack pointers paired with their program counter values, for each call site that is active on the callchain. The old stack pointer and program counter are always saved somewhere, otherwise we couldn't return from the call. To walk the stack we start from the current “live” register file given to us by ptrace(), which holds the “end” stack pointer and program counter. The DWARF then describes how to “rewind” these register values, and/or any other registers whose job is to record the callchain (rbp on x86-64; other callee-saves are often included too), back to their state at the previous call site in the chain.

The description of this unwinding is logically a table, which you can see below for the main function. The cells are expressions describing how to compute the caller's value for a register—here the rbp value (frame pointer) and also the caller's program counter, i.e. the return address (given as ra). The computations are factored into two steps. Firstly we calculate a “canonical frame address” from the current frame's register values (see the CFA column): it's a fixed offset from rsp or rbp, and is actually a fixed address on the stack, but the expression changes from instruction to instruction as the stack pointer gets adjusted. Secondly we obtain the saved values we want by reading from the stack at fixed offsets from that address (c-8 means 8 bytes down). This factoring helps compactness, because the CFA-relative offsets don't change when the stack pointer moves; only the CFA column needs to describe that.
However, although “stored at some offset from the CFA” covers a lot of cases, sometimes more complex computations are required, which usually appear as DWARF bytecode expressions.

$ readelf -wF hello
(snip)
00000088 000000000000002c 0000001c FDE cie=00000070 pc=0000000000400526..000000000040053b
   LOC             CFA      rbp   ra
0000000000400526   rsp+8    u     c-8
0000000000400527   rsp+16   c-16  c-8
000000000040052a   rbp+16   c-16  c-8
000000000040053a   rsp+8    c-16  c-8
(snip)
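To make the table concrete, here's a toy Python unwinder that applies the rules above to a fabricated register file and stack, recovering the caller's program counter and rbp for one step of the walk. (All the stack addresses are invented; a real unwinder parses the binary CFI, tracks many more registers, and repeats this until the chain ends.)

```python
# The FDE rows from above: (pc_lo, cfa_rule, rbp_rule, ra_rule).
# cfa_rule is (base_register, offset); rbp/ra rules are CFA-relative
# offsets, so -8 means "*(CFA - 8)"; None means unchanged ('u').
rows = [
    (0x400526, ("rsp", 8),  None, -8),
    (0x400527, ("rsp", 16), -16,  -8),
    (0x40052a, ("rbp", 16), -16,  -8),
    (0x40053a, ("rsp", 8),  -16,  -8),
]

def unwind_once(regs, read_mem):
    """Recover the caller's pc, rsp and rbp from the callee's registers.
    regs: {'pc', 'rsp', 'rbp'}; read_mem(addr) reads 8 bytes as an int."""
    row = max(r for r in rows if r[0] <= regs["pc"])   # last row <= pc
    _, (base, off), rbp_off, ra_off = row
    cfa = regs[base] + off                 # canonical frame address
    caller = {"pc": read_mem(cfa + ra_off),   # saved return address
              "rsp": cfa}                     # caller's rsp at the call
    caller["rbp"] = read_mem(cfa + rbp_off) if rbp_off is not None else regs["rbp"]
    return caller

# Stopped at the breakpoint (pc=0x40052a), with a fabricated stack:
stack = {0x7ffd000 - 8:  0x400610,     # saved return address
         0x7ffd000 - 16: 0x7ffd800}    # saved caller rbp
regs = {"pc": 0x40052a, "rsp": 0x7ffd000 - 16, "rbp": 0x7ffd000 - 16}
caller = unwind_once(regs, stack.__getitem__)
assert caller["pc"] == 0x400610 and caller["rbp"] == 0x7ffd800
```

Notice that at pc 0x400527, before the mov, the same saved values would instead be found via rsp+16: the table's job is exactly to keep the CFA stable while the live registers move around.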

The .debug_info section is the biggie. It describes the structural detail of the source program along both source and binary dimensions. It has a list of source files, but also a list of compilation units. The latter is where most of the structure is. It describes functions/methods, data types, and all the language-implementation decisions that the compiler took when generating binary code: how data types are laid out, which registers or stack slots hold each local variable over its lifetime, and so on. Although not shown much in the simple case below, addresses of program variables are described in a Turing-powerful stack machine language which is essentially a bytecode; the DW_OP_call_frame_cfa below is one operation, which simply says “push the address of the frame base, as recorded by the frame info”. The tree-like structure of the information also describes detailed static structure of code, including function inlining, the in-memory locations corresponding to particular lexical blocks in the code, and so on. (It's worth asking whether DWARF info might usefully bundle the source code itself. I've never seen this done, but it would make a lot of sense to me.)

$ readelf -wi hello
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x8d (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x2f): GNU C 4.9.2 -mtune=generic -march=x86-64 -g -fno-dwarf2-cfi-asm -fstack-protector-strong
    <10>   DW_AT_language    : 1    (ANSI C)
    <11>   DW_AT_name        : (indirect string, offset: 0x88): hello.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xb5): /tmp
    <19>   DW_AT_low_pc      : 0x400526
    <21>   DW_AT_high_pc     : 0x15
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 8
    <2f>   DW_AT_encoding    : 7    (unsigned)
    <30>   DW_AT_name        : (indirect string, offset: 0x0): long unsigned int
 <1><34>: Abbrev Number: 2 (DW_TAG_base_type)
    <35>   DW_AT_byte_size   : 1
    <36>   DW_AT_encoding    : 8    (unsigned char)
    <37>   DW_AT_name        : (indirect string, offset: 0xa2): unsigned char
 <1><3b>: Abbrev Number: 2 (DW_TAG_base_type)
    <3c>   DW_AT_byte_size   : 2
    <3d>   DW_AT_encoding    : 7    (unsigned)
    <3e>   DW_AT_name        : (indirect string, offset: 0x12): short unsigned int
 <1><42>: Abbrev Number: 2 (DW_TAG_base_type)
    <43>   DW_AT_byte_size   : 4
    <44>   DW_AT_encoding    : 7    (unsigned)
    <45>   DW_AT_name        : (indirect string, offset: 0x5): unsigned int
 <1><49>: Abbrev Number: 2 (DW_TAG_base_type)
    <4a>   DW_AT_byte_size   : 1
    <4b>   DW_AT_encoding    : 6    (signed char)
    <4c>   DW_AT_name        : (indirect string, offset: 0xa4): signed char
 <1><50>: Abbrev Number: 2 (DW_TAG_base_type)
    <51>   DW_AT_byte_size   : 2
    <52>   DW_AT_encoding    : 5    (signed)
    <53>   DW_AT_name        : (indirect string, offset: 0x25): short int
 <1><57>: Abbrev Number: 3 (DW_TAG_base_type)
    <58>   DW_AT_byte_size   : 4
    <59>   DW_AT_encoding    : 5    (signed)
    <5a>   DW_AT_name        : int
 <1><5e>: Abbrev Number: 2 (DW_TAG_base_type)
    <5f>   DW_AT_byte_size   : 8
    <60>   DW_AT_encoding    : 5    (signed)
    <61>   DW_AT_name        : (indirect string, offset: 0xb0): long int
 <1><65>: Abbrev Number: 2 (DW_TAG_base_type)
    <66>   DW_AT_byte_size   : 8
    <67>   DW_AT_encoding    : 7    (unsigned)
    <68>   DW_AT_name        : (indirect string, offset: 0xb9): sizetype
 <1><6c>: Abbrev Number: 2 (DW_TAG_base_type)
    <6d>   DW_AT_byte_size   : 1
    <6e>   DW_AT_encoding    : 6    (signed char)
    <6f>   DW_AT_name        : (indirect string, offset: 0xab): char
 <1><73>: Abbrev Number: 4 (DW_TAG_subprogram)
    <74>   DW_AT_external    : 1
    <74>   DW_AT_name        : (indirect string, offset: 0xc2): main
    <78>   DW_AT_decl_file   : 1
    <79>   DW_AT_decl_line   : 3
    <7a>   DW_AT_prototyped  : 1
    <7a>   DW_AT_type        : <0x57>
    <7e>   DW_AT_low_pc      : 0x400526
    <86>   DW_AT_high_pc     : 0x15
    <8e>   DW_AT_frame_base  : 1 byte block: 9c    (DW_OP_call_frame_cfa)
    <90>   DW_AT_GNU_all_tail_call_sites: 1
 <1><90>: Abbrev Number: 0
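The location-expression stack machine can be sketched as a tiny interpreter. The opcode numbers below are genuine DWARF 4 values (note the 9c in main's DW_AT_frame_base above), but the evaluator is a toy: real DWARF encodes operands as LEB128 and defines well over a hundred operations.

```python
# Minimal evaluator for a few DWARF location-expression opcodes.
DW_OP_call_frame_cfa = 0x9c   # push the CFA, as recorded by frame info
DW_OP_fbreg          = 0x91   # push frame_base + operand
DW_OP_plus_uconst    = 0x23   # add operand to top of stack

def eval_loc(expr, frame_base, cfa):
    """Run a location expression; the result is the top of the stack."""
    stack, i = [], 0
    while i < len(expr):
        op = expr[i]; i += 1
        if op == DW_OP_call_frame_cfa:
            stack.append(cfa)
        elif op == DW_OP_fbreg:
            off = expr[i]; i += 1    # real DWARF: SLEB128, not one byte
            stack.append(frame_base + off)
        elif op == DW_OP_plus_uconst:
            off = expr[i]; i += 1    # real DWARF: ULEB128
            stack.append(stack.pop() + off)
        else:
            raise NotImplementedError(hex(op))
    return stack[-1]

# main's DW_AT_frame_base above is the single op DW_OP_call_frame_cfa:
cfa = 0x7ffd000
assert eval_loc(bytes([DW_OP_call_frame_cfa]), None, cfa) == cfa
# A hypothetical local stored at frame_base+16:
assert eval_loc(bytes([DW_OP_fbreg, 16]), cfa, cfa) == cfa + 16
```

A debugger evaluates such an expression every time it needs a variable's address, plugging in the CFA it computed from the frame information.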

That's it for the tour. Let me finish with some reflections on what's good and bad about this way of doing things.

Descriptive debugging: the good

Why do debugging this convoluted, long-winded way, with all this metadata, instead of the apparently simpler VM way of doing things? In VMs, debug servers are integrated into the runtime, and offer a fixed, high-level command interface. This allows the VM's compile-time implementation decisions to stay hidden. There is no need to tell the debugger how storage is laid out, where local variables live, and so on, because the command interface is pitched at a higher level of abstraction; those details remain internal to the VM. This is convenient for the VM implementer, since generation of debugging information is onerous to implement. But it also requires cooperation from the debuggee, couples debuggers to a fixed command language or wire protocol, and presents strictly less information to the developer.

While VM debuggers are designed around the abstraction boundaries of the source languages, metadata-based debugging actively enables descending through these boundaries. It is sometimes very useful for debugging tools to expose implementation details in this way. The most obvious case is when faced with a compiler or VM bug; the user would like to “shift down” to the lower level to inspect the assembly code or VM state. At other times, there are performance bugs that the developer has a hunch are about cache or paging effects; being able to see the raw addresses and raw memory contents can help here, even when the program is running on a VM.

Being highly descriptive, debugging metadata documents a large number of implementation decisions taken by the compiler, so is useful not only to debuggers but also to profilers, language runtimes (C++ exception handling is usually built on DWARF frame information), other dynamic analysis tools such as the Valgrind family, and so on.

Debugging optimised code (without deoptimising)

Debugging metadata must describe optimised code. By contrast, VM debug servers typically arrange that debug-server operations only need to deal with unoptimised stack frames and at most simply-generated code (e.g. from a template-based JIT). Confusingly, even the “full-speed debugging” feature of HotSpot uses dynamic deoptimisation to get back to unoptimised code—the earlier approach was to run the whole program under the interpreter whenever you wanted a debuggable execution. In general, a debuggable VM instance must either refrain from optimisation, or know how to dynamically undo that optimisation when a debugger is attached. So, dynamic deoptimisation is not exactly “full speed”—unlike with native debuggers, execution still slows down significantly when a debugger is attached. Having the VM implement debug operations only over unoptimised code is a restriction that helps make the debug server simple, at some cost in debug-time performance.

The flip side is that VM debugging is pretty good at precisely maintaining the source-level abstraction (modulo VM bugs), without complicating the task of implementing optimisations. Meanwhile, in Unix-land, the debugging experience remains best-effort and only as good as the compiler-generated metadata, which is sometimes wrong or incomplete following complex transformations. When optimisation and debuggability are in tension, debuggability usually takes the hit, so a smooth debugging experience still sometimes relies on a separate unoptimised “debug build”. Tail call optimisations are a classic debugging-hindering optimisation, since they rely on eliding stack frames, meaning the debugger cannot know how many recursive calls are logically on the (source-level) stack. Instruction scheduling is another: the order of operations in the executed code need not match source program order, and this can make for debug-time confusion.
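A toy model makes the tail-call point concrete. The hypothetical Python sketch below logs the frames a debugger's backtrace would show at the bottom of a self-recursive function, with and without tail-call optimisation (the function name and counts are invented):

```python
def backtrace(n, tco):
    """Frames visible at the innermost point of count(n)..count(0),
    where each recursive call is a tail call."""
    frames = ["main"]
    k = n
    while k >= 0:
        if tco and len(frames) > 1:
            frames[-1] = f"count({k})"    # tail call reuses the frame
        else:
            frames.append(f"count({k})")  # ordinary call pushes a frame
        k -= 1
    return frames

assert backtrace(2, tco=False) == ["main", "count(2)", "count(1)", "count(0)"]
assert backtrace(2, tco=True)  == ["main", "count(0)"]
```

With the optimisation on, the intermediate activations simply aren't on the stack any more, so no amount of metadata can reconstruct how many logical calls are outstanding.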

The control problem

Some other properties are incidental but tend to be true of current Unix-style debugging. Debugger support for exercising the target language's control features (exceptions, threads, allocations, ...) is uneven, because this can't be described purely as metadata; there needs to be some interaction with the host runtime. DWARF and similar debugging information is good at the “statics” of describing how to decode the program state, but not good at describing the protocols of interaction with a language runtime, necessary for performing operations such as spawning a thread, allocating an object or throwing an exception. These tend to be difficult unless they happen to be cleanly exposed as entry points in the language runtime. In practice debuggers usually achieve these things by having magic knowledge of particular implementations.

At least one semi-portable interface has emerged with the aim of encapsulating run-time control operations for debuggers' benefit. I'm thinking of libthread_db , best described by Joe Damato's excellent article. Unfortunately it's an abomination, because it violates the principle that implementation details are described by architecture-independent metadata. An odd but cleaner and more consistent alternative would be to bundle snippets of DWARF bytecode for doing these runtime interactions—perhaps in the debug information of a language runtime, either simply calling into the runtime (for cleanly abstracted operations) or doing something more complex. But that is only a technical possibility; there are no proposals or working demos of that as far as I'm aware (maybe I'll make one). This might sound wacky, but if you know about the early history of Java, in Oak and the Green Project, you'll see a certain uncanny similarity in these ideas.

Levels of abstraction

Debugging at multiple levels of abstraction is a neat facility of Unix-style debugging, but is also a difficult power to control. It can be useful to switch down to the assembly-level view, or to switch languages, but this capability doesn't generalise to the case where many abstraction layers are built up within the same language (think C++). The debugger will let you “continue to the next source line”, but it doesn't know how to keep you at the same abstraction level. If the next source line is deep inside a library implementing something fairly basic like a smart pointer, it will skip only as far as that line, whereas you probably wanted to stay roughly at the same level of abstraction, or perhaps within the same codebase. Things get particularly bad when there is a lot of inlining (again with C++). The traditional “step over” and “step into” are hints at this need, but are too crude.

Doing better is currently beyond the debugger's ken, but this problem could be solved: perhaps by bringing in the knowledge of libraries and source file trees that the debugger already has, or perhaps most simply by allowing programmers to manually mark out the boundaries between layers. This could be a simple partitioning over source files and binary objects, or could be something more complex, perhaps sensitive to calling context or argument values (consider the case of the same library used from two places in the same application). “Next”-style operations could then be defined in terms of these layers. I'd love to see this, although the details would take a lot of working out.
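Purely as a sketch of what such layer-marking might look like (nothing like this exists in gdb; every name, path and policy below is invented), a layer-aware “next” could classify each source file by a programmer-supplied partition and only stop when control surfaces back into the layer we were stepping in:

```python
# Hypothetical programmer-supplied partition of the source tree.
layers = {
    "app/":      "application",
    "smartptr/": "library",       # e.g. an inlined smart-pointer header
    "libc/":     "runtime",
}

def layer_of(source_file):
    for prefix, name in layers.items():
        if source_file.startswith(prefix):
            return name
    return "unknown"

def should_stop(stop_file, stepping_from):
    """Policy for a layer-aware 'next': suppress stops in other layers."""
    return layer_of(stop_file) == layer_of(stepping_from)

# Stepping from application code, a line inside the smart-pointer
# library is stepped through silently; another application line stops.
assert not should_stop("smartptr/refcount.hpp", "app/main.cc")
assert should_stop("app/render.cc", "app/main.cc")
```

A real design would need to be calling-context-sensitive (the same library used from two places), but even this file-prefix version would already improve on raw “step”/“next” for heavily inlined C++.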

To be continued?

There's plenty of research to be done on more debuggable language implementations. This research area doesn't seem to get the attention it deserves. One problem is that debugging is usually an afterthought for researchers, even though it is essential for programmers. Another problem is that not many PL researchers have sight of the design space; they're familiar with either the Unix-style approach or the VM-style one. I hope that in the future we can figure out how to get the best of both worlds.

