Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C “true” runtime errors such as segmentation faults almost never happen.

I say “almost” because they can happen: either because of bugs in ghc or the Haskell runtime, or because we are doing low level stuff in our own Haskell code. When they do happen we have to drop down to a system debugger such as lldb or gdb , but debugging Haskell at that level can be difficult because Haskell’s execution model is so different from the execution model of imperative languages. In particular, compiled Haskell code barely makes any use of the system stack or function calls, and uses a continuation passing style instead (see my previous blog posts Understanding the Stack and Understanding the RealWorld). In this blog post I will explain a technique I sometimes use to help diagnose low-level problems.

Since I work on OSX I will be using lldb as my debugger. if you are using gdb you can probably use similar techniques; The LLDB Debugger shows how gdb and lldb commands correlate, and the ghc wiki also lists some tips. However, I have no experience with scripting gdb so your mileage may vary.

Description of the problem

As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.

In one example run it loads the (compiled) C code

#include <stdio.h> void hello( void ) { hello() { "hello

" ); printf(); }

and then compiles and runs this Haskell code:

{-# LANGUAGE ForeignFunctionInterface #-} module Main where import ccall "hello" hello :: IO () foreignccall "hello" hello ::() = hello mainhello

Sadly, however, this resulted in total system crash.

Starting point

By attaching lldb to the running process we got a tiny bit more information about the crash:

* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x000000010b3b8226 -> 0x10b3b8226: addb %al, (%rax) 0x10b3b8228: addb %al, (%rax) 0x10b3b822a: addb %al, (%rax) 0x10b3b822c: addb %al, (%rax)

It turns out we have a null-pointer dereferencing here. Anybody who has spent any time debugging Intel assembly code however will realize that this particular instruction

addb %al, (%rax)

is in fact the decoding of zero:

(lldb) memory read -c 8 0x10b3b8226 0x10b3b8226: 00 00 00 00 00 00 00 00 ........

In other words, chances are good we were never meant to execute this instruction at all. Unfortunately, asking lldb for a backtrace tells us absolutely nothing new:

(lldb) bt * thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) * frame #0: 0x000000010b3b8226

Finding a call chain

The lack of a suitable backtrace in lldb is not surprising, since compiled Haskell code barely makes use of the system stack. Instead, the runtime maintains its own stack, and code is compiled into a continuation passing style. For example, if we have the Haskell code

functionA :: IO () functionA = do .. ; functionB ; .. functionB :: () functionB = do .. ; functionC ; .. functionC :: IO () functionC = .. crash .. main :: IO () main = functionA

and we step through the execution of this program in lldb , and we ask for a backtrace when we start executing function A all we get is

(lldb) bt * thread #1: tid = 0x379731, 0x0000000100000a20 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100000a20 Main`A_functionA_info

with no mention of main . Similarly, the backtraces on entry to functions B and C are

* thread #1: tid = 0x379731, 0x0000000100000b90 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1 * frame #0: 0x0000000100000b90 Main`B_functionB_info

and

* thread #1: tid = 0x379731, 0x0000000100000c88 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1 * frame #0: 0x0000000100000c88 Main`C_functionC_info

none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.

Scripting lldb

Fortunately, lldb can be scripted (see Using Scripting and Python to Debug in LLDB and the LLDB Python Reference). What we want to do is keep stepping through the code, showing the top-level (and quite possibly only) function at the top of the call stack at each step, until we crash.

We can use the following Python script to do this:

import lldb lldb def step_func(debugger, command, result, internal_dict): step_func(debugger, command, result, internal_dict): = debugger.GetSelectedTarget().GetProcess().GetSelectedThread() threaddebugger.GetSelectedTarget().GetProcess().GetSelectedThread() while True : thread.StepOver() = lldb.SBStream() streamlldb.SBStream() thread.GetStatus(stream) = stream.GetData() descriptionstream.GetData() print description description if thread.GetStopReason() == lldb.eStopReasonException: thread.GetStopReason()lldb.eStopReasonException: break def __lldb_init_module(debugger, dict ): __lldb_init_module(debugger,): 'command script add -f %s .step_func sf' % __name__ ) debugger.HandleCommand(

For the above example, we might use this as follows: we load our application into lldb

# lldb Main Current executable set to 'Main' (x86_64).

register our new command sf

(lldb) command script import mystep.py

set a breakpoint where we want to start stepping

(lldb) breakpoint set -n A_functionA_info Breakpoint 1: where = Main`A_functionA_info, address = 0x0000000100000b90

run to the breakpoint:

(lldb) run Process 54082 launched: 'Main' (x86_64) Process 54082 stopped * thread #1: tid = 0x384510, 0x0000000100000b90 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 frame #0: 0x0000000100000b90 Main`A_functionA_info

and then use sf to find a call-trace until we crash:

(lldb) sf ... * thread #1: tid = 0x384510, 0x0000000100000bf0 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = instruction step over frame #0: 0x0000000100000bf0 Main`B_functionB_info Main`B_functionB_info: ... * thread #1: tid = 0x384510, 0x0000000100000c78 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = instruction step over frame #0: 0x0000000100000c78 Main`C_functionC_info Main`C_functionC_info: ... * thread #1: tid = 0x384510, 0x0000000100000d20 Main`crash + 16 at crash.c:3, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x0000000100000d20 Main`crash + 16 at crash.c:3

Note that if you are using the threaded runtime, you may have to select which thread you want to step through before calling sf :

(lldb) thread select 4 (lldb) sf

Tweaking the script

You will probably want to tweak the above script in various ways. For instance, in the application I was debugging, I wanted to step into each assembly language instruction but over each C function call, mostly because lldb was getting confused with the call stack. I also added a maximum step count:

def step_func(debugger, command, result, internal_dict): step_func(debugger, command, result, internal_dict): = shlex.split(command) argsshlex.split(command) if len (args) > 0 : (args) = int (args[ 0 ]) maxCount(args[]) else : = 100 maxCount = debugger.GetSelectedTarget().GetProcess().GetSelectedThread() threaddebugger.GetSelectedTarget().GetProcess().GetSelectedThread() i = 0 ; while True : = thread.GetFrameAtIndex( 0 ) framethread.GetFrameAtIndex( file = frame.GetLineEntry().GetFileSpec().GetFilename() frame.GetLineEntry().GetFileSpec().GetFilename() = type ( file ) is str and file .endswith( ".c" ) inC.endswith( if inC: inC: thread.StepOver() else : False ) thread.StepInstruction( = lldb.SBStream() streamlldb.SBStream() thread.GetStatus(stream) = stream.GetData() descriptionstream.GetData() print i print description description i += 1 ; if thread.GetStopReason() == lldb.eStopReasonException or i > maxCount: thread.GetStopReason()lldb.eStopReasonExceptionmaxCount: break

You may want to tweak this step into/step over behaviour to suit your application; certainly you don’t want to have a call trace involving every step taken in the Haskell RTS or worse, in the libraries it depends on.

Back to the example

Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. Indeed, in the application I was debugging the call stack just before the crash was:

2583 thread #3: tid = 0x35e743, 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470, stop reason = step over frame #0: 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470 467 } 468 469 case ThreadInterpret: -> 470 cap = interpretBCO(cap); 471 ret = cap->r.rRet; 472 break; 473 2584 thread #3: tid = 0x35e743, 0x0000000103106226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x0000000103106226 -> 0x103106226: addb %al, (%rax) 0x103106228: addb %al, (%rax) 0x10310622a: addb %al, (%rax) 0x10310622c: addb %al, (%rax)

Which is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).

To pinpoint the problem further, we can set a breakpoint in interpretBCO and run sf again (the way we defined sf it steps over any C function calls by default). This time we get to:

4272 thread #4: tid = 0x35f43a, 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463, stop reason = step over frame #0: 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463 1460 tok = suspendThread(&cap->r, interruptible ? rtsTrue : rtsFalse); 1461 1462 // We already made a copy of the arguments above. -> 1463 ffi_call(cif, fn, ret, argptrs); 1464 1465 // And restart the thread again, popping the stg_ret_p frame. 1466 cap = (Capability *)((void *)((unsigned char*)resumeThread(tok) - STG_FIELD_OFFSET(Capability,r))); 4273 thread #4: tid = 0x35f43a, 0x0000000107eba226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x0000000107eba226 -> 0x107eba226: addb %al, (%rax) 0x107eba228: addb %al, (%rax) 0x107eba22a: addb %al, (%rax) 0x107eba22c: addb %al, (%rax)

Ok, now we are really getting somewhere. Something is going wrong when are doing a foreign function call. Let’s re-run the application once more, setting a breakpoint at ffi_call :

(lldb) breakpoint set -n ffi_call Breakpoint 1: where = libffi.dylib`ffi_call + 29 at ffi64.c:421, address = 0x0000000108e098dd (lldb) cont Process 51476 resuming Process 51476 stopped * thread #4: tid = 0x360fd3, 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421, stop reason = breakpoint 1.1 frame #0: 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421 418 /* If the return value is a struct and we don't have a return value 419 address then we need to make one. Note the setting of flags to 420 VOID above in ffi_prep_cif_machdep. */ -> 421 ret_in_memory = (cif->rtype->type == FFI_TYPE_STRUCT 422 && (cif->flags & 0xff) == FFI_TYPE_VOID); 423 if (rvalue == NULL && ret_in_memory) 424 rvalue = alloca (cif->rtype->size);

and let’s take a look at the function we’re about to execute:

(lldb) disassemble -s fn 0x1024a4210: pushq %rbp 0x1024a4211: movq %rsp, %rbp 0x1024a4214: leaq (%rip), %rdi 0x1024a421b: popq %rbp 0x1024a421c: jmpq 0x1024a4221 0x1024a4221: pushq $0x6f6c6c65 0x1024a4226: addb %al, (%rax) 0x1024a4228: addb %al, (%rax) 0x1024a422a: addb %al, (%rax) 0x1024a422c: addb %al, (%rax) 0x1024a422e: addb %al, (%rax)

We were expecting to execute hello :

# otool -tV hello.o hello.o: (__TEXT,__text) section _hello: 0000000000000000 pushq %rbp 0000000000000001 movq %rsp, %rbp 0000000000000004 leaq L_str(%rip), %rdi ## literal pool for: "hello" 000000000000000b popq %rbp 000000000000000c jmpq _puts

and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file

jmpq _puts

contains a symbolic reference to puts ; but the jump in the code that we are about to execute in fact jumps to the next instruction in memory:

0x1024a421c: jmpq 0x1024a4221 jmpq 0x1024a4221: ... ...

In other words, the loaded object file has not been properly relocated, and when we try to call puts we end up jumping into nowhere. At this point the bug was easily resolved.

Further Reading