I’ve recently been looking into ways to improve my debugging experience with mixed Python and C/C++ programs. I spend a fair amount of time working on systems built using both languages in tandem, and the tools available for debugging across the languages have historically been very limited. Often, logging and/or intimate knowledge of the Python C-API (and, in my case, boost.python) were the primary tools available.

I was naturally very excited, then, when GDB 7 introduced support for extension via Python. This seemed like an obvious step towards a “unified” debugging environment, one where I could step naturally between the languages, set breakpoints, etc. This new GDB feature doesn’t solve the problem directly, but it opens the door for more sophisticated extensions to GDB than were previously practical.

Building on this GDB feature, recent Python source releases include GDB extensions as a build product. That is, starting at around Python 3.2, building Python from source also produces a set of GDB extensions that make it possible to debug Python very naturally from inside GDB. These extensions allow you, for example, to step up and down the Python call stack, display frame information (locals), and so forth, all while also navigating the underlying C call stack. This provides a very intuitive and productive debugging environment for anyone working across the two runtime environments.

Edit: It was pointed out on Reddit that GDB’s embedded Python interpreter has been around since 2009 and is thus not particularly new.

GDB 7’s Python extension system

What exactly is GDB’s new Python extension system? Broadly speaking, it’s a way to script GDB using Python rather than GDB’s own internal language. From around version 7 onward, GDB embeds a Python interpreter that you can invoke from the prompt and from scripts. For example:

    (gdb) python
    >import sys

To support writing extensions, you can import the module “gdb” from this embedded interpreter. This module provides access and hooks into GDB so that you can control it via Python scripts. You can read all about this Python extension support in GDB’s documentation.
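As a rough illustration of what the “gdb” module makes possible, here is a minimal sketch of a custom GDB command written in Python. The command name “pyinfo” and the helper interpreter_summary are hypothetical names chosen for this example; gdb.Command, gdb.COMMAND_USER, and gdb.write come from GDB’s Python API, though COMMAND_USER may be missing from older GDB releases. The import is guarded so the file can also be loaded by an ordinary Python interpreter.

```python
import sys

try:
    import gdb  # only importable inside GDB's embedded interpreter
except ImportError:
    gdb = None  # running under a plain Python interpreter


def interpreter_summary():
    """Describe the Python interpreter that is running this script."""
    v = sys.version_info
    return "embedded Python {}.{}.{}".format(v.major, v.minor, v.micro)


if gdb is not None:
    class PyInfo(gdb.Command):
        """Print the version of GDB's embedded Python (hypothetical command)."""

        def __init__(self):
            # Register the command under the name "pyinfo".
            gdb.Command.__init__(self, "pyinfo", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            # Called by GDB each time the user types "pyinfo".
            gdb.write(interpreter_summary() + "\n")

    PyInfo()  # instantiating the class is what registers the command
```

If this file were sourced from GDB (for example with the “source” command), typing pyinfo at the prompt should print the version line.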

This new facility has been used to implement, for example, pretty printers for STL containers. But in this article we’re interested in improving the CPython debugging experience, so we’ll cover that in the next section.

Python’s GDB extensions

As previously mentioned, recent Python source releases (from about 3.2 onward) include a new build product: a GDB Python extension that helps you debug Python. What does this mean? The extension encodes knowledge about CPython’s actual C-level structures, function frame implementation, PyObject, and so forth. It hooks this knowledge into GDB so that you can interrogate the CPython runtime at the Python source level rather than at the C level.

For example, here are some partial stack traces from a breakpoint inside a C extension module. The first one is a raw stack trace showing the guts of CPython execution:

    (gdb) bt
    #0  get_value () at some_lib.c:1
    #1  0x00007ffff686348c in ffi_call_unix64 () at /home/abingham/src/Python-3.2/Modules/_ctypes/libffi/src/x86/unix64.S:75
    #2  0x00007ffff6862bc3 in ffi_call (cif=0x7fffffffde40, fn=, rvalue=, avalue=) at /home/abingham/src/Python-3.2/Modules/_ctypes/libffi/src/x86/ffi64.c:485
    #3  0x00007ffff68574ec in _call_function_pointer (pProc=, argtuple=, flags=, argtypes=, restype=, checker=) at /home/abingham/src/Python-3.2/Modules/_ctypes/callproc.c:808
    #4  _ctypes_callproc (pProc=, argtuple=, flags=, argtypes=, restype=, checker=) at /home/abingham/src/Python-3.2/Modules/_ctypes/callproc.c:1151
    #5  0x00007ffff684f823 in PyCFuncPtr_call (self=0x7ffff7e3a600, inargs=0x7ffff7f87050, kwds=) at /home/abingham/src/Python-3.2/Modules/_ctypes/_ctypes.c:3766
    #6  0x00000000004db737 in PyObject_Call (func=0x7ffff7e3a600, arg=0x506417, kw=0x4) at Objects/abstract.c:2149
    #7  0x00000000004658d4 in do_call (f=0x987d20, throwflag=) at Python/ceval.c:4095

The next one shows the Python debugging extension getting loaded, followed by a new stack trace. This new trace is still at the C source level, but you’ll notice that many of the CPython calls are annotated with Python source level information (e.g. file names, line numbers, etc.):

    (gdb) python
    >import python3_2_gdb
    >(gdb) bt
    #0  get_value () at some_lib.c:1
    #1  0x00007ffff686348c in ffi_call_unix64 () at /home/abingham/src/Python-3.2/Modules/_ctypes/libffi/src/x86/unix64.S:75
    #2  0x00007ffff6862bc3 in ffi_call (cif=0x7fffffffde40, fn=, rvalue=, avalue=) at /home/abingham/src/Python-3.2/Modules/_ctypes/libffi/src/x86/ffi64.c:485
    #3  0x00007ffff68574ec in _call_function_pointer (pProc=, argtuple=, flags=, argtypes=, restype=, checker=) at /home/abingham/src/Python-3.2/Modules/_ctypes/callproc.c:808
    #4  _ctypes_callproc (pProc=, argtuple=, flags=, argtypes=, restype=, checker=) at /home/abingham/src/Python-3.2/Modules/_ctypes/callproc.c:1151
    #5  0x00007ffff684f823 in PyCFuncPtr_call (self=0x7ffff7e3a600, inargs=(), kwds=) at /home/abingham/src/Python-3.2/Modules/_ctypes/_ctypes.c:3766
    #6  0x00000000004db737 in PyObject_Call (func=, arg=, kw=) at Objects/abstract.c:2149
    #7  0x00000000004658d4 in do_call (f= Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>), throwflag=) at Python/ceval.c:4095
    #8  call_function (f= Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>), throwflag=) at Python/ceval.c:3898
    #9  PyEval_EvalFrameEx (f= Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>), throwflag=) at Python/ceval.c:2673

In particular, notice that the “python” command changes the GDB prompt to “>”. This is how you can tell when you’re in python mode. To exit python mode, press Ctrl-D (on Windows this might be Ctrl-Z; I haven’t checked).

This final stack trace shows a Python source level stack trace provided by the extension:

    (gdb) py-bt
    #9 (unable to read python frame information)
    #12 Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>)
        print(lib.get_value())
    #15 Frame 0x98a700, for file bar.py, line 9, in baz ()
        llama()

It’s important to recognize that this was all done in the same GDB session, at the same breakpoint. The Python-provided extension gives you a very clear picture of the exact state of the Python stack alongside the underlying C stack. If you’ve ever tried debugging C extensions through GDB, you’ll immediately appreciate the clarity this provides.

Other extension features

The CPython GDB extension provides more than just stack traces and frame annotation. You can also use it to navigate the Python stack:

    (gdb) py-up
    #12 Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>)
        print(lib.get_value())
    (gdb) py-up
    #15 Frame 0x98a700, for file bar.py, line 9, in baz ()
        llama()
    (gdb) py-down
    #12 Frame 0x987d20, for file bar.py, line 5, in llama (lib=<CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>)
        print(lib.get_value())

examine locals:

    (gdb) py-locals
    lib = <CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>

list source code:

    (gdb) py-list
       1    import ctypes
       2
       3    def llama():
       4        lib = ctypes.cdll.LoadLibrary('libsome_lib.so')
       5        print(lib.get_value())
       6
       7    def baz():
       8        print('baz')
       9        llama()
      10

and print individual variables:

    (gdb) py-print lib
    local 'lib' = <CDLL(_FuncPtr=, get_value=, _handle=9720816, _name='libsome_lib.so') at remote 0x7ffff6ac53d0>

Limitations

The CPython extension doesn’t (yet?) offer a complete debugging solution. For example, I haven’t found any way to set a breakpoint in Python code. Likewise, you can’t set watches on Python variables. These features may not even be feasible with the GDB architecture. Nevertheless, the extension is a powerful tool for most debugging needs.

Installation

To install the CPython GDB extension, you need to do the following:

1. Compile Python from source. This will produce the file “python-gdb.py” at the root of the Python source tree.
2. Tell GDB where to find extension modules.
3. Put the generated extension from step 1 where GDB can find it.

To do step 1, simply compile Python as you normally would. This is covered extensively in Python’s own build documentation, so I won’t go into the details here.

For step 2, you need to pick a place (or places) for GDB extension modules to live. I just created the directory “~/.gdb” for this purpose. To tell GDB to look here for extensions, what you really need to do is extend sys.path in GDB’s embedded Python interpreter. For example, I have this code in my .gdbinit:

    python
    import sys
    sys.path.insert(0, '/home/abingham/.gdb')
    end

This effectively puts GDB into python mode and then extends sys.path to look in ~/.gdb.
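If you would rather not hard-code your home directory, the same idea can be sketched with os.path.expanduser. The helper name add_extension_dir is made up for this example; in a .gdbinit the code would sit between the “python” and “end” lines.

```python
import os
import sys


def add_extension_dir(path="~/.gdb"):
    """Put the expanded directory at the front of sys.path, at most once."""
    full = os.path.abspath(os.path.expanduser(path))
    if full not in sys.path:
        sys.path.insert(0, full)
    return full


# Make ~/.gdb importable from GDB's embedded interpreter.
extension_dir = add_extension_dir()
```

Checking for the entry first keeps sys.path from accumulating duplicates if the file is sourced more than once in a session.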

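For step 3, copy the generated python-gdb.py into that directory under a name that Python can import (a hyphen isn’t valid in a module name); mine is python3_2_gdb.py. Finally, to activate the extension module, that is, to make it available in a particular GDB session, you need to import it. You do that like this: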

    (gdb) python
    >import python3_2_gdb
    >(gdb)

(Again, remember that you exit python mode and return to the GDB prompt with Ctrl-D.)

Once you do this, all of the extension commands (py-list, py-up, etc.) and the frame annotations are activated.

When to load the module

There is one important point to remember with regard to the timing of the extension import. If you attempt to load the extension before the corresponding Python library has been loaded into memory, you will get errors like this:

    (gdb) python
    >import python3_2_gdb
    >Traceback (most recent call last):
      File "", line 1, in
      File "/home/abingham/.gdb/python3_2_gdb.py", line 52, in
        _type_size_t = gdb.lookup_type('size_t')
    RuntimeError: No type named size_t.
    Error while executing Python code.

Because of this, you generally cannot import the CPython extension module in your standard .gdbinit. Rather, you need to load it manually when you need it. In practice this is just fine, since you will probably have various versions of the extension that you need at different times (see the next section).
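One way to make the manual load a little friendlier is a small wrapper that reports the failure instead of dumping a traceback. This is only a sketch: try_load_extension is a hypothetical helper, and it assumes the too-early failure surfaces as a RuntimeError, as it does in the traceback above.

```python
import importlib


def try_load_extension(module_name):
    """Import a GDB extension module; return (module, error_message)."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        # The module file is not on sys.path at all.
        return None, "{} not found on sys.path: {}".format(module_name, exc)
    except RuntimeError as exc:
        # Assumed to mean the Python library is not loaded into memory yet.
        return None, "{} loaded too early? {}".format(module_name, exc)
```

From python mode you could call try_load_extension('python3_2_gdb') once the program has started, print the message if there is one, and simply try again later; a failed import does not leave a half-loaded module behind.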

Dealing with multiple Python versions

It’s important to remember that different variants of the Python build will generate different extensions. This is because the variants (e.g. debug and non-debug) will, in general, have different C-level structure sizes. As a result, an extension built for debug Python will not work (or may work incorrectly) for release versions of Python. One way to cope with this is simply to name the different versions of the extension differently. For example, the extension for the normal build of Python 3.2 might be “python3_2_gdb.py” while the debug version might be “python3_2_debug_gdb.py”. It doesn’t really matter what naming scheme you use, so just find one that maps to your personal taste.
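A naming scheme like this is easy to automate. The following is a minimal sketch, assuming the generated file sits at the root of the source tree as python-gdb.py; the function names, and the idea of copying with a script at all, are illustrative choices for this example rather than anything Python provides.

```python
import os
import shutil


def extension_module_name(major, minor, debug=False):
    """Build a module name such as python3_2_gdb or python3_2_debug_gdb."""
    suffix = "_debug_gdb" if debug else "_gdb"
    return "python{}_{}{}".format(major, minor, suffix)


def install_extension(source_tree, dest_dir, major, minor, debug=False):
    """Copy python-gdb.py from a built source tree under a per-build name."""
    src = os.path.join(source_tree, "python-gdb.py")
    os.makedirs(dest_dir, exist_ok=True)
    file_name = extension_module_name(major, minor, debug) + ".py"
    dest = os.path.join(dest_dir, file_name)
    shutil.copyfile(src, dest)
    return dest
```

For example, install_extension('/home/abingham/src/Python-3.2', '/home/abingham/.gdb', 3, 2) would produce /home/abingham/.gdb/python3_2_gdb.py, and passing debug=True for a debug build would produce python3_2_debug_gdb.py next to it.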