Author: Gaël Varoquaux

This section explores tools to better understand your code base: debugging, to find and fix bugs.

It is not specific to the scientific Python community, but the strategies that we will employ are tailored to its needs.

Use the flymake mode with pyflakes, documented on https://www.emacswiki.org/emacs/FlyMake and included in Emacs 26 and more recent. To activate it, use M-x (meta-key then x) and enter flymake-mode at the prompt. To enable it automatically when opening a Python file, add the following line to your .emacs file:

Alternatively: use the syntastic plugin. This can be configured to use flake8 too and also handles on-the-fly checking for many other languages.

In emacs: in your .emacs (binds F5 to pyflakes):

In vim: in your .vimrc (binds F5 to pyflakes):

Then Ctrl-Shift-V is bound to a pyflakes report.

You can bind a key to run pyflakes in the current buffer.

Integrating pyflakes (or flake8) in your editor or IDE is highly recommended: it does yield productivity gains.

Another good recommendation is the flake8 tool, which is a combination of pyflakes and pep8. Thus, in addition to the types of errors that pyflakes catches, flake8 detects violations of the recommendations in the PEP8 style guide.

Here we focus on pyflakes, which is the simplest tool.
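As an illustration of the kind of error a static checker reports without running the code, here is a minimal sketch. The function `buggy_average` and its typo are hypothetical, made up for this example. Running pyflakes on a file containing it should flag `totl` as an undefined name, even though the function behaves correctly for empty input and only fails at runtime on the rarely exercised branch.

```python
def buggy_average(values):
    """Return the mean of values, or None for an empty sequence."""
    if not values:
        return None
    total = sum(values)
    # Typo: 'totl' is never defined. Python only notices when this line
    # is executed; a static checker notices by reading the source.
    return totl / len(values)


# The empty-input branch works, so a quick manual check may miss the bug.
empty_result = buggy_average([])

# The non-empty branch fails at runtime with a NameError.
try:
    buggy_average([1, 2, 3])
    runtime_error = None
except NameError as exc:
    runtime_error = exc
```

The point of the sketch is that the static check catches the error regardless of which branches your tests happen to exercise.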

There are several static analysis tools in Python; to name a few:

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”

Once you have gone through this process (isolated a tight piece of code reproducing the bug and fixed the bug using this piece of code), add the corresponding code to your test suite.

Take notes and be patient. It may take a while.

Use the debugger to understand what is going wrong.

Change one thing at a time and re-run the failing test case.

Divide and Conquer. Once you have a failing test case, isolate the failing code.

Make it fail reliably. Find a test case that makes the code fail every time.
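A minimal sketch of what "failing reliably" can look like in practice: replace random or large inputs with the smallest fixed input that triggers the problem. The function `reciprocals` and the test below are hypothetical and only serve to illustrate the idea.

```python
def reciprocals(data):
    """Hypothetical function under test: fails when the input contains a zero."""
    return [1.0 / x for x in data]


def check_reciprocals_with_zero():
    """Reproduce the failure with a small, fixed input.

    With random input the bug would only show up occasionally; with this
    input it shows up on every run, which makes each fix attempt checkable.
    """
    data = [1, 0, 2]
    try:
        reciprocals(data)
    except ZeroDivisionError:
        return "fails reliably"
    return "no failure"


outcome = check_reciprocals_with_zero()
```

Once the bug is fixed, the same input can be kept in the test suite with the assertion inverted, so the bug cannot silently come back.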

For debugging a given problem, the favorable situation is when the problem is isolated in a small number of lines of code, outside framework or application code, with short modify-run-fail cycles.

If you do have a non-trivial bug, this is when debugging strategies kick in. There is no silver bullet. Yet, strategies help:

Type h or help to access the interactive help:

You cannot name variables any way you want in the debugger. For instance, you cannot override the variables in the current frame with the same name: use different names than your local variables when typing code in the debugger.

When running nosetests, the output is captured, and thus it seems that the debugger does not work. Simply run nosetests with the -s flag.

Insert the following line where you want to drop in the debugger:
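The standard-library call for this is `pdb.set_trace()`. The sketch below shows where such a line typically goes; the function `scale_and_shift` and the `DEBUG` flag are hypothetical, and the flag is only there so the example runs non-interactively as written.

```python
import pdb

DEBUG = False  # hypothetical switch: set to True to stop in the debugger


def scale_and_shift(x):
    y = x * 2
    if DEBUG:
        # Execution pauses here and a (Pdb) prompt opens,
        # where x and y can be inspected.
        pdb.set_trace()
    return y + 1


result = scale_and_shift(2)
```

In real use the guard is usually omitted and the `pdb.set_trace()` line is simply removed once the bug is found.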

In addition, you can use the IPython interface for the debugger in nose by installing the nose plugin ipdbplugin. You can then pass the --ipdb and --ipdb-failure options to nosetests.

You can run nosetests --pdb to drop in post-mortem debugging on exceptions, and nosetests --pdb-failure to inspect test failures using the debugger.

If you find it tedious to note the line number to set a break point, you can simply raise an exception at the point that you want to inspect and use IPython’s %debug. Note that in this case you cannot step or continue the execution.

FloatingPointError: divide by zero encountered in divide

We can turn these warnings into exceptions, which enables us to do post-mortem debugging on them, and find our problem more quickly:
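For NumPy floating-point warnings, the usual switch is `np.seterr(all='raise')`, which is what produces a `FloatingPointError`. The same idea can be sketched with the standard library alone: the `warnings` module can promote any warning to an exception. The function `noisy_divide` below is hypothetical and only stands in for code that warns instead of failing.

```python
import warnings


def noisy_divide(a, b):
    """Hypothetical function that warns instead of failing when b == 0."""
    if b == 0:
        warnings.warn("divide by zero encountered", RuntimeWarning)
        return float("inf")
    return a / b


def run_strict():
    """Call noisy_divide with warnings promoted to exceptions."""
    # catch_warnings restores the previous filters when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning is now raised
        try:
            noisy_divide(1, 0)
        except RuntimeWarning as exc:
            return "raised: %s" % exc
    return "no exception"


strict_outcome = run_strict()
```

Because the warning is now an exception, it comes with a traceback, and the post-mortem debugger can be opened at the exact line that triggered it.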

wiener_filtering.py:40: RuntimeWarning: divide by zero encountered in divide

When we run the wiener_filtering.py file, the following warnings are raised:

Oh dear, nothing but integers, and 0 variation. Here is our bug, we are doing integer arithmetic.

Step a few lines and explore the local variables:

Step into code with n(ext) and s(tep): next jumps to the next statement in the current execution context, while step will go across execution contexts, i.e. enable exploring inside function calls:

Continue execution to next breakpoint with c(ont(inue)):

NOTE: Enter 'c' at the ipdb> prompt to start your script.

Run the script in IPython with the debugger using %run -d wiener_filtering.py:

For instance, we are trying to debug wiener_filtering.py. The code runs, but the filtering does not work well.

Situation: You believe a bug exists in a module but are not sure where.

In some situations you cannot use IPython, for instance to debug a script that wants to be called from the command line. In this case, you can call the script with python -m pdb script.py:

Here we debug the file index_error.py. When running it, an IndexError is raised. Type %debug and drop into the debugger.
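The contents of index_error.py are not shown here, so the sketch below uses a hypothetical function, `last_item`, with a typical off-by-one indexing bug. It also shows, without any interactive prompt, the information a post-mortem debugger gives access to: the local variables of the innermost frame where the exception was raised.

```python
import sys


def last_item(lst):
    # Off-by-one: valid indices run from 0 to len(lst) - 1.
    return lst[len(lst)]


def failing_frame_locals():
    """Return the locals of the frame in which the IndexError was raised."""
    try:
        last_item(list("foobar"))
    except IndexError:
        tb = sys.exc_info()[2]
        # Walk down to the innermost frame, where the exception originated;
        # this is the frame a post-mortem session starts in.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return dict(tb.tb_frame.f_locals)
    return {}


frame_locals = failing_frame_locals()
```

In an interactive session you would not write this by hand: `%debug` in IPython, or `pdb.pm()` in a plain Python prompt, opens the debugger on the last traceback.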

Situation: You’re working in IPython and you get a traceback.

Yes, print statements do work as a debugging tool. However, to inspect runtime state, it is often more efficient to use the debugger.

Specifically it allows you to:

The Python debugger, pdb (https://docs.python.org/library/pdb.html), allows you to inspect your code interactively.

If you have a segmentation fault, you cannot debug it with pdb, as it crashes the Python interpreter before it can drop in the debugger. Similarly, if you have a bug in C code embedded in Python, pdb is useless. For this we turn to the GNU debugger, gdb, available on Linux.

Before we start with gdb, let us add a few Python-specific tools to it. For this we add a few macros to our ~/.gdbinit. The optimal choice of macro depends on your Python version and your gdb version. I have added a simplified version in gdbinit, but feel free to read DebuggingWithGdb.

To debug the Python script segfault.py with gdb, we can run the script in gdb as follows:

    $ gdb python
    ...
    (gdb) run segfault.py
    Starting program: /usr/bin/python segfault.py
    [Thread debugging using libthread_db enabled]

    Program received signal SIGSEGV, Segmentation fault.
    _strided_byte_copy (dst=0x8537478 "\360\343G", outstrides=4, src=
        0x86c0690 <Address 0x86c0690 out of bounds>, instrides=32, N=3,
        elsize=4)
            at numpy/core/src/multiarray/ctors.c:365
    365            _FAST_MOVE(Int32);
    (gdb)

We get a segfault, and gdb captures it for post-mortem debugging in the C-level stack (not the Python call stack). We can debug the C call stack using gdb’s commands:

    (gdb) up
    #1  0x004af4f5 in _copy_from_same_shape (dest=<value optimized out>,
        src=<value optimized out>, myfunc=0x496780 <_strided_byte_copy>,
        swap=0)
        at numpy/core/src/multiarray/ctors.c:748
    748         myfunc(dit->dataptr, dest->strides[maxaxis],

As you can see, right now we are in the C code of numpy. We would like to know which Python code triggers this segfault, so we go up the stack until we hit the Python execution loop:

    (gdb) up
    #8  0x080ddd23 in call_function (f=
        Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)
        at ../Python/ceval.c:3750
    3750    ../Python/ceval.c: No such file or directory.
            in ../Python/ceval.c

    (gdb) up
    #9  PyEval_EvalFrameEx (f=
        Frame 0x85371ec, for file /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py, line 156, in _leading_trailing (a=<numpy.ndarray at remote 0x85371b0>, _nc=<module at remote 0xb7f93a64>), throwflag=0)
        at ../Python/ceval.c:2412
    2412    in ../Python/ceval.c
    (gdb)

Once we are in the Python execution loop, we can use our special Python helper function. For instance we can find the corresponding Python code:

    (gdb) pyframe
    /home/varoquau/usr/lib/python2.6/site-packages/numpy/core/arrayprint.py (158): _leading_trailing
    (gdb)

This is numpy code, so we need to go up until we find code that we have written:

    (gdb) up
    ...
    (gdb) up
    #34 0x080dc97a in PyEval_EvalFrameEx (f=
        Frame 0x82f064c, for file segfault.py, line 11, in print_big_array (small_array=<numpy.ndarray at remote 0x853ecf0>, big_array=<numpy.ndarray at remote 0x853ed20>), throwflag=0)
        at ../Python/ceval.c:1630
    1630    ../Python/ceval.c: No such file or directory.
            in ../Python/ceval.c
    (gdb) pyframe
    segfault.py (12): print_big_array

The corresponding code is:

    def make_big_array(small_array):
        big_array = stride_tricks.as_strided(small_array,
                                             shape=(2e6, 2e6), strides=(32, 32))
        return big_array

    def print_big_array(small_array):
        big_array = make_big_array(small_array)

Thus the segfault happens when printing big_array[-10:]. The reason is simply that big_array has been allocated with its end outside the program memory.
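A rough back-of-the-envelope check makes the out-of-bounds access concrete. The sketch below computes the largest byte offset that a strided view with the shape and strides from the example can address, and compares it with the size of the underlying buffer. The buffer size is an assumption (8 items of 4 bytes each; only the 4-byte item size appears in the gdb output above), and `max_byte_offset` is a helper written for this illustration, not a NumPy function.

```python
def max_byte_offset(shape, strides):
    """Largest byte offset reached by a strided view (non-negative strides)."""
    return sum((n - 1) * s for n, s in zip(shape, strides))


shape = (2000000, 2000000)  # (2e6, 2e6) from the example
strides = (32, 32)          # bytes to skip per step along each axis

offset = max_byte_offset(shape, strides)

# ASSUMPTION: the real small_array is not shown; 8 four-byte items is a guess.
assumed_buffer_bytes = 8 * 4

out_of_bounds = offset >= assumed_buffer_bytes
```

With these numbers the last element sits roughly 128 MB past the start of a buffer that is only a few dozen bytes long, so reading the tail of the array touches memory the process does not own.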

Note: For a list of Python-specific commands defined in the gdbinit, read the source of this file.