When debugging performance issues, I usually rely on the good old line_profiler. It makes it easy to identify which lines of a specific function are slow and need to be investigated and/or fixed.

Using it is straightforward. You add the @profile decorator to the function you want to profile:

    @profile
    def slow_function(a, b, c):
        ...

Then you launch your script with:

$ kernprof -l script_to_profile.py

And generate the report with:

    $ python -m line_profiler script_to_profile.py.lprof
    Timer unit: 1e-06 s

    File: pystone.py
    Function: Proc2 at line 149
    Total time: 0.606656 s

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
       149                                           @profile
       150                                           def Proc2(IntParIO):
       151     50000        82003      1.6     13.5      IntLoc = IntParIO + 10
       152     50000        63162      1.3     10.4      while 1:
       153     50000        69065      1.4     11.4          if Char1Glob == 'A':
       154     50000        66354      1.3     10.9              IntLoc = IntLoc - 1
       155     50000        67263      1.3     11.1              IntParIO = IntLoc - IntGlob
       156     50000        65494      1.3     10.8              EnumLoc = Ident1
       157     50000        68001      1.4     11.2          if EnumLoc == Ident1:
       158     50000        63739      1.3     10.5              break
       159     50000        61575      1.2     10.1      return IntParIO

Which is great… until it’s not anymore.

Too much magic

Line profiler is great for profiling small scripts: you add the decorator, run the script with kernprof, and voilà. The problem is that you always need to launch it with kernprof, because kernprof is what injects the profile decorator. If you launch your script without kernprof, you get a nice NameError: name 'profile' is not defined.
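One common way to sidestep the NameError is to fall back to a do-nothing decorator when kernprof has not injected profile. The snippet below is a minimal sketch of that idea, not something line_profiler provides; slow_function and its body are placeholders for illustration.

```python
import builtins

# kernprof injects `profile` into builtins. If it is not there, the script
# was started with plain `python`, so define a no-op replacement.
if not hasattr(builtins, "profile"):
    def profile(func):
        """No-op stand-in used when the script is not run under kernprof."""
        return func


@profile
def slow_function(a, b, c):
    # Placeholder body, for illustration only.
    return a + b + c


print(slow_function(1, 2, 3))
```

This only hides the NameError: you still need kernprof to get an actual report.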

Moreover, in some cases you simply cannot use kernprof, for example when profiling a web server, or when the Python interpreter is launched by a shell script or another process you cannot modify.

Luckily for us, it’s not that hard to use line_profiler without kernprof .

The magic trick

The kernprof magic trick is not that complicated as you’ll see.

First, it instantiates the right object:

    import line_profiler
    prof = line_profiler.LineProfiler()

Then it injects it into the builtins:

    builtins.__dict__['profile'] = prof

Then it executes the Python script:

    execfile(script_file, ns, ns)

Finally, it saves the stats and, if needed, prints them with:

    prof.print_stats()
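Put together, these steps can be sketched with the standard library alone. In the sketch below, CallCounter is a hypothetical stand-in for line_profiler.LineProfiler (it only counts calls), and the script is an inline string rather than a file, so the example runs without line_profiler installed. It illustrates the injection mechanism and is not kernprof's actual code.

```python
import builtins
import functools


class CallCounter:
    """Hypothetical stand-in for LineProfiler: counts calls per function."""

    def __init__(self):
        self.calls = {}

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.calls[func.__name__] = self.calls.get(func.__name__, 0) + 1
            return func(*args, **kwargs)
        return wrapper

    def print_stats(self):
        for name, count in self.calls.items():
            print(f"{name}: {count} call(s)")


# Stand-in for the script to profile; a real launcher would read a file.
SCRIPT = """
@profile
def double(x):
    return 2 * x

result = double(21)
"""


def run_with_injected_decorator(source):
    prof = CallCounter()                       # 1. instantiate the object
    builtins.__dict__["profile"] = prof        # 2. inject it into builtins
    namespace = {"__name__": "__main__"}
    try:
        # 3. execute the script; `profile` is resolved through builtins
        exec(compile(source, "<script>", "exec"), namespace, namespace)
    finally:
        del builtins.__dict__["profile"]       # leave builtins as we found them
    prof.print_stats()                         # 4. print the report
    return prof, namespace


prof, ns = run_with_injected_decorator(SCRIPT)
```

The script never imports or defines profile itself. The name is found because Python falls back to builtins after the local and global scopes.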

No rocket science involved here. With all this information, we can now use it manually.

Manual use

Let’s use a simple Python script to show how to do it manually. The following script solves this exercise:

Given a list of integers and a target integer, the function should answer True if the target could be created by adding exactly two integers from the list, False if not.

Here is a naive solution:

    def is_addable(l, t):
        for i, n in enumerate(l):
            for m in l[i:]:
                if n + m == t:
                    return True

        return False

    assert is_addable(range(20), 25) == True  # 25 = 6 + 19
    assert is_addable(range(20), 40) == False

The goal is to optimize this simple function. Let’s create a line_profiler and decorate our function:

    import line_profiler
    profile = line_profiler.LineProfiler()

    @profile
    def is_addable(l, t):
        for i, n in enumerate(l):
            for m in l[i:]:
                if n + m == t:
                    return True

        return False

    assert is_addable(range(20), 25) == True
    assert is_addable(range(20), 40) == False

Launch the script. It should run a bit slower, which is normal since it is now being profiled. But you don’t get a report.

That’s expected, because we never called print_stats. We could call it manually at the end of the script, but in some cases that would be tedious.

Instead, we can use the atexit module to call it for us at the end of the current Python process:

    import line_profiler
    import atexit
    profile = line_profiler.LineProfiler()
    atexit.register(profile.print_stats)

    @profile
    def is_addable(l, t):
        for i, n in enumerate(l):
            for m in l[i:]:
                if n + m == t:
                    return True

        return False

    assert is_addable(range(20), 25) == True
    assert is_addable(range(20), 40) == False

Now let’s run the script once again:

    $ python script.py
    Timer unit: 1e-06 s

    Total time: 0.000171 s
    File: script.py
    Function: is_addable at line 6

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         6                                           @profile
         7                                           def is_addable(l, t):
         8        28         12.0      0.4      7.0      for i, n in enumerate(l):
         9       355         70.0      0.2     40.9          for m in l[i:]:
        10       329         87.0      0.3     50.9              if n + m == t:
        11         1          1.0      1.0      0.6                  return True
        12
        13         1          1.0      1.0      0.6      return False

Hey, much better! Optimizing the function is left as an exercise for the reader.
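If you would rather not edit the script each time you switch profiling on or off, one option is to hide the LineProfiler and atexit setup behind an environment variable. The sketch below assumes a variable named PROFILE_LINES and a helper named make_profile_decorator. Both names are invented for this example and are not part of line_profiler. When the variable is unset, line_profiler is not even imported and the decorator does nothing.

```python
import atexit
import os


def make_profile_decorator(env_var="PROFILE_LINES"):
    """Return a LineProfiler if env_var is set to "1", else a no-op decorator.

    PROFILE_LINES is a hypothetical variable name chosen for this sketch.
    """
    if os.environ.get(env_var) != "1":
        # Profiling is off: hand the function back untouched.
        return lambda func: func

    import line_profiler  # imported lazily, only when profiling is requested

    profiler = line_profiler.LineProfiler()
    # Print the report when the Python process exits.
    atexit.register(profiler.print_stats)
    return profiler


profile = make_profile_decorator()


@profile
def is_addable(l, t):
    for i, n in enumerate(l):
        for m in l[i:]:
            if n + m == t:
                return True

    return False


assert is_addable(range(20), 25) == True
assert is_addable(range(20), 40) == False
```

Running the script with PROFILE_LINES=1 set in the environment should then print the report at exit, while a plain run behaves like the undecorated script. This also suits processes you do not launch yourself, as long as you can set their environment.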