Finding performance bottlenecks in Python

What is making your code so slow?

In this post I’m going to explain how to analyze a fragment of code and possibly discover performance bottlenecks.

Let’s first create our bottleneck

Let’s assume that we need to create a script that iterates over a loop, and each loop execution produces a value that depends on the current iteration and an independent value. The independent value is part of an initial setup, which can be a relatively heavy function:

import time

def initial_setup():
    a = 7
    time.sleep(1)
    return a

Our “initial_setup” function is simply returning a fixed value, but I’ve added a time.sleep() call of 1 second to simulate a heavier function. Let’s move on and build the main function of our script, which we’ll casually call “slow_function”.

def slow_function():
    x = range(5)
    for i in x:
        a = initial_setup()
        b = a + i
        print(b)

Our new function is simply iterating over [0, 4] and adding the current number to the fixed number returned in the initial setup. You may have noticed that I have, mistakenly, called “initial_setup()” inside the loop block. Even though the result will be correct, it’s still a performance error, given that “initial_setup()” does not depend on the loop iteration and, moreover, it’s a heavy function.

Anyway, let’s pretend that we haven’t noticed, and test our solution. You should be getting the following result:

7
8
9
10
11

It’s working, right? Although it has taken longer than I expected… I can’t figure out what’s making it so slow, so maybe it’s time to use Python’s profiler…

Getting started with cProfile

We need to profile our function and find out what’s hindering its performance! Fortunately, Python offers a module that can help us in this task: cProfile. We’ll start by creating an instance of the Profile class:

import cProfile

profile = cProfile.Profile()

We are now ready to profile our function! We will start using a simple approach, using the runcall() method, which receives the function (and its arguments, if any) to be profiled. In our case, we want to profile “slow_function()”, so we’ll add the following instruction:

import cProfile

profile = cProfile.Profile()
profile.runcall(slow_function)

Now, if you run the above snippet, you’ll see that, apparently, nothing has happened. At least, no profiling information has been printed in stdout. That’s because we are missing the last piece of the puzzle: pstats. Let’s go ahead and add it to our snippet:

import cProfile
import pstats

profile = cProfile.Profile()
profile.runcall(slow_function)

ps = pstats.Stats(profile)
ps.print_stats()

As you can see, we are passing the “profile” instance to the constructor of Stats to create a new instance “ps” of this class. This instance finally allows us to print the profiling results.
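As an aside, runcall() also forwards any positional and keyword arguments to the profiled function, and returns the function’s own return value, so you can profile functions that take parameters just as easily. A minimal sketch, using a made-up wait_and_add() function in place of a real workload:

```python
import cProfile
import time

# hypothetical function with arguments, standing in for a real workload
def wait_and_add(a, b, delay=0.1):
    time.sleep(delay)
    return a + b

profile = cProfile.Profile()
# positional and keyword arguments are forwarded to the profiled
# function, and runcall() returns the function's own return value
result = profile.runcall(wait_and_add, 2, 3, delay=0.2)
print(result)  # prints 5
```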

Alright, enough talking (or writing), let’s run our snippet and see what we get!

7
8
9
10
11
         17 function calls in 5.011 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.001    0.000    0.001    0.000 {built-in method builtins.print}
        5    5.010    1.002    5.010    1.002 {built-in method time.sleep}
        5    0.000    0.000    5.010    1.002 python-performance-bottlenecks.py:5(initial_setup)
        1    0.000    0.000    5.011    5.011 python-performance-bottlenecks.py:10(slow_function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

How can we interpret the results of print_stats()?

Good! We can now see a bunch of information that should point us towards the performance bottleneck. In order to find it, we should first understand what each of the columns means.

ncalls: the number of times that the analyzed function has been called

tottime: the total execution time spent in the analyzed function (excluding the execution time of the subfunctions)

percall: tottime divided by ncalls

cumtime: the total execution time spent in the analyzed function (including the execution time of the subfunctions)

percall: cumtime divided by ncalls

filename:lineno(function): file, line number and analyzed function

Now that we understand the meaning of each column, we can look for what was hindering our performance. Hmm… Oh! “initial_setup()” was called 5 times, and its cumulative execution time is above 5 seconds! Well, it makes sense, as I have, mistakenly, called “initial_setup()” inside the for loop! Alright, I’ll just call it outside of the loop, as it does not depend on the iteration.

So we’ve solved our first performance issue! Now, how will we solve our future bottlenecks? Should we always be looking at ncalls? Well, sometimes. But sometimes, we’ll face another kind of issue, in which ncalls won’t tell us anything. For instance, we may have to deal with a function that is called only once, but is extremely slow. As a general strategy, I tend to look at the functions with a higher cumulative time (cumtime) first and then check whether any function stands out in ncalls. However, I’d say that choosing a strategy is quite a personal journey that we build based on our experiences.

A more flexible approach to profiling

Sometimes you may not want to profile a whole function, but just a part of it. This is also possible, and easy, in the following way:

def slow_function():
    profile = cProfile.Profile()
    profile.enable()
    x = range(5)
    for i in x:
        a = initial_setup()
        b = a + i
        print(b)
    profile.disable()
    ps = pstats.Stats(profile)
    ps.print_stats()

We just have to place the fragment of code that we want to analyze between “profile.enable()” and “profile.disable()”. Keep in mind that we are analyzing a toy example here; in real situations, you may have to analyze a much larger function, so it’s always useful to limit the scope of the analyzed fragment of code if you have a suspicion of where the bottleneck might be.

Moreover, starting from Python 3.8, you also have the option of using it as a context manager, limiting the scope of the “profile” instance to the enclosed block:

with cProfile.Profile() as profile:
    x = range(5)
    for i in x:
        a = initial_setup()
        b = a + i
        print(b)

ps = pstats.Stats(profile)
ps.print_stats()

Getting the most out of pstats

Once you start profiling in real-world codebases, you’ll see that the printed statistics will be much larger, as you’ll probably be profiling larger fragments of code. Well, at some point, there’s so much information that it gets a bit confusing. Fortunately, pstats can help you make this information more manageable.

For starters, you may use sort_stats() to determine the order of the printed functions (the available ordering keys are listed in the pstats documentation). It may help, for instance, to sort by number of calls or by cumulative time. You may pass more than one sorting key; their priority follows the order in which they are given. Another handy option is setting a limit on the number of printed functions, which you can do by passing an integer to print_stats(). Let’s modify our example to sort the output by ncalls and cumtime, in this order, and limit it to 3 functions:

def slow_function():
    profile = cProfile.Profile()
    profile.enable()
    x = range(5)
    for i in x:
        a = initial_setup()
        b = a + i
        print(b)
    profile.disable()
    ps = pstats.Stats(profile)
    ps.sort_stats('calls', 'cumtime')
    ps.print_stats(3)

Your output should look like the following:

7
8
9
10
11
         17 function calls in 5.017 seconds

   Ordered by: call count, cumulative time
   List reduced from 5 to 3 due to restriction <3>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000    5.016    1.003 python-performance-bottlenecks.py:5(initial_setup)
        5    5.016    1.003    5.016    1.003 {built-in method time.sleep}
        5    0.001    0.000    0.001    0.000 {built-in method builtins.print}

As you can see, both the defined ordering and limit have been applied. Also note that these restrictions are declared in the output.

Well, that’s it, you are now ready to make your code faster!