There are many situations where our code runs too slowly and the reason is not apparent. In such situations the Python cProfile module comes in very handy. It lets us see how much time the individual steps in our code take, as well as how many times certain functions are called. In the following paragraphs I will explore its capabilities a bit more.

However, first let's remember the quote by Donald Knuth: "premature optimization is the root of all evil (or at least most of it) in programming". So make sure that you don't start optimizing before you even have working code! In many cases you will not be able to determine the bottlenecks beforehand and might spend a lot of extra effort in the wrong places.

Profiling with cProfile

The easiest way of using the cProfile module from within a Python script looks as follows:

import cProfile

pr = cProfile.Profile()
pr.enable()
call_function()
pr.disable()
pr.print_stats(sort='time')

In the code we create a Profile object and enable it, then execute the code that interests us, disable the profiling and print the results.

A very simple output of such profiling looks like this:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.972    0.972    1.282    1.282 performance_optimization.py:25(calculate_occurences_list_with_from_keys)
     1    0.310    0.310    0.310    0.310 {built-in method fromkeys}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The individual fields in the output are:

ncalls – how many times the function/method has been called (when the function is called recursively, ncalls shows two values, e.g. 120/20, where the first is the total number of calls and the second is the number of primitive, i.e. non-recursive, calls)

tottime – the total time in seconds spent in the function itself, excluding time spent in the functions it calls

percall – tottime divided by ncalls, i.e. the average time per call spent in the function itself

cumtime – the total time in seconds spent in the function, including the time of all the functions it calls

percall – similar to the previous percall, but based on cumtime, so it also covers time spent in sub-calls (and any waits such as network delays or thread sleeps that happen within them)

filename:lineno(function) – the location of the function within a script, or the built-in method used

So in terms of the above example, we see three lines; the last one is the disabling of the profiler and does not interest us further. The first two lines tell us that we called the function calculate_occurences_list_with_from_keys, which took 1.282 seconds cumulatively. Furthermore, we see that the cumulative time consists of two parts: the logic happening within the function itself (0.972 seconds) and the call of the built-in method fromkeys (0.310 seconds) from within our function.
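print_stats writes straight to stdout. If you want to sort or filter the report programmatically, the standard-library pstats module can be combined with cProfile. The sketch below uses a made-up stand-in workload called busy_work; any function of interest can take its place:

```python
import cProfile
import io
import pstats

def busy_work():
    # stand-in workload: sum of squares of the first 100000 integers
    return sum(i * i for i in range(100000))

pr = cProfile.Profile()
pr.enable()
busy_work()
pr.disable()

# capture the report in a string instead of printing it directly
buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats("cumulative").print_stats(5)  # show only the top 5 entries
report = buf.getvalue()
print(report)
```

Capturing the report as a string also makes it easy to log it or attach it to test output.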

Profiling example

Now that we have some working knowledge of cProfile, we can start with a simple example. To test profiling, we will create a list of random numbers from the range 0-99 and then count the occurrences of each value in the list.

The generation of the data is easy using the random module:

import random

data = [random.randint(0, 99) for p in range(0, 10000000)]

Our first approach will be to write a function that creates a dictionary with all possible values as keys and a default value of 0. Then we iterate through each element of the list and increment the number of occurrences. The function looks as follows:

def calculate_occurences_list(data):
    result = {}
    for i in range(0, 100):
        result[i] = 0
    for element in data:
        result[element] += 1
    return result

And if we profile the call of the function with our random data, we get:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.807    0.807    0.807    0.807 performance_optimization.py:8(calculate_occurences_list)
     1    0.000    0.000    0.000    0.000 {range}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

OK. In total it took 0.807 seconds, and we see that we used the range function within our code. Suppose this is too slow for us, so we start thinking about ways to improve it.

Well, we are generating the result dictionary ourselves, while we could use the dict.fromkeys method provided by Python; maybe the built-in method will perform better. The modified function is now:

def calculate_occurences_list_with_from_keys(data):
    result = dict.fromkeys(data, 0)
    for element in data:
        result[element] += 1
    return result

And the profiling result now is:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.857    0.857    1.088    1.088 performance_optimization.py:20(calculate_occurences_list_with_from_keys)
     1    0.231    0.231    0.231    0.231 {built-in method fromkeys}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The total running time of our script has not improved; the fromkeys variant is actually slower by roughly 0.28 seconds (1.088 vs 0.807 seconds in total). (Note: the reason fromkeys is slower is that it has to iterate over the whole ten-million-element list to collect the keys, while our previous function used the knowledge that all values come from the range 0-99 and only looped over those 100 possibilities.)
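The behaviour of dict.fromkeys can be illustrated on a tiny made-up list: it walks the entire iterable even though the resulting dictionary only has one entry per distinct value:

```python
# dict.fromkeys iterates over the whole list it is given, keeping
# one key per distinct value, in order of first occurrence
data = [3, 1, 3, 2, 1, 3]
result = dict.fromkeys(data, 0)
print(result)  # {3: 0, 1: 0, 2: 0}
```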

Finally, we can try an implementation where all the logic is built into Python. In such cases we have no way of improving the performance further, and the profiling output of more complex built-in methods or methods from packages can be rather difficult to read. In our case, the Python collections package contains a Counter class, which does exactly what we want. The code simplifies to a single line:

from collections import Counter

Counter(data).items()

We only pass in the list and the rest is handled by Counter. The first few lines of the profiling:

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    3.606    3.606    4.522    4.522 collections.py:528(update)
10000000    0.916    0.000    0.916    0.000 {method 'get' of 'dict' objects}
       1    0.005    0.005    0.008    0.008 collections.py:11(<module>)
       1    0.004    0.004    0.004    0.004 heapq.py:31(<module>)
       1    0.001    0.001    4.531    4.531 performance_optimization.py:33(calculate_occurences_using_counter)
     2/1    0.000    0.000    0.000    0.000 abc.py:148(__subclasscheck__)
       1    0.000    0.000    4.522    4.522 collections.py:458(__init__)
       1    0.000    0.000    0.000    0.000 collections.py:38(OrderedDict)
       4    0.000    0.000    0.000    0.000 _weakrefset.py:58(__iter__)

The cumulative time is the worst of all attempts, with the script running for 4.531 seconds. We don't know how collections is implemented internally, and optimizing it makes no sense; we have simply found out that it is not a good alternative to our current solution. However, we can learn a bit more about profiling from the output. We see that the update method within collections repeatedly calls the get method of dict objects. The number of calls equals the number of elements in the list we generated, and while each call takes very little time (shown by percall), the total time is already significant.
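For completeness, here is a self-contained version of the Counter approach. It uses a much smaller, seeded sample than the article's ten million elements so it runs instantly; the sizes are arbitrary:

```python
import random
from collections import Counter

random.seed(0)  # reproducible sample
data = [random.randint(0, 99) for _ in range(10000)]

counts = Counter(data)  # maps each value to its number of occurrences
print(counts.most_common(3))  # the three most frequent values
```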

Summary and recommendations

We tried a couple of approaches to a very simple problem to improve the performance of our script and looked at the output of cProfile. The examples were arbitrary and should not be taken as an indication of which methods are generally preferred. They should show you how to find basic call counts and execution times that guide you to the problematic or inefficient parts of your code, which can later be optimised. For further details about the profiler, have a look at the documentation.

Things to keep in mind:

performance depends largely on the amount of data you are working with, so make sure to profile with an amount of data comparable to your real problem

profiling itself takes some time, so your script will run slightly faster without it (this usually matters only when you are working with times in milliseconds)

the performance of the script can be influenced by other operations running on your system, so if you are comparing alternatives, make sure to run them under similar conditions (to time very short code snippets repeatedly you can use timeit)

your bottleneck can be a memory restriction, which causes your script to run slowly (for such cases you can try the Python memory_profiler package)
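The timeit suggestion above can be sketched as follows; the sample size and repetition count are arbitrary, and the two statements correspond to the manual-loop and Counter approaches from this article:

```python
import timeit

# shared setup: a seeded list of 10000 random values from 0-99
setup = (
    "import random\n"
    "random.seed(1)\n"
    "data = [random.randint(0, 99) for _ in range(10000)]"
)

# manual counting loop, as in the first approach
loop_stmt = (
    "result = {i: 0 for i in range(100)}\n"
    "for element in data:\n"
    "    result[element] += 1"
)

# the Counter-based approach
counter_stmt = "from collections import Counter\nCounter(data)"

t_loop = timeit.timeit(loop_stmt, setup=setup, number=50)
t_counter = timeit.timeit(counter_stmt, setup=setup, number=50)
print(f"manual loop: {t_loop:.3f}s  Counter: {t_counter:.3f}s")
```

Because timeit runs the statement many times in a fresh namespace, it averages away some of the noise that a single profiled run is exposed to.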