UPDATE: now on github with a clickable example! You can even use the on-line viewer for your own traces.

At work, I’ve been trying to profile a very CPU-intensive part of a long-running Python server.

There was a time, back in the dark ages, when randomly stopping your program in a debugger and memorising the stack trace was a perfectly respectable way to ‘profile’ your code!

And it actually worked very well. But somehow, with the advent of fancy profiling tools, the simple and robust approach became forgotten…

In order to profile my servers, I’ve added the yappi profiler in such a way that it can be started and stopped via a debug HTTP interface. So you can, remotely, turn on and off profiling and recover the log.

This has worked better than nothing, but I’ve found it problematic; yappi - and other Python profilers - are so intrusive that they can slow the whole server down by nearly a factor of 3, and this can cause a whole cascade of timeouts and other problems which interfere with the path you are trying to profile. Additionally, interpreting the output can be difficult. The functions that glow most are the utility functions, and the actual what-your-application-is-doing feel just isn’t there.

So, in desperation, this morning I made a full-stack-sampling profiler. N-times a second it will save a snapshot of the current stack. Sampling the stack, say, 10 times per second has no noticeable impact on run-time and doesn’t distort the program being profiled:

Why do I want the raw full stack-traces? Because these are key to understanding what your program is doing! Stopping a program randomly in a debugger a few times is a very good way of discovering your bottlenecks, and my script just automates it.

I had a very clear idea of how I wanted to see the output. And the output is JSON. So, rather than making a little tkinter viewer app, I just threw together a rough Javascript page that you can upload your JSON output to and it renders it on a clickable canvas.

The viewer creates a horizontally-scrollable time-line. It shows how deep the stack is at each sample point. Clicking on a sample shows the call stack at that slice of the action. And here’s the neat thing - the viewer highlights in all other stack samples on the time-line the extent of their common base.

Look at that pretty screenshot above again ↑

So I’ve clicked on a sample 11 seconds into the trace, and can see the stack-trace sample at that time. The red in the other samples are the calls that are common. At the very bottom are two calls that are part of the twisted IO loop, and so that can be tuned out. But contrast the block of blue to the left; that shows that all those previous calls were doing something quite different; and the big block of red to the right show that subsequent samples were very much in the same area of code.

This is a really great way to understand your calls!

There could be more information captured and presented; the user and system times, for example, and the stack traces of other threads and so on. I chose to not present that in my graph simply because they were not interesting to my cause.

You can easily extend will_profile.py to capture all stacks; for example:

id2name = dict([(th.ident, th.name) for th in threading.enumerate()])

for ident,frame in sys._current_frames().items():

thread_name = id2name.get(ident,"")

stack = traceback.extract_stack(frame)

...

I hope my quick scripts are useful to you when you have to debug a long-running, CPU-intensive task in the future!

Notes

"share"