Flamegraph generated for a Rails application’s request

Flamegraphs can be a valuable tool in helping you debug your application’s performance woes. But they can be pretty overwhelming when you’re using them for the first time, especially when you’re dealing with a complex Rails application with a lot going on inside it.

All we need is a little intuition about how a flamegraph is built and what it represents, and we’re good to go on our performance optimization journey. This post is aimed at helping you gain that valuable intuition.

What are flamegraphs?

A flamegraph is simply a visualization of the output of a sampling profiler.

Sampling profiler? Alright, let’s break things down. First, a profiler is a tool that lets you inspect a running program and collect statistics that tell you how many resources (think memory, disk, etc.) the program is using, how long method calls are taking, how much time is spent executing each part of the code, and so on. These stats are useful in helping you optimize your program. There are many different kinds of profilers, and an in-depth look at them is well beyond the scope of this article.

For the purpose of better understanding flamegraphs, let’s take a closer look at sampling profilers.

Sampling Profilers

A sampling profiler is a kind of profiler that works by taking snapshots of the running program’s method call stack at regular intervals and reporting where (in which functions or methods) most of the execution time is spent, thereby allowing us to identify the slower parts (bottlenecks) of our program.
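To make that concrete, here’s a toy wall-clock sampler (an illustration written for this post, not a real profiler): it watches another thread and snapshots that thread’s call stack at a fixed interval. The method names and sleep durations here are made up.

```ruby
# A toy sampling profiler (illustration only): snapshot another thread's
# call stack at a fixed interval and count where time appears to be spent.
def slow_method
  sleep 0.5   # stands in for expensive work
end

def fast_method
  sleep 0.1   # stands in for cheap work
end

target = Thread.new { 4.times { slow_method; fast_method } }

samples = []
while target.alive?
  stack = target.backtrace    # a snapshot of the thread's call stack
  samples << stack if stack
  sleep 0.05                  # sampling interval: 50 ms
end
target.join

# Count the samples in which each method appeared anywhere on the stack
slow = samples.count { |s| s.any? { |frame| frame.include?('slow_method') } }
fast = samples.count { |s| s.any? { |frame| frame.include?('fast_method') } }
puts "slow_method on the stack in #{slow} samples, fast_method in #{fast}"
```

Running this, slow_method shows up in roughly five times as many samples as fast_method, mirroring the 5:1 ratio of time the two methods spend working.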

Understanding how a sampling profiler works is crucial to gaining the intuition to read a flamegraph. So let me use some source code to show how a sampling profiler does its job.

The code in runner.rb simply calls methods in two classes, HeavyWorker and LightWorker, which both do some work that takes different amounts of time to complete.
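The original runner.rb isn’t reproduced here, but based on the description and the timings discussed below, it would look something like this (the method bodies are my assumption; each sleep stands in for about 2 seconds of real work):

```ruby
# runner.rb: a sketch of the example program. Class and method names come
# from the text; the bodies are assumed, with sleep standing in for work.
class HeavyWorker
  def work
    sleep 2      # ~2 seconds of work in this method itself
    more_work
  end

  def more_work
    sleep 2      # ~2 seconds of additional work
  end
end

class LightWorker
  def work
    sleep 2      # ~2 seconds of work
  end
end

HeavyWorker.new.work
LightWorker.new.work
```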

Like I described before, a sampling profiler takes snapshots (samples, in other words) of the program call stack at a rate defined by our sampling frequency.

Assuming our program takes about 6 seconds to run (which is not actually true) and we profile it with a sampling frequency of 1 second, these would be the samples we’d get:
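Assuming HeavyWorker#work spends about two seconds in its own code before calling more_work (an assumption for illustration), the six samples would look roughly like this, with each row a stack frame and the innermost frame on top:

```text
samples at t=1s, 2s     samples at t=3s, 4s       samples at t=5s, 6s

HeavyWorker#work        HeavyWorker#more_work     LightWorker#work
main (runner.rb)        HeavyWorker#work          main (runner.rb)
                        main (runner.rb)
```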

Each row represents a frame in the call stack. Using our intuition on what call stacks represent, we can now deduce the following facts looking at the samples we obtained —

- HeavyWorker#work takes up about 2/3rds (~66%) of the total execution time.
- That time is split evenly between HeavyWorker#work itself (~33%) and HeavyWorker#more_work (~33%).
- The remaining 1/3rd (~33%) of the total execution time is spent in LightWorker#work.

Now that is some useful information. Just imagine what you could do if you could get such snapshots out of a Rails request. You could tell what percentage of the response time was spent in the model, how long it took to render your templates, or whether any offending libraries are taking too much time to execute.

Using a real sampling profiler like stackprof (a popular option for Ruby 2.1+) on our program would provide us with a report like this —

Text report generated by running stackprof

This report tells you how many times each method (frame) was seen on top of the call stack snapshot (which means at that point in time, that method was being executed).

In line with the deductions we made above, this report tells us HeavyWorker takes about 66% of the execution time and LightWorker takes about 33%. An important thing to note is that we need to be careful not to correlate this with real execution times. That would not be accurate; sampling profilers are not meant to track the actual execution time of each method.

stackprof is pretty sophisticated, and there are other reports you can generate with it as well. Do read its docs to learn more.

Back to flamegraphs

Now that we’re clear on how a sampling profiler could be used to extract samples from a running program, we can see how these samples can be used to generate a flamegraph.

All one needs to do is combine the snapshots we got above along a single horizontal line (which represents the progression of time). We could get something like this:
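Sketched out in ASCII (stacks growing upward, one column per one-second sample), the combined picture looks roughly like this:

```text
                     +-----------------------+
                     | HeavyWorker#more_work |
+--------------------+-----------------------+---------------------+
|           HeavyWorker#work                 |  LightWorker#work   |
+--------------------------------------------+---------------------+
|                      main (runner.rb)                            |
+------------------------------------------------------------------+
  t=1s       t=2s       t=3s       t=4s       t=5s       t=6s
```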

I’ve color-coded the frames based on which class they belong to. Now, just by glancing at this, you can tell what proportion of the execution time is spent in each class and method. Pretty neat, right? I hope you’ve gained a better intuition on how to read flamegraphs by now.

Let’s go ahead and use Sam Saffron’s flamegraph gem to generate a real flamegraph for our program. Here’s how you’d use it:

Flamegraph.generate('flamegraph.html') do
  # your code
end

And here’s what we get —

Flamegraph for our example code. It has been tweaked a little to cut out noise (frames from the Flamegraph and stackprof libraries), and frames have been given clearer names.

The duplicate frames stacked on top of each other represent the blocks we’ve used in our program. Each Ruby block is given a separate frame on the call stack.

Now that you’ve learned how to read flamegraphs, you should be able to dig in and track down the parts of your code base that are slowing down your application and perform some optimizations for fun and profit!

Using with Rails

There are many ways to generate flamegraphs for Rails applications. I would suggest you go with the rack-mini-profiler gem. This gem bundles together various tools that let you perform database, call-stack and memory profiling.

Once you’ve hooked rack-mini-profiler up with your Rails application, you can begin profiling your requests and visualizing the results as flamegraphs with ridiculous ease.
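A minimal setup might look like this (a sketch; rack-mini-profiler’s flamegraph view relies on stackprof for sampling):

```ruby
# Gemfile
group :development do
  gem 'rack-mini-profiler'
  gem 'stackprof'   # used by rack-mini-profiler to sample call stacks
end
```

After running bundle install and restarting your server, append ?pp=flamegraph to a request’s URL to view that request as a flamegraph.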

What’s next? Flamegraphs generated for Rails requests can be quite complex and noisy because of the immense complexity of the Rails framework. I highly recommend reading Justin Weiss’s blog post on the topic to help improve your mileage when debugging Rails applications with flamegraphs.