We’re excited to release FlameScope: a new performance visualization tool for analyzing variance, perturbations, single-threaded execution, application startup, and other time-based issues. It was created by the Netflix cloud performance engineering team and has just been released as open source, and we welcome help from others to develop the project further. (If it especially interests you, you might be interested in joining Netflix to work on it and other projects.)

FlameScope combines a subsecond-offset heatmap for navigating a profile with flame graphs. This profile can be of CPU samples or other events. Since it’s visual, it’s best demonstrated by the following one minute video:

There is also a longer video of examples here.

Subsecond-Offset Heat Maps

If you’re familiar with flame graphs, you’ll know they show an entire profile at once, which can span one minute. That’s good for analyzing steady workloads, but often there are small perturbations or variation during that minute that you want to know about, which become a needle-in-a-haystack search when shown with the full profile. FlameScope solves this by starting with a subsecond-offset heat map to visualize these perturbations, then lets you select them for study with a flame graph. In other words, you can select an arbitrary continuous time-slice of the captured profile, and visualize it as a flame graph.

You might not be familiar with subsecond-offset heat maps. They work as shown in figure 1, which has a mock ten row heat map, where:

x-axis: the passage of time, where each column represents one second

y-axis: this is also time, showing the fraction within the second: its subsecond offset

color: shows how many events or samples fell in that time range: darker for more

Imagine you have an event timestamp of 11.25 seconds. The x coordinate will be the 11th column, and the y coordinate will be the row that’s one quarter from the bottom. The more events that occurred around 11.25 seconds, the darker that block will be drawn.
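The mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than FlameScope's actual code: the function names `heatmap_cell` and `build_heatmap` are hypothetical, rows are counted from the bottom starting at zero, and aligning the first column to a whole second is an assumption.

```python
import math

def heatmap_cell(timestamp, start=0.0, rows=50):
    """Return (column, row) for an event timestamp, in seconds.

    column: whole seconds elapsed since `start` (x-axis)
    row:    subsecond offset scaled to `rows` buckets (y-axis, 0 = bottom)
    """
    elapsed = timestamp - start
    column = int(math.floor(elapsed))
    fraction = elapsed - column          # the subsecond offset, 0.0 <= f < 1.0
    row = min(int(fraction * rows), rows - 1)
    return column, row

def build_heatmap(timestamps, rows=50):
    """Count events per cell; grid[row][column] holds the count.

    Assumption: the first column starts at the whole second that
    contains the earliest timestamp.
    """
    if not timestamps:
        return []
    start = math.floor(min(timestamps))
    columns = int(math.floor(max(timestamps) - start)) + 1
    grid = [[0] * columns for _ in range(rows)]
    for ts in timestamps:
        column, row = heatmap_cell(ts, start, rows)
        grid[row][column] += 1           # more events means a darker cell
    return grid
```

With a ten row heat map, `heatmap_cell(11.25, rows=10)` gives column index 11 and row index 2, which is the row covering offsets 0.2 to 0.3: one quarter of the way up from the bottom.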

Example

Here’s an example, with annotations showing the steps for selecting a range:

Figure 2. FlameScope selecting a time range

There are a number of interesting things in this production CPU profile. The CPUs are busier between 0 and 5 seconds, shown as darker colors. Around the 34 and 94 second marks (which sounds like a 60 second periodic task), the CPUs also become busier, but for a shorter duration. And there are occasional bursts of heavy CPU activity for about 80 milliseconds, shown as short dark red stripes.

All of these details can be selected in FlameScope, which will then draw a flame graph just for that range. Here’s one of the short red stripes:

Figure 3. Flame graph for a selected time range

Ah, that’s Java garbage collection.

Instructions

Getting started instructions are listed (and will be updated) on the GitHub repository here. The quickest way to get started is:



$ git clone https://github.com/Netflix/flamescope

$ cd flamescope

$ pip install -r requirements.txt

$ python run.py

FlameScope comes with a sample profile to browse (where application code has been redacted with ‘x’ characters). Here’s how to create new profiles on Linux, which can be added to the examples directory of FlameScope for browsing:

$ sudo perf record -F 49 -a -g -- sleep 120

$ sudo perf script --header > stacks.myproductionapp.2018_03_30_01

$ gzip stacks.myproductionapp.2018_03_30_01 # optional

That example shows a two minute CPU profile, sampling at 49 Hertz on all CPUs. Any perf output with stack traces can be browsed with FlameScope, including tracing events such as block I/O, context switches, page faults, etc. Since the profile output can get large, it can also be compressed with gzip (FlameScope can read .gz files).
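If you want to peek at the sample timestamps in such a file yourself, here is a rough Python sketch. It is not how FlameScope parses profiles. It assumes each sample begins with an unindented line shaped like `comm pid [cpu] seconds.fraction: ...`, with the stack frames indented below it; the exact layout varies with the perf version and options, so treat the regular expression as an assumption. The helper names `open_profile` and `sample_timestamps` are made up for this example.

```python
import gzip
import re

# Assumed sample line shape: "<comm> <pid> [<cpu>] <seconds>.<fraction>: ..."
TIMESTAMP = re.compile(r"\s(\d+\.\d+):\s")

def open_profile(path):
    """Open a perf script output file as text, gzipped or not."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")

def sample_timestamps(lines):
    """Collect the timestamp (in seconds) of every sample line."""
    stamps = []
    for line in lines:
        # Skip blank lines, indented stack frames, and "#" header comments.
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps

# Usage, with the file name from the commands above:
# with open_profile("stacks.myproductionapp.2018_03_30_01.gz") as f:
#     stamps = sample_timestamps(f)
```

The resulting list of timestamps is all that a subsecond-offset heat map needs; the stack frames only matter once a time range has been selected.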

Why sample at 49 Hertz? Because 50 Hertz may sample in lock-step with a timed activity, and over- or under-count. Why roughly 50 Hertz in the first place? It’s not too slow and not too fast. Too slow and we don’t have enough samples to paint across FlameScope’s 50 row heatmap (the row count can be changed). Too fast and the overhead of sampling can slow down the application.
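To see the lock-step problem in numbers, here is a small Python simulation. The workload is invented for illustration: a hypothetical task that is busy for the first 5 milliseconds of every 100 milliseconds, so it is truly busy 5% of the time. The function name and its parameters are assumptions made for this sketch, and exact fractions are used to avoid floating point rounding at the interval boundaries.

```python
from fractions import Fraction

def sampled_busy_fraction(sample_hz, phase=Fraction(0), duration_s=120,
                          period=Fraction(1, 10), busy=Fraction(1, 200)):
    """Fraction of samples that land while the periodic task is busy.

    sample_hz: sampling frequency in Hertz (an integer)
    phase:     offset of the first sample from the task start, in seconds
    period:    how often the task runs (default: every 100 ms)
    busy:      how long it is busy each time (default: 5 ms)
    """
    total = sample_hz * duration_s
    hits = 0
    for k in range(total):
        t = Fraction(k, sample_hz) + phase   # time of the k-th sample
        if t % period < busy:                # inside the busy window?
            hits += 1
    return Fraction(hits, total)

print(float(sampled_busy_fraction(50)))                          # 0.2
print(float(sampled_busy_fraction(50, phase=Fraction(1, 100))))  # 0.0
print(float(sampled_busy_fraction(49)))                          # about 0.061
```

At 50 Hertz every fifth sample lands on exactly the same point in the task's cycle, so depending on the phase the task appears to be busy 20% of the time or never. At 49 Hertz the samples drift through the cycle, and the estimate lands close to the true 5%.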

Runtimes like Java can require extra steps to profile using perf correctly, which have been documented in the past for generating flame graphs (including here). Since you may have already been running these steps, you might have a library of old profiles (perf script output) that you can now explore using FlameScope.

Screenshots

Since FlameScope reads Linux perf profiles, I already have a collection from prior investigations. Here are some screenshots, showing variation that I did not know about at the time.