Understanding the performance profile of your Node.js applications can be difficult, particularly in an ecosystem that is evolving so quickly.

In this post, I am going to explore Node.js application performance using tools that are also compatible with other programming languages, giving us a cross-language profiling toolset that can be used on Linux systems with a recent kernel.

We will be using a number of tools that are part of the Linux kernel, but don't let that scare you: they are pretty simple once you understand the basics.

The first tool of the bunch is found in the "linux-tools" package and is suitably named perf. It is also often referred to as perf_events, partly because of the poor SEO of the name perf, and partly because the supporting documentation from Brendan Gregg and Vince Weaver uses perf_events. In our case, perf_events samples the running application via the Linux kernel to provide a statistically relevant picture of its runtime without capturing every data point, which would produce too much data or, more importantly, slow down your application and machine.
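The idea behind sampling is easy to see in miniature. The sketch below is a toy, not how perf is implemented: it aggregates hypothetical stack-top samples (the function names are made up) and ranks them, which is essentially how a profile approximates where CPU time went.

```javascript
// Toy illustration of sampling-based profiling (not perf itself):
// given stack-top samples captured at a fixed frequency, the
// function that appears most often is, statistically, the hottest.
function aggregateSamples(samples) {
  const counts = new Map();
  for (const fn of samples) {
    counts.set(fn, (counts.get(fn) || 0) + 1);
  }
  // Sort hottest-first, like the widest boxes in a flame graph.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Ten hypothetical samples, as if taken at ~99 Hz.
const samples = [
  'pbkdf2', 'render', 'pbkdf2', 'pbkdf2', 'gc',
  'render', 'pbkdf2', 'pbkdf2', 'render', 'pbkdf2',
];
console.log(aggregateSamples(samples));
// pbkdf2 dominates: 6 of 10 samples, roughly 60% of CPU time.
```

With only ten samples the estimate is rough; perf makes the same trade at thousands of samples per minute, which is plenty for a statistically useful profile.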

How to collect performance samples

With perf_events available to use, we will now start to capture some data whilst our application is under load.

I am going to be using next.js to build a sample server-side rendered React application, which we will use to show what these performance profiling tools can uncover.



git clone git@github.com:tomgco/hello-next-performance.git

cd hello-next-performance

npm install

npm run build

NODE_OPTIONS="--perf-basic-prof --perf-prof-unwinding-info" npm run start # Ready on http://localhost:3000

(Note: this might not work for you yet, as I have a PR in progress on the Node.js project to allow those parameters. PR #25565)

Once this application is up and running, you should be able to access it at http://localhost:3000. To produce some load on this application we will use a tool named artillery, which is fantastic for writing user-journey based load tests.

artillery quick -r 200 -d 0 http://localhost:3000

So, we now have our application running under a sustained amount of load (200 arrivals per second). The next step on our performance journey is to use perf_events to get an understanding of what is happening to it.

sudo perf record -F 99 -g -p $(pgrep -f next-start) -- sleep 60

Let’s deconstruct the perf command:

sudo perf record — Record the perf events; run as sudo as we need to interface with the Linux kernel.

-F 99 — Sample the application at this frequency; 99 instead of 100 to minimise lockstep sampling.

-g — Enable call-graph sampling; we will use this to make it easier to parse and make pretty graphs!

-p $(pgrep -f next-start) — Find our Node.js application's process ID.

-- sleep 60 — Run our recording for 60 seconds.

After our command has run for 60 seconds, we should see some output in our terminal telling us that the perf command has captured data.

[ perf record: Woken up 1 times to write data ]

[ perf record: Captured and wrote 0.249 MB perf.data (742 samples) ]

perf_events should now have created a perf.data file, and the Node.js application should have created some log files in the format isolate-0x*-v8.log, as well as some files at /tmp/perf-*.map. To get a human-readable version of the perf.data file, we can now run another command which will output to a file of our choice.

sudo perf script -f --header > stacks.test.$(date --iso-8601)
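As an aside, it is worth knowing what those /tmp/perf-*.map files contain: each line maps a range of JIT-compiled code to a human-readable name, as a hex start address, a hex size, and a symbol, which is how perf can label V8-generated functions. The sketch below parses one such line; the address and symbol are synthetic, made up for illustration, not taken from a real run.

```javascript
// Each line of a /tmp/perf-<pid>.map file maps a JIT-compiled code
// range to a name: "<start-addr-hex> <size-hex> <symbol>".
function parseMapLine(line) {
  const [addr, size, ...name] = line.trim().split(/\s+/);
  return {
    start: BigInt('0x' + addr),   // start address of the code range
    size: parseInt(size, 16),     // length of the range in bytes
    name: name.join(' '),         // symbol names may contain spaces
  };
}

// A synthetic example line, not from a real run.
const entry = parseMapLine('3ef414082 1e8 LazyCompile:~renderToHTML');
console.log(entry.size, entry.name);
// 488 'LazyCompile:~renderToHTML'
```

Without these map files, perf would only be able to show raw addresses for the JavaScript frames in our profile.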

Now we have our raw data, let's use some tools to inspect what we have!
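Most flame graph tooling works on a "folded" form of this data: each multi-line stack sample is collapsed into a single root-to-leaf line with a sample count, which is the job of Brendan Gregg's stackcollapse-perf script. A toy sketch of that collapse step, using synthetic frame names rather than real perf script output:

```javascript
// Collapse stack samples into the "folded" format used by flame
// graph tooling: one line per unique stack, frames joined root->leaf
// with ';' and followed by a sample count.
function collapseStacks(samples) {
  const counts = new Map();
  for (const frames of samples) {
    // perf script prints the leaf frame first, so reverse the
    // stack to get root->leaf order.
    const key = [...frames].reverse().join(';');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].map(([stack, n]) => `${stack} ${n}`);
}

// Three synthetic samples; two share the same stack.
const folded = collapseStacks([
  ['pbkdf2', 'handler', 'main'],
  ['pbkdf2', 'handler', 'main'],
  ['render', 'main'],
]);
console.log(folded);
// ['main;handler;pbkdf2 2', 'main;render 1']
```

In a flame graph, each folded line becomes a column of boxes whose width is proportional to its count, which is exactly what we will be reading in the next section.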

FlameScope

The first tool we are looking at is something developed by Netflix which "uses subsecond offset heat maps and flame graphs to analyze periodic activity, variance, and perturbations." I found out about this tool in Brendan Gregg's blog post titled "FlameScope Pattern Recognition".

FlameScope can be installed by following the instructions found on the GitHub repository: https://github.com/Netflix/flamescope. Once it is running, it will load any files found within the examples directory inside the flamescope folder, so we can view what we have been recording by copying our stacks.test.xxxx files into the examples directory and investigating them at http://localhost:5000.

Our example application showing an interesting load pattern

What we can see in this image is that although the arrivals per second are constant, we still see some variability in load every 5 seconds.

What could cause this?

To investigate further, we can drill down into a flame graph, which should show us which functions are taking up the most time. This is achieved by selecting a range within both the light-coloured sections and the darker-coloured sections.

What becomes apparent is that in the darker sections our application is spending much longer in the "sha512_block_data_order_avx2" and "node::crypto::PBKDF2Job::DoThreadPoolWork" frames. These functions are used by crypto.pbkdf2 to generate hashed passwords. https://github.com/tomgco/hello-next-performance/blob/master/pages/index.js#L20 Looking deeper, we can see that this application actually changes the strength of the password hash every 5 seconds (why? who knows! :D), explaining the pattern we see in the subsecond heat map.