Gergely Nemeth Co-Founder of RisingStack, EM at Uber

Post-mortem diagnostics & debugging comes into the picture when you want to figure out what went wrong with your Node.js application in production.

In this chapter of Node.js at Scale we will take a look at node-report, a core project which aims to help you to do post-mortem diagnostics & debugging.

The node-report diagnostics module

The purpose of the module is to produce a human-readable diagnostics summary file. It is meant to be used in both development and production environments.

The generated report includes:

JavaScript and native stack traces,

heap statistics,

system information,

resource usage,

loaded libraries.

Currently node-report supports Node.js v4, v6, and v7 on AIX, Linux, MacOS, SmartOS, and Windows.

Adding it to your project just takes an npm install and require:

npm install node-report --save

//index.js require('node-report')

Once you add node-report to your application, it will automatically listen on unhandled exceptions and fatal error events, and will trigger a report generation. Report generation can also be triggered by sending a USR2 signal to the Node.js process.

Use cases of node-report

Diagnostics of exceptions

For the sake of simplicity, imagine you have the following endpoint in one of your applications:

function myListener(request, response) { switch (request.url) { case '/exception': throw new Error('*** exception.js: uncaught exception thrown from function myListener()'); } }

This code simply throws an exception once the /exception route handler is called. To make sure we get the diagnostics information, we have to add the node-report module to our application, as shown previously.

require('node-report') function my_listener(request, response) { switch (request.url) { case '/exception': throw new Error('*** exception.js: uncaught exception thrown from function my_listener()'); } }

Let's see what happens once the endpoint gets called! Our report just got written into a file:

Writing Node.js report to file: node-report.20170506.100759.20988.001.txt Node.js report completed

Need assistance running Node.js in production? RisingStack: Expert help when you need it the most

The header

Once you open the file, you'll get something like this:

=================== Node Report =================== Event: exception, location: "OnUncaughtException" Filename: node-report.20170506.100759.20988.001.txt Dump event time: 2017/05/06 10:07:59 Module load time: 2017/05/06 10:07:53 Process ID: 20988 Command line: node demo/exception.js Node.js version: v6.10.0 (ares: 1.10.1-DEV, http_parser: 2.7.0, icu: 58.2, modules: 48, openssl: 1.0.2k, uv: 1.9.1, v8: 5.1.281.93, zlib: 1.2.8) node-report version: 2.1.2 (built against Node.js v6.10.0, 64 bit) OS version: Darwin 16.4.0 Darwin Kernel Version 16.4.0: Thu Dec 22 22:53:21 PST 2016; root:xnu-3789.41.3~3/RELEASE_X86_64 Machine: Gergelys-MacBook-Pro.local x86_64

You can think of this part as a header for your diagnostics summary - it includes..

the main event why the report was created,

how the Node.js application was started ( node demo/exception.js ),

), what Node.js version was used,

the host operating system,

and the version of node-report itself.

The stack traces

The next part of the report includes the captured stack traces, both for JavaScript and the native part:

=================== JavaScript Stack Trace =================== Server.myListener (/Users/gergelyke/Development/risingstack/node-report/demo/exception.js:19:5) emitTwo (events.js:106:13) Server.emit (events.js:191:7) HTTPParser.parserOnIncoming [as onIncoming] (_http_server.js:546:12) HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)

In the JavaScript part, you can see..

the stack trace (which function called which one with line numbers),

and where the exception occurred.

In the native part, you can see the same thing - just on a lower level, in the native code of Node.js

=================== Native Stack Trace =================== 0: [pc=0x103c0bd50] nodereport::OnUncaughtException(v8::Isolate*) [/Users/gergelyke/Development/risingstack/node-report/api.node] 1: [pc=0x10057d1c2] v8::internal::Isolate::Throw(v8::internal::Object*, v8::internal::MessageLocation*) [/Users/gergelyke/.nvm/versions/node/v6.10.0/bin/node] 2: [pc=0x100708691] v8::internal::Runtime_Throw(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/gergelyke/.nvm/versions/node/v6.10.0/bin/node] 3: [pc=0x3b67f8092a7] 4: [pc=0x3b67f99ab41] 5: [pc=0x3b67f921533]

Heap and garbage collector metrics

You can see in the heap metrics how each heap space performed during the creation of the report:

new space,

old space,

code space,

map space,

large object space.

These metrics include:

memory size,

committed memory size,

capacity,

used size,

available size.

To better understand how memory handling in Node.js works, check out the following articles:

=================== JavaScript Heap and GC =================== Heap space name: new_space Memory size: 2,097,152 bytes, committed memory: 2,097,152 bytes Capacity: 1,031,680 bytes, used: 530,736 bytes, available: 500,944 bytes Heap space name: old_space Memory size: 3,100,672 bytes, committed memory: 3,100,672 bytes Capacity: 2,494,136 bytes, used: 2,492,728 bytes, available: 1,408 bytes Total heap memory size: 8,425,472 bytes Total heap committed memory: 8,425,472 bytes Total used heap memory: 4,283,264 bytes Total available heap memory: 1,489,426,608 bytes Heap memory limit: 1,501,560,832

Resource usage

The resource usage section includes metrics on..

CPU usage,

the size of the resident set size,

information on page faults,

and the file system activity.

=================== Resource usage =================== Process total resource usage: User mode CPU: 0.119704 secs Kernel mode CPU: 0.020466 secs Average CPU Consumption : 2.33617% Maximum resident set size: 21,965,570,048 bytes Page faults: 13 (I/O required) 5461 (no I/O required) Filesystem activity: 0 reads 3 writes

System information

The system information section includes..

environment variables,

resource limits (like open files, CPU time or max memory size)

and loaded libraries.

Diagnostics of fatal errors

The node-report module can also help once you have a fatal error, like your application runs out of memory.

By default, you will get an error message something like this:

<--- Last few GCs ---> 23249 ms: Mark-sweep 1380.3 (1420.7) -> 1380.3 (1435.7) MB, 695.6 / 0.0 ms [allocation failure] [scavenge might not succeed]. 24227 ms: Mark-sweep 1394.8 (1435.7) -> 1394.8 (1435.7) MB, 953.4 / 0.0 ms (+ 8.3 ms in 231 steps since start of marking, biggest step 1.2 ms) [allocation failure] [scavenge might not succeed].

On its own, this information is not that helpful. You don't know the context, or what was the state of the application. With node-report , it gets better.

First of all, in the generated post-mortem diagnostics summary you will have a more descriptive event:

Event: Allocation failed - JavaScript heap out of memory, location: "MarkCompactCollector: semi-space copy, fallback in old gen"

Secondly, you will get the native stack trace - that can help you to understand better why the allocation failed.

Diagnostics of blocking operations

Imagine you have the following loops which block your event loop. This is a performance nightmare.

var list = [] for (let i = 0; i < 10000000000; i++) { for (let j = 0; i < 1000; i++) { list.push(new MyRecord()) } for (let j=0; i < 1000; i++) { list[j].id += 1 list[j].account += 2 } for (let j = 0; i < 1000; i++) { list.pop() } }

With node-report you can request reports even when your process is busy, by sending the USR2 signal. Once you do that you will receive the stack trace, and you will see in a minute where your application spends time.

(Examples are taken for the node-report repository)

The API of node-report

Triggering report generation programmatically

The creation of the report can also be triggered using the JavaScript API. This way your report will be saved in a file, just like when it was triggered automatically.

const nodeReport = require('node-report') nodeReport.triggerReport()

Getting the report as a string

Using the JavaScript API, the report can also be retrieved as a string.

const nodeReport = require('nodereport') const report = nodeReport.getReport()

Using without automatic triggering

If you don't want to use automatic triggers (like the fatal error or the uncaught exception) you can opt-out of them by requiring the API itself - also, the file name can be specified as well:

const nodeReport = require('node-report/api') nodeReport.triggerReport('name-of-the-report')

Contribute

If you feel like making Node.js even better, please consider joining the Postmortem Diagnostics working group, where you can contribute to the module.

The Postmortem Diagnostics working group is dedicated to the support and improvement of postmortem debugging for Node.js. It seeks to elevate the role of postmortem debugging for Node, to assist in the development of techniques and tools, and to make techniques and tools known and available to Node.js users.

In the next chapter of the Node.js at Scale series, we will discuss Microservices Distributed Tracing with Node.js and OpenTracing. If you have any questions, please let me know in the comments section below.