1. Update To The Latest Version of Node.js

You can easily improve performance simply by upgrading Node.js, because almost any newer version of Node.js performs better than the previous one.

The performance improvement of each Node.js version mainly comes from two aspects:

The V8 version upgrade;

The optimization of Node.js internal code.

For example, the latest V8 7.1 optimizes escape analysis of closures in some cases and improves the performance of some `Array` methods:

The internal code of Node.js has also been significantly optimized as versions are upgraded. For example, the following figure shows how the performance of `require` changes across Node.js versions:

During code review, every PR submitted to Node.js is evaluated for whether it degrades current performance. There is also a dedicated benchmarking team monitoring performance changes, and you can check the performance changes of each Node.js version here: https://benchmarking.nodejs.org/

So you don't have to worry about the performance of new Node.js versions at all, and if you find any performance degradation in a new version, you are welcome to submit an issue.

How to choose the version of Node.js?

Here is the version strategy for Node.js:

The versions of Node.js are mainly divided into Current and LTS;

Current means the current version of Node.js, which is still under development;

LTS means a stable, long-term maintenance version;

Node.js releases a major version every six months (in April and October each year), and each major version brings some incompatible upgrades;

The version released in April each year (with an even version number, such as v10) is an LTS version, i.e. a long-term supported version, which the community will keep maintaining for 18 + 12 months (Active LTS + Maintenance LTS) starting from October of its release year;

The version released in October each year (with an odd version number, such as the current v11) has only an 8-month maintenance period.

For example, now (November 2018), the Current version of Node.js is v11, and the LTS versions are v10 and v8. The older v6 is in Maintenance LTS and will no longer be maintained from April next year. v9, released last October, reached end of maintenance in June this year.

| Release | Status | Codename | Initial Release | Active LTS Start | Maintenance LTS Start | End-of-life |
| --- | --- | --- | --- | --- | --- | --- |
| 6.x | Maintenance LTS | Boron | 2016-04-26 | 2016-10-18 | 2018-04-30 | April 2019 |
| 8.x | Active LTS | Carbon | 2017-05-30 | 2017-10-31 | April 2019 | December 2019 |
| 10.x | Active LTS | Dubnium | 2018-04-24 | 2018-10-30 | April 2020 | April 2021 |
| 11.x | Current Release | | 2018-10-23 | | | June 2019 |
| 12.x | Pending | | 2019-04-23 | October 2019 | April 2021 | April 2022 |

For production environments, Node.js officially recommends the latest LTS version.

2. Use fast-json-stringify To Speed Up JSON Serialization

In JavaScript, it's very convenient to generate JSON strings:

```javascript
const json = JSON.stringify(obj)
```

But few people know that there is room for performance optimization here too: using a JSON Schema can speed up serialization.

JSON serialization has to identify the type of a large number of fields. For example, for string fields it needs to add `"` on both sides; for array fields it needs to traverse the array, serialize each element, separate them with `,`, and then wrap the result in `[` and `]`.

But if the type of each field is already known in advance through a schema, there is no need to traverse and identify field types; each field can be serialized directly, which greatly reduces the computational overhead. This is the principle behind fast-json-stringify.

According to the benchmark results in the project, it can even be 10 times faster than JSON.stringify in some cases!
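To see why this is faster, here is a hand-rolled sketch of the same idea — a serializer compiled from a known schema. The `buildStringify` helper is purely illustrative (it is not fast-json-stringify's actual implementation, and it skips string escaping):

```javascript
// A hand-rolled sketch of the principle (illustrative only -- not
// fast-json-stringify's actual code): when field types are known ahead
// of time, serialization compiles down to plain string concatenation
// with no per-field type detection.
function buildStringify(schema) {
  const keys = Object.keys(schema.properties)
  return function stringify(obj) {
    const parts = keys.map((key) => {
      const type = schema.properties[key].type
      const value = obj[key]
      if (type === 'string') return `"${key}":"${value}"` // assumes no characters that need escaping
      if (type === 'integer') return `"${key}":${value}`
      throw new TypeError(`unsupported type: ${type}`)
    })
    return `{${parts.join(',')}}`
  }
}

const stringifyUser = buildStringify({
  properties: {
    name: { type: 'string' },
    age: { type: 'integer' }
  }
})

console.log(stringifyUser({ name: 'Starkwang', age: 23 }))
//=> {"name":"Starkwang","age":23}
```

For flat objects like this, the output matches `JSON.stringify`, but without any runtime type inspection.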

Here is a simple example:

```javascript
const fastJson = require('fast-json-stringify')

const stringify = fastJson({
  title: 'Example Schema',
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'integer' },
    books: {
      type: 'array',
      items: { type: 'string', uniqueItems: true }
    }
  }
})

console.log(stringify({
  name: 'Starkwang',
  age: 23,
  books: ['C++ Primier', 'John Alex']
}))
//=> {"name":"Starkwang","age":23,"books":["C++ Primier","John Alex"]}
```

In Node.js middleware business there is usually a lot of JSON data, and its structures are very similar (all the more so if you use TypeScript). Such scenarios are very suitable for optimization with JSON Schema.

3. Improve The Performance of Promise

Promise is the panacea for escaping nested callback hell. Especially since async/await became fully popular, the combination of async/await and Promise has become the ultimate solution for asynchronous programming in JavaScript, and a large number of projects now adopt this pattern.

However, there is a performance cost hidden behind the elegant syntax. We can test it with an existing benchmark project on GitHub; here are the results:

| file | time(ms) | memory(MB) |
| --- | --- | --- |
| callbacks-baseline.js | 380 | 70.83 |
| promises-bluebird.js | 554 | 97.23 |
| promises-bluebird-generator.js | 585 | 97.05 |
| async-bluebird.js | 593 | 105.43 |
| promises-es2015-util.promisify.js | 1203 | 219.04 |
| promises-es2015-native.js | 1257 | 227.03 |
| async-es2017-native.js | 1312 | 231.08 |
| async-es2017-util.promisify.js | 1550 | 228.74 |

Platform info: Darwin 18.0.0 x64, Node.JS 11.1.0, V8 7.0.276.32-node.7, Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz × 4

We can see from the results that native async/await + Promise performs much worse than callbacks and has a much higher memory footprint. For middleware projects with a lot of asynchronous logic, this performance overhead can't be ignored.

It can also be seen that the performance loss mainly comes from the implementation of the Promise object itself: the Promise implemented natively in V8 is much slower than third-party Promise libraries such as bluebird, while the async/await syntax itself does not cause much performance loss.

So for lightweight-computing middleware projects with a lot of asynchronous logic, you can replace the global Promise with bluebird's implementation:

```javascript
global.Promise = require('bluebird');
```

4. Write The Asynchronous Code Correctly

Asynchronous code looks pretty with async/await:

```javascript
const foo = await doSomethingAsync();
const bar = await doSomethingElseAsync();
```

But sometimes we may forget the other capabilities that Promise gives us, such as the parallelism of Promise.all():

```javascript
// bad
async function getUserInfo(id) {
  const profile = await getUserProfile(id);
  const repo = await getUserRepo(id)
  return { profile, repo }
}

// good
async function getUserInfo(id) {
  const [profile, repo] = await Promise.all([
    getUserProfile(id),
    getUserRepo(id)
  ])
  return { profile, repo }
}
```

And Promise.any() (this method is not in the ES6 Promise standard; you can also use the standard Promise.race() instead) makes it easy to implement faster and more reliable calls:

```javascript
async function getServiceIP(name) {
  // Get the service IPs from DNS and ZooKeeper, and use the one which returns successfully first.
  // Unlike Promise.race, an error is thrown only when both calls are rejected.
  return await Promise.any([
    getIPFromDNS(name),
    getIPFromZooKeeper(name)
  ])
}
```
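Note that the standard Promise.race() settles on the first rejection as well, so it is not a drop-in replacement. If only standard promises are available, `any` semantics can be sketched by hand (the `promiseAny` helper below is illustrative, not a built-in):

```javascript
// Minimal "any" on top of standard promises: resolve with the first
// fulfilled value; reject only after *all* promises have rejected.
// (Promise.race, by contrast, also settles on the first rejection.)
function promiseAny(promises) {
  return new Promise((resolve, reject) => {
    let remaining = promises.length
    const errors = new Array(promises.length)
    promises.forEach((p, i) => {
      Promise.resolve(p).then(resolve, (err) => {
        errors[i] = err
        if (--remaining === 0) reject(errors) // every call failed
      })
    })
  })
}
```

With this helper, a rejected DNS lookup no longer aborts the call as long as the ZooKeeper lookup eventually succeeds.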

5. Optimize V8 GC

There are already many articles about V8's garbage collection mechanism, so I won't repeat them here.

There are several common pitfalls when developing:

Pitfall 1: Using a large object as a cache, which slows down garbage collection in the Old Space.

Example:

```javascript
const cache = {}

async function getUserInfo(id) {
  if (!cache[id]) {
    cache[id] = await getUserInfoFromDatabase(id)
  }
  return cache[id]
}
```

Here we use a variable `cache` as a cache to speed up user-information queries. After many queries, the `cache` object moves into the Old Space and becomes very large. Since the Old Space uses tri-color marking + DFS for GC, a large object directly increases the time spent on GC (and there is also a risk of memory leaks).

Solution:

Use an external cache such as Redis. In fact, in-memory databases like Redis are ideal for this kind of scenario;

Limit the size of the local cache object, and use a mechanism such as FIFO or TTL to clean up entries in it.
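A minimal sketch of the second approach — a size-capped FIFO cache. The `FIFOCache` class and its default limit are illustrative, not from any particular library:

```javascript
// Sketch of a size-capped FIFO cache: once the limit is reached, the
// oldest entry is evicted, so the object never grows without bound
// (and never becomes a huge long-lived object in the Old Space).
class FIFOCache {
  constructor(limit = 1000) {
    this.limit = limit
    this.map = new Map() // Map preserves insertion order
  }

  get(key) {
    return this.map.get(key)
  }

  set(key, value) {
    if (this.map.size >= this.limit && !this.map.has(key)) {
      // Evict the oldest key (the first one inserted).
      const oldest = this.map.keys().next().value
      this.map.delete(oldest)
    }
    this.map.set(key, value)
  }
}
```

In the getUserInfo example above, `cache.get(id)` / `cache.set(id, value)` would replace the unbounded plain-object access.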

Pitfall 2: Insufficient Young Space leads to frequent GC.

By default, Node.js allocates 64MB (on 64-bit machines) of memory to the Young Generation. However, since the Young Generation GC uses the Scavenge algorithm, only half of that memory can actually be used, which means 32MB.

When business code frequently generates a large number of small objects, this space fills up easily, which then triggers GC. Although the Young Generation GC is much faster than the Old Generation GC, frequent GCs still have a significant impact on performance. In extreme cases, GC can take up about 30% of the total computation time.

The solution is to raise the upper limit of the Young Generation memory when starting Node.js, reducing the number of GCs:

```shell
node --max-semi-space-size=128 app.js
```

Well, you may ask: for the Young Generation, is bigger memory always better?

As the memory grows, the number of GCs decreases, but the time required for each GC increases as well. Therefore, bigger is not simply better.

Generally speaking, allocating 64MB or 128MB for the Young Generation is reasonable.

6. Use Stream Correctly

Stream is one of the most basic concepts in Node.js. Most of the IO-related modules inside Node.js, such as http, net, fs, and repl, are built on various Streams.

Most developers know the classic example below: for large files, instead of reading the whole file into memory, we use a Stream to send it out:

```javascript
const http = require('http');
const fs = require('fs');

// bad
http.createServer(function (req, res) {
  fs.readFile(__dirname + '/data.txt', function (err, data) {
    res.end(data);
  });
});

// good
http.createServer(function (req, res) {
  const stream = fs.createReadStream(__dirname + '/data.txt');
  stream.pipe(res);
});
```

Proper use of Streams in business code can greatly improve performance. Of course, we are likely to overlook them in real business. For example, a project using React server-side rendering can use renderToNodeStream:

```javascript
const ReactDOMServer = require('react-dom/server')
const http = require('http')
const fs = require('fs')
const app = require('./app')

// bad
const server = http.createServer((req, res) => {
  const body = ReactDOMServer.renderToString(app)
  res.end(body)
});

// good
const server = http.createServer(function (req, res) {
  const stream = ReactDOMServer.renderToNodeStream(app)
  stream.pipe(res)
})

server.listen(8000)
```

Use pipeline to manage streams

In older versions of Node.js, handling streams was very cumbersome. For example:

```javascript
source.pipe(a).pipe(b).pipe(c).pipe(dest)
```

Once any one of source, a, b, c, or dest errors or closes, the entire pipeline stops, and we then need to destroy all the remaining streams manually, which is very troublesome at the code level.

So libraries like pump emerged in the community to destroy streams automatically. And Node.js v10.0 introduced a new feature, stream.pipeline, which can replace pump and help us manage streams better.

An official example:

```javascript
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

pipeline(
  fs.createReadStream('archive.tar'),
  zlib.createGzip(),
  fs.createWriteStream('archive.tar.gz'),
  (err) => {
    if (err) {
      console.error('Pipeline failed', err);
    } else {
      console.log('Pipeline succeeded');
    }
  }
);
```

Implement your own high-performance Stream

You may also need to implement your own Stream in business code; for that, refer to the official documentation.

Although Streams are amazing, there are hidden performance issues when implementing your own. For example:

```javascript
const { Readable } = require('stream');

class MyReadable extends Readable {
  _read(size) {
    let chunk;
    while (null !== (chunk = getNextChunk())) {
      this.push(chunk);
    }
  }
}
```

When we call new MyReadable().pipe(xxx), the chunks obtained through getNextChunk() keep being pushed out until reading ends. However, if the downstream of the pipeline processes slowly, data will accumulate in memory, causing memory usage to rise and GC speed to drop.

The correct approach is to choose the behavior based on the return value of this.push(): when it returns false, it means the internal buffer is full and reading should pause.

```javascript
const { Readable } = require('stream');

class MyReadable extends Readable {
  _read(size) {
    let chunk;
    while (null !== (chunk = getNextChunk())) {
      if (!this.push(chunk)) {
        // The internal buffer is full; stop reading until _read is called again.
        return;
      }
    }
  }
}
```

This issue has been described in an official Node.js article: Backpressuring in Streams

7. Is The C++ Extension Faster Than JavaScript?

Node.js is great for IO-intensive applications, and for compute-intensive business many people think of optimizing performance by writing C++ addons. But in fact C++ extensions are not a panacea, and V8's performance is not as bad as you might think.

For example, I migrated Node.js' net.isIPv6() from C++ to a JS implementation this September, after which most test cases got performance improvements ranging from 10% to 250% (check here for the PR).

JavaScript can run faster on V8 than C++ extensions. This happens mostly in scenes involving strings and regular expressions, because the regular expression engine used inside V8 is irregexp, which is much faster than the engine that comes with boost (boost::regex).

It's also worth noting that Node.js C++ extensions can spend a lot of time on type conversions, and if you don't pay attention to this, the performance of the C++ code may degrade greatly.

Here is another article comparing the performance of C++ and JS with the same algorithm: How to get a performance boost using Node.js native addons. The noteworthy conclusion is that after the C++ code converts the string arguments (from String::Utf8Value to std::string), its performance is not even half that of the JS implementation; only with the type encapsulation provided by NAN does it achieve higher performance than JS.

In some scenarios, C++ extensions are not necessarily more efficient than native JavaScript. If you are not confident in your C++, it's recommended to use JavaScript, because V8's performance is much better than you think.

8. Use node-clinic To Locate Performance Issues Quickly

Is there anything that can be used out of the box? Of course there is.

node-clinic is a Node.js performance diagnostic tool open-sourced by NearForm, which can be used to locate performance issues quickly.

```shell
npm i -g clinic
npm i -g autocannon
```

You first need to start the service process:

```shell
clinic doctor -- node server.js
```

Then we can use any load-testing tool to run a load test, such as autocannon from the same author (of course, you can also use ab, curl, or other tools):

```shell
autocannon http://localhost:3000
```

After the load test, we close the process started by clinic with Ctrl+C, and the report will be generated automatically. For example, here is the performance report of one of our middleware services:

The CPU usage curve shows that the performance bottleneck of this middleware service is not its own internal computation but the slowness of I/O. Clinic also tells us that it has detected potential I/O problems.

Let's use clinic bubbleprof to detect I/O problems:

```shell
clinic bubbleprof -- node server.js
```

After another load testing, we get a new report:

The report shows that http.Server is in the pending state for 96% of the entire running time. Checking the details, we find a lot of empty frames in the call stack: due to the limitations of network I/O, the CPU idles a lot, which is very common in middleware business. It indicates that the direction for optimization is not inside the service itself, but in the speed of the server's gateway and of the dependent services.

Check here to find out how to read the report generated by clinic bubbleprof : https://clinicjs.org/bubblepr...

Similarly, clinic can also detect computational performance problems inside the service. Next, let's do something to make the service's performance bottleneck appear in CPU computation.

Let's add destructive code to some middleware that idles 100 million times, which is very CPU-intensive:

```javascript
function sleep() {
  let n = 0
  while (n++ < 10e7) {
    empty()
  }
}

function empty() { }

module.exports = (ctx, next) => {
  sleep()
  // ......
  return next()
}
```

Then use clinic doctor and repeat the above steps to generate another performance report:

This is a very typical case of synchronous computation blocking the asynchronous queue: there is so much computation on the main thread that JavaScript's asynchronous callbacks cannot be triggered in time, and the Event Loop delay becomes extremely high.

For such applications, we can use clinic flame to determine exactly where the intensive computation occurs:

```shell
clinic flame -- node app.js
```

After the load test, we get the flame graph (here the number of idle loops was reduced to 1 million so that the flame graph doesn't look too extreme):

There is a big white bar at the top, representing the CPU time consumed by the idle loops in the sleep function. With such a flame graph, we can easily see how CPU resources are consumed, locate the intensive computation in the code, and find the performance bottleneck.