A software application runs in the computer's primary memory, which we call Random Access Memory (RAM). JavaScript, and in particular Node.js (server-side JS), allows us to write small to mega-sized software projects for end users. Dealing with a program's memory is always tricky, because a lousy implementation can block all other applications running on the given server or system. C and C++ programmers do take care of memory management, because devilish memory leaks lurk in every corner of their code. But what about JS developers? Do we bother?

Since JS developers usually do web server programming on dedicated, high-capacity servers, they may not feel the lag from multitasking. But even in web server development, we run multiple applications such as a database server (MySQL), a cache server (Redis), and whatever other software our application requires, and we need to be aware that they too consume the available primary memory. If we write applications carelessly, we can degrade the performance of other processes, or even deny them memory allocation entirely. In this article, we look at Node.js constructs like streams, buffers, and piping by solving a problem, and see how they allow us to write memory-efficient applications.

We use Node.js v8.12.0 to run the programs. All the code samples we are going to present are available here.

Problem: Huge file copy

If anyone is asked to write a file copy program in Node.js, they quickly jump in and create this one.

This program creates handles for reading and writing files with the given file names, reads the whole source file, and then tries to write the data through the write handle. It works for small files.

Let us say our application copies a huge file (> 4 GB) as part of a backup process. I have an Ultra HD 4K movie file, 7.4 GB in size. Here is what happens if I try to run the above program to copy this big file from my current directory to Documents.

$ node basic_copy.js cartoonMovie.mkv ~/Documents/bigMovie.mkv

I get this nice buffer error on Ubuntu (Linux).

/home/shobarani/Workspace/basic_copy.js:7
    if (err) throw err;
    ^

RangeError: File size is greater than possible Buffer: 0x7fffffff bytes
    at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:453:11)
As you can see, the read operation fails because Node.js only allows you to read up to 2 GB of data (0x7fffffff bytes) into a buffer, and no more. How do we overcome this? When you are doing I/O-intensive operations (copy, process, zip), it is better to consider system memory.

Streams and Buffers in Node JS

To overcome the above problem, we need two things: a mechanism for breaking the large data into multiple chunks, and a data structure to hold those chunks. A buffer is a data structure that stores binary data. Next, we need a way to read/write the chunks systematically. Streams provide that functionality.

Buffers

We can easily create a buffer by initializing the Buffer object (note that the new Buffer(size) constructor is deprecated in modern Node.js; Buffer.alloc, shown next, is preferred).

let buffer = new Buffer(10); // 10 is the size of the buffer

console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

In newer versions of Node.js (>8), you can also do this.

let buffer = Buffer.alloc(10);

console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

If we already have some data, such as an array or another collection, we can create a buffer from it like this.

let name = 'Node JS DEV';

let buffer = Buffer.from(name);

console.log(buffer); // prints <Buffer 4e 6f 64 65 20 4a 53 20 44 45 56>

Buffers have a few important methods, like buffer.toString() and buffer.toJSON(), for looking into the data stored in them.

We won't be creating raw buffers ourselves in our journey to optimize code. Node.js and the V8 engine do that for us by creating internal buffers (queues) while working with streams or network sockets.

Streams

In simple terms, a stream is like a sci-fi portal through which data enters or leaves a Node.js object. In computer networking, ingress refers to incoming traffic and egress to outgoing traffic; we use these terms hereafter.

There are four types of streams available:

Readable streams (you can read data from it)

Writable streams (you can feed data into it)

Duplex streams (It is open to both read and write)

Transform streams (a custom duplex stream that processes the data passing through it, e.g., compression or validity checks; data is both ingress and egress for it)

This single line tells precisely why one should use streams:

A vital goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds don’t choke the available memory.

You need some way to do the operation without overwhelming the system. That is what we talked about in the opening of this article.

Courtesy: Node JS Docs

In the above diagram, we have two types of streams: readable and writable. The .pipe() method is a very basic primitive for attaching a readable stream to a writable stream. If you don't understand the above diagram, that is fine; after seeing our examples, you can come back here and everything will make sense. Piping is a compelling mechanism, and below we illustrate it with two examples.

Solution 1 (Naive file copy with streams)

Let us devise a solution to overcome the huge-file copy problem discussed earlier. To do so, we create two streams and implement the following procedure:

1. Listen for data chunks on the readable stream

2. Write each chunk to the writable stream

3. Track the progress of the copy operation

Let us name the program streams_copy_basic.js.

Streams without piping

In this program, we ask the user to input two file paths (source and destination) and create two streams to copy the chunks from the readable source to the writable destination. We declare a few more variables to track the progress and print it to the standard output (the console here). We subscribe to a few events:

‘data’: fired when a data chunk is read

‘end’: fired when there are no more chunks to read from the readable stream

‘error’: fired if there is any problem during the reading process

Run this program, and we can successfully copy a big file (7.4 GB in my case):

$ time node streams_copy_basic.js cartoonMovie.mkv ~/Documents/4kdemo.mkv

However, there is a problem. Observe the memory used by the Node.js process in the activity/process monitor on your machine.