Too often, MongoDB REST API developers don't think about handling database outages until they have an outage in production. Usually you can get away with this because version 2.x of the MongoDB Node.js driver does most of the work for you: it handles attempting to reconnect and can buffer operations for you so you don't get errors during a temporary outage. However, the MongoDB Node.js driver has a lot of tunable options and corresponding subtleties that you need to be aware of. In this article, I'll cover the basics of what happens when your backend MongoDB topology goes down for single server and replica sets, so you can configure the driver to do the right thing for your use case.

Handling Single Server Outages

The mongodb-topology-manager npm package is indispensible for testing MongoDB connection logic. This lib can stop and start MongoDB servers, replica sets, and sharded clusters from Node.js. The MongoDB driver Node.js uses it for testing internally.

First, let's see what happens when the MongoDB server that the driver is connected to shuts down. The db handle the driver gives you emits several events, including 'close' when it loses connection and 'reconnect' when it reconnects. The driver attempts to reconnect by default, but this is a tuneable option. Note that these events only get fired for a single server or mongos, not when you're connected to a replica set (more on that later). Here's a script that uses the topology manager to stop/start a MongoDB instance so you can see these events actually fire:

const { Server } = require ( 'mongodb-topology-manager' ); const mongodb = require ( 'mongodb' ); run().catch(error => console .error(error)); async function run ( ) { const server = new Server( 'mongod' , { dbpath: ` ${__dirname} /data/db/27000` , port: 27000 }); await server.start(); console .log( 'Server started...' ); const db = await mongodb.MongoClient.connect( 'mongodb://localhost:27000/test' ); db.on( 'close' , () => { console .log( '-> lost connection' ); }); db.on( 'reconnect' , () => { console .log( '-> reconnected' ); }); await server.stop(); console .log( 'stopped server' ); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); await server.start(); console .log( 'restarted server' ); }

The driver events are useful for logging and alerting. You should take action and alert someone when connection is lost.

The driver has some subtleties about how it handles database operations while the driver is not connected. For example, if you run a findOne() query in the above script after the script stops the server but before it restarts the server, you'll see that query doesn't execute until after the driver reconnects.

await server.stop(); console .log( 'stopped server' ); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); db.collection( 'Test' ).findOne({}, function ( error ) { console .log( 'Finished query' , error); }); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); await server.start(); console .log( 'restarted server' );

This buffering behavior can be confusing, but thankfully you can turn it off with the bufferMaxEntries option to MongoClient.connect() .

const db = await mongodb.MongoClient.connect( 'mongodb://localhost:27000/test' , { bufferMaxEntries: 0 }); db.on( 'close' , () => { console .log( '-> lost connection' ); }); db.on( 'reconnect' , () => { console .log( '-> reconnected' ); }); await server.stop(); console .log( 'stopped server' ); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); db.collection( 'Test' ).findOne({}, function ( error ) { console .log( 'Finished query' , error); }); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); await server.start(); console .log( 'restarted server' );

Buffering will send operations to the server when the driver reconnects, or return an error if the driver gave up trying to reconnect. By default, the MongoDB driver stops trying to reconnect to a single server after 30 seconds. Specifically, the driver will retry reconnectTries times every reconnectInterval milliseconds (both options to the MongoClient.connect() function ) and give up when it runs out of reconnectTries . To see this in action, let's reduce reconnectTries and reconnectInterval so the driver gives up trying to reconnect after 1 second. Any buffered queries will give back a 'failed to reconnect' error, and the driver will emit a 'reconnectFailed' event.

const db = await mongodb.MongoClient.connect( 'mongodb://localhost:27000/test' , { reconnectTries: 2 , reconnectInterval: 500 }); db.on( 'close' , () => { console .log( '-> lost connection' ); }); db.s.topology.on( 'reconnectFailed' , () => { console .log( '-> gave up reconnecting' ); }); db.on( 'reconnect' , () => { console .log( '-> reconnected' ); }); await server.stop(); console .log( 'stopped server' ); await new Promise (resolve => setTimeout(() => resolve(), 100 )); db.collection( 'Test' ).findOne({}, function ( error ) { console .log( 'Finished query' , error); }); await new Promise (resolve => setTimeout(() => resolve(), 2000 )); await server.start(); console .log( 'restarted server' );

When you get the 'reconnectFailed' event, you should crash your application or manually reconnect, because your db handle will never finish reconnecting. You should tune reconnectTries to something that makes sense for you. Some prefer to set it to Number.MAX_VALUE so the driver will continue trying to reconnect forever. Others prefer to make it relatively small and start crashing their application so their alerting can kick in. If you choose to make reconnectTries large enough that the driver will continue to try to reconnect forever, I would recommend you shut off buffering by setting bufferMaxEntries: 0 , because otherwise you'll have database operations that can be queued up for arbitrarily long periods of time.

Replica Set Outages

The driver handles replica set connections very differently from how it handles single server connections. Even connecting to a one node replica set is different from connecting to a standalone server. One caveat is that when one server disconnects, you might still be able to read and write depending on your read preference, so when dealing with replica sets the important events are 'left' and 'joined', reflecting when a server joins or leaves the replica set.

The general idea of these events is that when you get a 'left' event with data === 'primary' that means the replica set has no primary, so writes will get buffered, or fail if you disabled buffering. Then, when you get a 'joined' event with data === 'primary' the replica set now has a primary and you can execute writes again. Here's an example of the events that get emitted when the primary gets shut down.

const { ReplSet } = require ( 'mongodb-topology-manager' ); const { inspect } = require ( 'util' ); const mongodb = require ( 'mongodb' ); run().catch(error => console .error(error)); async function run ( ) { const bind_ip = 'localhost' ; const replSet = new ReplSet( 'mongod' , [ { options: { port: 31000 , dbpath: ` ${__dirname} /data/db/31000` , bind_ip } }, { options: { port: 31001 , dbpath: ` ${__dirname} /data/db/31001` , bind_ip } }, { options: { port: 31002 , dbpath: ` ${__dirname} /data/db/31002` , bind_ip } } ], { replSet: 'rs0' }); await replSet.purge(); await replSet.start(); console .log( 'Replica set started...' ); const db = await mongodb.MongoClient.connect( 'mongodb://localhost:31000,localhost:31001,localhost:31002/test?replicaSet=rs0' ); db.s.topology.on( 'left' , data => console .log( '-> left' , data)); db.s.topology.on( 'joined' , data => console .log( '-> joined' , data)); db.on( 'fullsetup' , () => console .log( '-> all servers connected' )); const primary = await replSet.primary(); db.s.topology.once( 'left' , () => { db.collection( 'Test' ).findOne({}, function ( error ) { console .log( 'Finished findOne' , error); }); }); console .log( 'removing primary...' ); await replSet.removeMember(primary); console .log( 'done' ); }

Replica set connections are still subject to buffering by default, so the findOne() query above will hang until a the replica set elects a new primary (modulo read preferences ). You can set bufferMaxEntries: 0 on a replica set connection as well to fail fast when there's no connection available.

mongodb.MongoClient.connect( 'mongodb://localhost:31000,localhost:31001,localhost:31002/test?replicaSet=rs0' , { bufferMaxEntries: 0 });

Being cognizant of buffering is especially important for replica set connections because by default the MongoDB driver will never stop trying to reconnect to a replica set. There is no equivalent to the single server 'reconnectFailed' event for replica sets because the driver will continue trying to reconnect indefinitely. Even if you shut down the entire replica set and restart it later, the driver will keep trying to reconnect and keep buffering your requests.

const { ReplSet } = require ( 'mongodb-topology-manager' ); const { inspect } = require ( 'util' ); const mongodb = require ( 'mongodb' ); run().catch(error => console .error(error)); async function run ( ) { const bind_ip = 'localhost' ; const replSet = new ReplSet( 'mongod' , [ { options: { port: 31000 , dbpath: ` ${__dirname} /data/db/31000` , bind_ip } }, { options: { port: 31001 , dbpath: ` ${__dirname} /data/db/31001` , bind_ip } }, { options: { port: 31002 , dbpath: ` ${__dirname} /data/db/31002` , bind_ip } } ], { replSet: 'rs0' }); await replSet.purge(); await replSet.start(); console .log( 'Replica set started...' ); const db = await mongodb.MongoClient.connect( 'mongodb://localhost:31000,localhost:31001,localhost:31002/test?replicaSet=rs0' ); db.s.topology.on( 'left' , data => console .log( '-> left' , data)); db.s.topology.on( 'joined' , data => console .log( '-> joined' , data)); db.on( 'fullsetup' , () => console .log( '-> all servers connected' )); await replSet.stop(); db.collection( 'Test' ).findOne({}, function ( error ) { console .log( 'Finished findOne' , error); }); await new Promise (resolve => setTimeout(() => resolve(), 45000 )); console .log( 'restarting' ); await replSet.restart(); console .log( 'done' ); }

A final note: the MongoDB driver has an autoReconnect option that is on by default. You can turn it off to make it so that the MongoDB driver never tries to reconnect. However, turning this option off is almost always wrong and should only be done if you're an advanced user with a very good reason to do so. If you're looking to make your MongoDB queries fail fast when connection is lost, bufferMaxEntries: 0 is the right way to do it, not autoReconnect: false .

Moving On