Note: if you are in a hurry, read the synopsis below and refer to the source code on GitHub at https://github.com/yazad3/mongodb-sharded-replSet-GridFS/ (start with startMongoDBShardedReplSet.sh).

Synopsis

This blog describes the steps for setting up a MongoDB sharded replica set for GridFS. The steps to shard or create a replica set for GridFS are no different from those for a regular MongoDB database or collection. For this blog I used an Ubuntu Desktop virtual machine, though one could set up this environment on other operating systems such as Windows or UNIX.

I will start by setting up an environment with 2 shards (shard1 and shard2):

Illustration 1: Two Shard Replica Set

Later I will add one more shard to the setup:

Illustration 2: Add an additional shard

I wanted to demonstrate that multiple hosts can be used to create a sort of distributed file system using GridFS, so every node of the sharded replica set is on a different host. In reality they are all on the same machine: I used host mapping entries to simulate multiple hosts, since I didn’t want to set up that many VMs yet still wanted to demonstrate a multi-host setup. Add the following entries to the /etc/hosts file. Note that all the IPs are loopback addresses pointing to the local host.

#

127.1.1.1 shard1_repl1.mongo-server.com

127.1.1.2 shard1_repl2.mongo-server.com

127.1.1.3 shard1_repl3.mongo-server.com

127.1.1.4 shard2_repl1.mongo-server.com

127.1.1.5 shard2_repl2.mongo-server.com

127.1.1.6 shard2_repl3.mongo-server.com

127.1.1.7 config-1.mongo-server.com

127.1.1.8 config-2.mongo-server.com

127.1.1.9 config-3.mongo-server.com

127.1.1.10 mongos.mongo-server.com

#

127.1.1.11 shard3_repl1.mongo-server.com

127.1.1.12 shard3_repl2.mongo-server.com

127.1.1.13 shard3_repl3.mongo-server.com

Table 1: /etc/hosts snippet
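Since the entries above follow a regular pattern, they could also be generated instead of typed by hand. The following is only a sketch: the function name print_hosts_entries is my own invention and is not part of the GitHub repository. It prints the lines and leaves /etc/hosts untouched.

```shell
#!/bin/bash
# Hypothetical helper (not in the original repository): prints the
# /etc/hosts entries from Table 1, one per line, in the same order.
print_hosts_entries() {
  local domain=mongo-server.com
  local i=1
  local name
  for name in shard1_repl1 shard1_repl2 shard1_repl3 \
              shard2_repl1 shard2_repl2 shard2_repl3 \
              config-1 config-2 config-3 mongos \
              shard3_repl1 shard3_repl2 shard3_repl3; do
    # Each name gets the next loopback address, starting at 127.1.1.1
    echo "127.1.1.$i $name.$domain"
    i=$((i + 1))
  done
}

print_hosts_entries
```

After reviewing the output, it could be appended with something like "print_hosts_entries | sudo tee -a /etc/hosts".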

I also added extra disks to my virtual machine so that I could allocate one disk to each mapped host above. For simplicity, a single disk holds all the logs, and /media/app is where MongoDB is installed. The relevant output of the “df -h” command is shown below:

/dev/sdc   7.8G  247M  7.2G   4%  /media/app
/dev/sdf   2.9G  176M  2.6G   7%  /media/config-1
/dev/sdb   2.0G  175M  1.7G  10%  /media/config-2
/dev/sdd   2.0G  175M  1.7G  10%  /media/config-3
/dev/sdh1  4.8G  363M  4.2G   8%  /media/shard1_repl1
/dev/sdg   4.8G  363M  4.2G   8%  /media/shard1_repl2
/dev/sdj   4.8G  363M  4.2G   8%  /media/shard1_repl3
/dev/sdk   4.8G  363M  4.2G   8%  /media/shard2_repl1
/dev/sdl   4.8G  363M  4.2G   8%  /media/shard2_repl2
/dev/sde   4.8G  363M  4.2G   8%  /media/shard2_repl3
/dev/sdn   2.4G  356M  2.0G  16%  /media/shard3_repl1
/dev/sdo   2.4G  356M  2.0G  16%  /media/shard3_repl2
/dev/sdp   2.4G  356M  2.0G  16%  /media/shard3_repl3
/dev/sdi   7.8G   23M  7.4G   1%  /media/log

Table 2: df -h output snippet

Setup

Setting up sharded replica set

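Now let’s get started with the initial two-shard setup (see Illustration 1: Two Shard Replica Set). The following script takes a shard name, a replica node name and a port, and starts one mongod process for that node.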

#!/bin/bash

shard=$1

replSet=$2

port=$3

basePath=/media

dbPathDir=$basePath/$replSet/data/

logPathDir=$basePath/log/$shard/$replSet

#Create directories for DB and Log

mkdir -p $dbPathDir $logPathDir

logPath=$logPathDir/$replSet.log

bind_ip=$replSet.mongo-server.com

mongod --replSet $shard --logpath $logPath --dbpath $dbPathDir --bind_ip $bind_ip --port $port --shardsvr --fork

Table 3: _createMongoDBShardedReplSet.sh

Invoking the script creates one replica node in a shard, for example:

./_createMongoDBShardedReplSet.sh shard1 shard1_repl1 3001



Table 4: Invoke – _createMongoDBShardedReplSet.sh

Repeat this step for all the nodes in the sharded replica set.
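The repetition can be expressed as a loop. This is a rough sketch rather than what the repository does: only port 3001 (for shard1_repl1) appears in this post, so the other port numbers and the function name start_shard_nodes are assumptions made for illustration. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/bash
# Sketch: start every node of shard1 and shard2 via the script in Table 3.
# ASSUMPTION: apart from 3001, all port numbers below are invented.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: start the three replica nodes of one shard.
start_shard_nodes() {
  local shard=$1 basePort=$2
  local n cmd
  for n in 1 2 3; do
    cmd="./_createMongoDBShardedReplSet.sh $shard ${shard}_repl$n $((basePort + n))"
    if [ "$DRY_RUN" = 1 ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

start_shard_nodes shard1 3000
start_shard_nodes shard2 4000
```

Running it with DRY_RUN=0 would execute the commands, provided _createMongoDBShardedReplSet.sh is in the current directory.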

Setting up Config Server

Note that the version of MongoDB used here expects exactly 3 config server instances for a production-style sharded setup. The script to create a config server is:

#!/bin/bash

configSvrName=$1

port=$2

basePath=/media

dbPathDir=$basePath/$configSvrName/data/

logPathDir=$basePath/log/$configSvrName

mkdir -p $dbPathDir $logPathDir

logPath=$logPathDir/$configSvrName.log

bind_ip=${configSvrName}.mongo-server.com

mongod --logpath ${logPath} --dbpath ${dbPathDir} --bind_ip ${bind_ip} --port ${port} --configsvr --fork

Table 5: _createMongoDBConfigSvr.sh

This script can be invoked as shown in the example below:

./_createMongoDBConfigSvr.sh config-1 8001



Table 6: Invoke _createMongoDBConfigSvr.sh
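All three config servers can be started the same way. The sketch below assumes ports 8001 to 8003, which are the ports passed to _configMongoDBShard.sh later in this post; the function name start_config_servers is hypothetical. It prints the commands by default (DRY_RUN=1).

```shell
#!/bin/bash
# Sketch: start config-1..config-3 on ports 8001..8003.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper, not part of the original repository.
start_config_servers() {
  local n cmd
  for n in 1 2 3; do
    cmd="./_createMongoDBConfigSvr.sh config-$n $((8000 + n))"
    if [ "$DRY_RUN" = 1 ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
}

start_config_servers
```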

Setting up the mongos process

#!/bin/bash

configServerPort1=$1

configServerPort2=$2

configServerPort3=$3

configHost1=config-1.mongo-server.com

configHost2=config-2.mongo-server.com

configHost3=config-3.mongo-server.com

configdb=${configHost1}:${configServerPort1},${configHost2}:${configServerPort2},${configHost3}:${configServerPort3}

basePath=/media

logPathDir=$basePath/log/mongos

mkdir -p $logPathDir

logPath=$logPathDir/mongos.log

mongos --logpath $logPath --configdb $configdb --fork

Table 7: _configMongoDBShard.sh

Invoke _configMongoDBShard.sh to start the mongos process.

./_configMongoDBShard.sh 8001 8002 8003



Table 8: Invoke _configMongoDBShard.sh

(Note: the above script could be improved by taking the host names as parameters.)
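One way that improvement might look is to pass complete host:port pairs and join them with commas. This is a sketch under my own assumptions: build_configdb is a hypothetical helper name, and the final line only prints the mongos command rather than running it.

```shell
#!/bin/bash
# Sketch: build the --configdb value from host:port arguments.
# build_configdb is a hypothetical name, not part of the repository.
build_configdb() {
  # With IFS set to a comma, "$*" joins all arguments with commas.
  local IFS=,
  echo "$*"
}

configdb=$(build_configdb \
  config-1.mongo-server.com:8001 \
  config-2.mongo-server.com:8002 \
  config-3.mongo-server.com:8003)

# Print the resulting command; remove the echo to actually start mongos.
echo "mongos --logpath /media/log/mongos/mongos.log --configdb $configdb --fork"
```

With this shape the script no longer hard-codes the config host names, so the same script could be reused with differently named hosts.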

Putting it all together

View the start-up script – startMongoDBShardedReplSet.sh (https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/startMongoDBShardedReplSet.sh)

For the first run (after the hosts and mounts are set up):

./startMongoDBShardedReplSet.sh TRUE

Table 9: Invoking startMongoDBShardedReplSet.sh for the first time

Note that the TRUE parameter is significant for the initial setup. It must be omitted for subsequent start-ups:

./startMongoDBShardedReplSet.sh

Table 10: Invoking startMongoDBShardedReplSet.sh subsequently
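This post does not spell out what the TRUE parameter triggers; see the script on GitHub for the actual steps. In general, a new sharded replica set needs two one-time steps: initiating each replica set with rs.initiate, and registering each shard with mongos using sh.addShard. The sketch below prints what those commands could look like for shard1. The helper names and every port except 3001 are assumptions, and the mongos is assumed to listen on its default port.

```shell
#!/bin/bash
# Sketch of one-time initialization commands (printed, not executed).
# ASSUMPTIONS: helper names are hypothetical; ports other than 3001 are invented.

# Build the rs.initiate() call for a three-member replica set.
rs_initiate_js() {
  local shard=$1 basePort=$2
  local members="" n
  for n in 1 2 3; do
    if [ -n "$members" ]; then members="$members, "; fi
    members="${members}{_id: $((n - 1)), host: \"${shard}_repl$n.mongo-server.com:$((basePort + n))\"}"
  done
  echo "rs.initiate({_id: \"$shard\", members: [$members]})"
}

# Build the sh.addShard() call, using the first member as the seed host.
add_shard_js() {
  local shard=$1 basePort=$2
  echo "sh.addShard(\"$shard/${shard}_repl1.mongo-server.com:$((basePort + 1))\")"
}

# Step 1: run against one member of the replica set.
echo "mongo --host shard1_repl1.mongo-server.com --port 3001 --eval '$(rs_initiate_js shard1 3000)'"
# Step 2: run against the mongos once the replica set has a primary.
echo "mongo --host mongos.mongo-server.com --eval '$(add_shard_js shard1 3000)'"
```

These steps only make sense once, which would explain why the parameter has to be left out on later start-ups.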

Adding an additional Shard

To add an additional shard, you can use the same script used in the “Setting up sharded replica set” section. The existing shards can keep running while you do this. See https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/add3rdShard.sh for an example of adding a shard to an existing environment of sharded replica sets.
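As a rough outline of the order of operations (the linked add3rdShard.sh is the authoritative version), the sketch below prints the steps for shard3: start its three nodes, initiate the replica set, then register it with the running mongos. Ports 5001 to 5003 and the function name print_add_shard3_steps are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: print the steps for adding shard3 to a running cluster.
# ASSUMPTION: ports 5001-5003 are invented; the function name is hypothetical.
print_add_shard3_steps() {
  local n
  # 1. Start the three replica nodes with the script from Table 3.
  for n in 1 2 3; do
    echo "./_createMongoDBShardedReplSet.sh shard3 shard3_repl$n $((5000 + n))"
  done
  # 2. Initiate the replica set on one of its members.
  echo "mongo --host shard3_repl1.mongo-server.com --port 5001 --eval 'rs.initiate({_id: \"shard3\", members: [{_id: 0, host: \"shard3_repl1.mongo-server.com:5001\"}, {_id: 1, host: \"shard3_repl2.mongo-server.com:5002\"}, {_id: 2, host: \"shard3_repl3.mongo-server.com:5003\"}]})'"
  # 3. Register the new shard through the mongos.
  echo "mongo --host mongos.mongo-server.com --eval 'sh.addShard(\"shard3/shard3_repl1.mongo-server.com:5001\")'"
}

print_add_shard3_steps
```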

Stop and cleanup

The stop script is available here – it stops all the nodes in the correct sequence (including the replica set nodes on the 3rd shard) – https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/stopMongoDBShardedReplSet.sh

Finally, if you have messed something up, you can run cleanup (WARNING: cleanup will delete all your data!): https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/cleanup.sh