MongoDB Sharding Quick Tutorial


A quick and simple tutorial about MongoDB sharding, for testing purposes. We'll use this configuration:

  • 2 shards
  • 4 machines in total with the configuration server and a frontend
  • MongoDB 3.4

Create all the machines

You can use real or virtual machines. I used virtual machines from an OpenStack cloud.

On all machines, in /etc/hosts make sure that

  • has the machine hostname
  • the other machines are there (except for the frontend, not necessary)

Example: /etc/hosts on mongodb-test-shard1: localhost mongodb-test-shard1 mongodb-test-shard2 mongodb-test-config

Example: /etc/hosts on the frontend: localhost test-jerome mongodb-test-shard1 mongodb-test-shard2 mongodb-test-config

Launch mongod on each shard machine

Create the data folder (it can be anywhere):

sudo mkdir -p /data/db
sudo chmod -R 777 /data/db

Kill any existing mongod process:

killall -q mongod

Launch mongod for the shard:

mongod --shardsvr --replSet rs1 --dbpath /data/db --fork --logpath=/data/db/log.txt

# note: on mongodb-test-shard2, replace rs1 by rs2.
# the replica set name has to be unique to each shard.

Initiate the replica set:

mongo localhost:27018 --eval "JSON.stringify(rs.initiate())"

Launch mongod on the configuration machine

Create the config folder (it can also be anywhere):

sudo mkdir -p /data/configdb 
sudo chmod -R 777 /data/configdb 

Kill any existing mongod process:

killall -q mongod

Launch mongod for config server:

mongod --configsvr --replSet c1 --enableMajorityReadConcern --fork --logpath=/data/configdb/log.txt

Initiate the replica set (note the default port is different):

mongo localhost:27019 --eval "JSON.stringify(rs.initiate())"

Launch mongos on the frontend machine

mongos --configdb c1/mongodb-test-config

Keep this terminal open.

Configure the cluster on the frontend machine

In another terminal, on the frontend machine, connect to the mongos process:

mongo localhost:27017

Add the shards


Enable sharding for your database


Enable sharding for your collection (use a shard key appropriate for you).

sh.shardCollection("tests.vquest_metadata", {_id:1})

# warning: this needs to be done again after a collection drop.

Add data to your collection normally. It will be automatically chunked. The chunks will be distributed over the two shards.

If using a CSV file, you can use:

mongoimport -d tests -c vquest_metadata --type csv --file <filename> --headerline --port 27017

Get a summary of the cluster status:


	{  "_id" : "rs1",  "host" : "rs1/mongodb-test-shard1:27018",  "state" : 1 }
	{  "_id" : "rs2",  "host" : "rs2/mongodb-test-shard2:27018",  "state" : 1 }

  active mongoses:
	"3.4.0" : 1
	Currently enabled: yes
	Currently enabled:  yes
	Currently running:  no
		Balancer lock taken at Mon Jan 30 2017 19:57:30 GMT+0000 (UTC) by ConfigServer:Balancer
	Failed balancer rounds in last 5 attempts:  0
	Migration Results for the last 24 hours: 
		27 : Success
		1 : Failed with error 'aborted', from rs1 to rs2
	{  "_id" : "tests",  "primary" : "rs1",  "partitioned" : true }
			shard key: { "_id" : 1 }
			unique: false
			balancing: true
				rs1	9
				rs2	9
			too many chunks to print, use verbose if you want to force print

Query as usual: