MongoDB Sharding Quick Tutorial

What

A quick and simple tutorial about MongoDB sharding, for testing purposes. We'll use this configuration:

  • 2 shards
  • 4 machines in total with the configuration server and a frontend
  • MongoDB 3.4

Create all the machines

You can use real or virtual machines. I used virtual machines from an OpenStack cloud.

On all machines, in /etc/hosts make sure that

  • 127.0.0.1 has the machine hostname
  • the other machines are there (except for the frontend, not necessary)

Example: /etc/hosts on mongodb-test-shard1:

127.0.0.1 localhost mongodb-test-shard1

192.168.108.16 mongodb-test-shard2
192.168.108.21 mongodb-test-config

Example: /etc/hosts on the frontend:

127.0.0.1 localhost test-jerome

192.168.108.22 mongodb-test-shard1
192.168.108.16 mongodb-test-shard2

192.168.108.21 mongodb-test-config

Launch mongod on each shard machine

Create the data folder (it can be anywhere):

sudo mkdir -p /data/db
sudo chmod -R 777 /data/db

Kill any existing mongod process:

killall -q mongod

Launch mongod for the shard:

mongod --shardsvr --replSet rs1 --dbpath /data/db --fork --logpath=/data/db/log.txt

# note: on mongodb-test-shard2, replace rs1 by rs2.
# the replica set name has to be unique to each shard.

Initiate the replica set:

mongo localhost:27018 --eval "JSON.stringify(rs.initiate())"

Launch mongod on the configuration machine

Create the config folder (it can also be anywhere):

sudo mkdir -p /data/configdb 
sudo chmod -R 777 /data/configdb 

Kill any existing mongod process:

killall -q mongod

Launch mongod for config server:

mongod --configsvr --replSet c1 --enableMajorityReadConcern --fork --logpath=/data/configdb/log.txt

Initiate the replica set (note the default port is different):

mongo localhost:27019 --eval "JSON.stringify(rs.initiate())"

Launch mongos on the frontend machine

mongos --configdb c1/mongodb-test-config

Keep this terminal open.

Configure the cluster on the frontend machine

In another terminal, on the frontend machine, connect to the mongos process:

mongo localhost:27017

Add the shards

sh.addShard("rs1/mongodb-test-shard1:27018")
sh.addShard("rs2/mongodb-test-shard2:27018")

Enable sharding for your database

sh.enableSharding("tests")

Enable sharding for your collection (use a shard key appropriate for you).

sh.shardCollection("tests.vquest_metadata", {_id:1})

# warning: this needs to be done again after a collection drop.

Add data to your collection normally. It will be automatically chunked. The chunks will be distributed over the two shards.

If using a CSV file, you can use:

mongoimport -d tests -c vquest_metadata --type csv --file <filename> --headerline --port 27017

Get a summary of the cluster status:

sh.status()

...
  shards:
	{  "_id" : "rs1",  "host" : "rs1/mongodb-test-shard1:27018",  "state" : 1 }
	{  "_id" : "rs2",  "host" : "rs2/mongodb-test-shard2:27018",  "state" : 1 }

  active mongoses:
	"3.4.0" : 1
 autosplit:
	Currently enabled: yes
  balancer:
	Currently enabled:  yes
	Currently running:  no
		Balancer lock taken at Mon Jan 30 2017 19:57:30 GMT+0000 (UTC) by ConfigServer:Balancer
	Failed balancer rounds in last 5 attempts:  0
	Migration Results for the last 24 hours: 
		27 : Success
		1 : Failed with error 'aborted', from rs1 to rs2
  databases:
	{  "_id" : "tests",  "primary" : "rs1",  "partitioned" : true }
		tests.vquest_metadata
			shard key: { "_id" : 1 }
			unique: false
			balancing: true
			chunks:
				rs1	9
				rs2	9
			too many chunks to print, use verbose if you want to force print

Query as usual:

db.vquest_metadata.find().count()
625540

Feedback