- Sharding >
- Sharded Cluster Tutorials >
- Sharded Cluster Deployment Tutorials >
- Convert a Replica Set to a Replicated Sharded Cluster
Convert a Replica Set to a Replicated Sharded Cluster¶
Overview¶
Following this tutorial, you will convert a single 3-member replica set to a cluster that consists of 2 shards. Each shard will consist of an independent 3-member replica set.
The tutorial uses a test environment running on a local system UNIX-like system. You should feel encouraged to “follow along at home.” If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.
The procedure, from a high level, is as follows:
- Create or select a 3-member replica set and insert some data into a collection.
- Start the config databases and create a cluster with a single shard.
- Create a second replica set with three new
mongod
instances. - Add the second replica set as a shard in the cluster.
- Enable sharding on the desired collection or collections.
Process¶
Install MongoDB according to the instructions in the MongoDB Installation Tutorial.
Deploy a Replica Set with Test Data¶
If have an existing MongoDB replica set deployment, you can omit the this step and continue from Deploy Sharding Infrastructure.
Use the following sequence of steps to configure and deploy a replica set and to insert test data.
Create the following directories for the first replica set instance, named
firstset
:/data/example/firstset1
/data/example/firstset2
/data/example/firstset3
To create directories, issue the following command:
In a separate terminal window or GNU Screen window, start three
mongod
instances by running each of the following commands:Note
The
--oplogSize 700
option restricts the size of the operation log (i.e. oplog) for eachmongod
instance to 700MB. Without the--oplogSize
option, eachmongod
reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.In a
mongo
shell session in a new terminal, connect to the mongodb instance on port 10001 by running the following command. If you are in a production environment, first read the note below.Note
Above and hereafter, if you are running in a production environment or are testing this process with
mongod
instances on multiple systems, replace “localhost” with a resolvable domain, hostname, or the IP address of your system.In the
mongo
shell, initialize the first replica set by issuing the following command:In the
mongo
shell, create and populate a new collection by issuing the following sequence of JavaScript operations:The above operations add one million documents to the collection
test_collection
. This can take several minutes, depending on your system.The script adds the documents in the following form:
Deploy Sharding Infrastructure¶
This procedure creates the three config databases that store the cluster’s metadata.
Note
For development and testing environments, a single config database is sufficient. In production environments, use three config databases. Because config instances store only the metadata for the sharded cluster, they have minimal resource requirements.
Create the following data directories for three config database instances:
/data/example/config1
/data/example/config2
/data/example/config3
Issue the following command at the system prompt:
In a separate terminal window or GNU Screen window, start the config databases by running the following commands:
In a separate terminal window or GNU Screen window, start
mongos
instance by running the following command:Note
If you are using the collection created earlier or are just experimenting with sharding, you can use a small
--chunkSize
(1MB works well.) The defaultchunkSize
of 64MB means that your cluster must have 64MB of data before the MongoDB’s automatic sharding begins working.In production environments, do not use a small shard size.
The
configdb
options specify the configuration databases (e.g.localhost:20001
,localhost:20002
, andlocalhost:2003
). Themongos
instance runs on the default “MongoDB” port (i.e.27017
), while the databases themselves are running on ports in the30001
series. In the this example, you may omit the--port 27017
option, as27017
is the default port.Add the first shard in
mongos
. In a new terminal window or GNU Screen session, add the first shard, according to the following procedure:
Deploy a Second Replica Set¶
This procedure deploys a second replica set. This closely mirrors the process used to establish the first replica set above, omitting the test data.
Create the following data directories for the members of the second replica set, named
secondset
:/data/example/secondset1
/data/example/secondset2
/data/example/secondset3
In three new terminal windows, start three instances of
mongod
with the following commands:Note
As above, the second replica set uses the smaller
oplogSize
configuration. Omit this setting in production environments.In the
mongo
shell, connect to one mongodb instance by issuing the following command:In the
mongo
shell, initialize the second replica set by issuing the following command:Add the second replica set to the cluster. Connect to the
mongos
instance created in the previous procedure and issue the following sequence of commands:This command returns the following success message:
Verify that both shards are properly configured by running the
listShards
command. View this and example output below:
Enable Sharding¶
MongoDB must have sharding enabled on both the database and collection levels.
Enabling Sharding on the Database Level¶
Issue the enableSharding
command. The following example
enables sharding on the “test” database:
Create an Index on the Shard Key¶
MongoDB uses the shard key to distribute documents between shards. Once selected, you cannot change the shard key. Good shard keys:
- have values that are evenly distributed among all documents,
- group documents that are often accessed at the same time into contiguous chunks, and
- allow for effective distribution of activity among shards.
Typically shard keys are compound, comprising of some sort of hash and some sort of other primary key. Selecting a shard key depends on your data set, application architecture, and usage pattern, and is beyond the scope of this document. For the purposes of this example, we will shard the “number” key. This typically would not be a good shard key for production deployments.
Create the index with the following procedure:
See also
The Shard Key Overview and Shard Key sections.
Shard the Collection¶
Issue the following command:
The collection test_collection
is now sharded!
Over the next few minutes the Balancer begins to redistribute
chunks of documents. You can confirm this activity by switching to the
test
database and running db.stats()
or
db.printShardingStatus()
.
As clients insert additional documents into this collection,
mongos
distributes the documents evenly between the shards.
In the mongo
shell, issue the following commands to return
statics against each cluster:
Example output of the db.stats()
command:
Example output of the db.printShardingStatus()
command:
In a few moments you can run these commands for a second time to
demonstrate that chunks are migrating from
firstset
to secondset
.
When this procedure is complete, you will have converted a replica set into a cluster where each shard is itself a replica set.