Maintaining a Balanced Data Distribution¶
The addition of new data or the addition of new servers can result in data distribution imbalances within the cluster, such as a particular shard contains significantly more chunks than another shard or a size of a chunk is significantly greater than other chunk sizes.
MongoDB ensures a balanced cluster using two background process: splitting and the balancer.
Splitting is a background process that keeps chunks from growing too large. When a chunk grows beyond a specified chunk size, MongoDB splits the chunk in half. Inserts and updates triggers splits. Splits are a efficient meta-data change. To create splits, MongoDB does not migrate any data or affect the shards.
Diagram of a shard with a chunk that exceeds the default chunk size of 64 MB and triggers a split of the chunk into two chunks.
The balancer is a background process that manages chunk migrations. The balancer runs in all of the query routers in a cluster.
When the distribution of a sharded collection in a cluster is uneven, the balancer process migrates chunks from the shard that has the largest number of chunks to the shard with the least number of chunks until the collection balances. For example: if collection users has 100 chunks on shard 1 and 50 chunks on shard 2, the balancer will migrate chunks from shard 1 to shard 2 until the collections achieves balance.
The shards manage chunk migrations as a background operation. During migration, all requests for a chunks data address the “origin” shard.
In a chunk migration, the destination shard receives all the documents in the chunk from the origin shard. Then, the destination shard captures and applies all changes made to the data during migration process. Finally, the destination shard updates the metadata regarding the location of the on config server.
If there’s an error during the migration, the balancer aborts the process leaving the chunk on the origin shard. MongoDB removes the chunks data from the origin shard after the migration completes successfully.
Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the migration thresholds (in this case, 2) and triggers migration.
Adding and Removing Shards from the Cluster¶
Adding a shard to a cluster creates an imbalance since the new shard has no chunks. While MongoDB begins migrating data to the new shard immediately, it can take some time before the cluster balances.
When removing a shard, the balancer migrates all chunks from to other shards. After migrating all data and updating the meta data, you can safely remove the shard.