Balancing is the process MongoDB uses to distribute data of a sharded
collection evenly across a sharded cluster. When a
shard has too many of a sharded collection’s chunks compared to other shards, MongoDB automatically balances
the chunks across the shards. The balancing procedure for
sharded clusters is entirely transparent to
the user and application layer.
The balancer process is responsible for redistributing the
chunks of a sharded collection evenly among the shards for every
sharded collection. By default, the balancer process is always enabled.
Any mongos instance in the cluster can start a balancing
round. When a balancer process is active, the responsible
mongos acquires a “lock” by modifying a document in the
lock collection in the Config Database.
Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping
(i.e. clock skew) between mongos instances could lead
to failed distributed locks. This carries the possibility of
data loss, particularly with skews larger than 5 minutes.
Always use the network time protocol (NTP) by running ntpd
on your servers to minimize clock skew.
To address uneven chunk distribution for a sharded collection, the
balancer migrates chunks from
shards with more chunks to shards with a fewer number of chunks. The
balancer migrates the chunks, one at a time, until there is an even
distribution of chunks for the collection across the shards. For details
about chunk migration, see Chunk Migration Procedure.
Changed in version 2.6: Chunk migrations can have an impact on disk space. Starting in
MongoDB 2.6, the source shard automatically archives the migrated
documents by default. For details, see moveChunk directory.
Chunk migrations carry some overhead in terms of bandwidth and
workload, both of which can impact database performance. The
balancer attempts to minimize the impact by:
Starting a balancing round only when the difference in the
number of chunks between the shard with the greatest number of chunks
for a sharded collection and the shard with the lowest number of
chunks for that collection reaches the migration threshold.
To minimize the impact of balancing on the cluster, the
balancer will not begin balancing until the distribution of
chunks for a sharded collection has reached certain thresholds. The
thresholds apply to the difference in number of chunks
between the shard with the most chunks for the collection and the shard
with the fewest chunks for that collection. The balancer has the
Changed in version 2.2: The following thresholds appear first in 2.2. Prior to this
release, a balancing round would only start if the shard with the most
chunks had 8 more chunks than the shard with the least number of
Number of Chunks
Fewer than 20
80 and greater
Once a balancing round starts, the balancer will not stop until, for
the collection, the difference between the number of chunks on any two
shards for that collection is less than two or a chunk migration
By default, MongoDB will attempt to fill all available disk space with
data on every shard as the data set grows. To ensure that the cluster
always has the capacity to handle data growth, monitor disk
usage as well as other performance metrics.
When adding a shard, you may set a “maximum size” for that shard.
This prevents the balancer from migrating chunks to the shard
when the value of mapped exceeds the
“maximum size”. Use the maxSize parameter of the
addShard command to set the “maximum size” for the shard.