The shard key determines the distribution of the collection’s
documents among the cluster’s shards. The shard key is either an indexed field or an
indexed compound field that exists in every document in the
collection.
MongoDB partitions data in the collection using ranges of shard key
values. Each range, or chunk, defines a non-overlapping range
of shard key values. MongoDB distributes the chunks, and their
documents, among the shards in the cluster.
When a chunk grows beyond the chunk size,
MongoDB attempts to split the chunk into smaller
chunks, always based on ranges in the shard key.
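As a rough illustration of range partitioning and splitting, the following Python sketch models chunks over a single numeric shard key. It is a toy model, not MongoDB's implementation. The `RangeChunks` class is hypothetical, a document count (`max_docs_per_chunk`) stands in for the chunk size, and the split point is simply the median key.

```python
import bisect


class RangeChunks:
    """Toy model of range-based chunks on one numeric shard key (hypothetical).

    Chunk i covers [bounds[i], bounds[i + 1]). The outer bounds are -inf and
    +inf, so the ranges never overlap and together cover every key value.
    """

    def __init__(self, max_docs_per_chunk):
        self.max_docs = max_docs_per_chunk  # stand-in for the chunk size
        self.bounds = [float("-inf"), float("inf")]
        self.docs = [[]]  # shard key values held by each chunk

    def chunk_for(self, key):
        # The owning chunk is the one with the largest lower bound <= key.
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        i = self.chunk_for(key)
        self.docs[i].append(key)
        if len(self.docs[i]) > self.max_docs:
            self._split(i)

    def _split(self, i):
        keys = sorted(self.docs[i])
        mid = keys[len(keys) // 2]
        if mid == keys[0]:
            # Splitting at the smallest key would leave an empty lower chunk;
            # with too few distinct values the range cannot be divided usefully.
            return
        self.bounds.insert(i + 1, mid)
        self.docs[i:i + 1] = [[k for k in keys if k < mid],
                              [k for k in keys if k >= mid]]


chunks = RangeChunks(max_docs_per_chunk=4)
for key in range(10):
    chunks.insert(key)
print(chunks.bounds)  # [-inf, 2, 4, 6, inf]
```

Every split happens on a shard key value, so each resulting chunk is still a contiguous, non-overlapping range of keys.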
Hashed shard keys use a hashed index of a
single field as the shard key to partition data across your
sharded cluster.
The field you choose as your hashed shard key should have good
cardinality, or a large number of different values. Hashed keys work
well with fields that increase monotonically like ObjectId
values or timestamps.
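To see why hashing helps with monotonically increasing values, the sketch below hashes the consecutive integers 0 to 9999 and maps each hash onto one of four equal slices of a 64-bit hash space. The MD5-based `hash64` helper and the one-slice-per-shard layout are assumptions for illustration; they approximate, but are not guaranteed to match, how MongoDB computes and distributes hashed key values.

```python
import hashlib
from collections import Counter


def hash64(value):
    # Assumption: take the first 8 bytes of an MD5 digest as a signed 64-bit int.
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def shard_for_hashed(value, num_shards):
    # Shift the signed hash into [0, 2**64), then pick one of num_shards equal slices.
    h = hash64(value) + 2**63
    return h * num_shards // 2**64


# Monotonically increasing keys, e.g. sequence numbers or timestamps.
counts = Counter(shard_for_hashed(v, 4) for v in range(10_000))
print(sorted(counts.items()))
```

With range partitioning on the raw values, these keys would fill one contiguous range; after hashing, each of the four slices receives roughly a quarter of them.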
If you shard an empty collection using a hashed shard key, MongoDB
will automatically create and migrate chunks so that each shard has
two chunks. You can control how many chunks MongoDB will create with
the numInitialChunks parameter to shardCollection, or by
manually creating chunks on the empty collection.
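One way to picture the initial chunks for an empty hashed collection is to divide the signed 64-bit hash space into evenly sized, contiguous ranges and hand them out round-robin, two per shard by default. The Python sketch below does exactly that. The function name, the even spacing, and the round-robin assignment are assumptions made for illustration, not MongoDB's exact algorithm; the hypothetical `num_chunks` argument plays roughly the role that numInitialChunks plays.

```python
from collections import Counter


def initial_hashed_chunks(num_shards, num_chunks=None):
    """Split the signed 64-bit hash space into contiguous [lower, upper) chunks.

    Defaults to two chunks per shard and assigns chunks to shards round-robin.
    """
    if num_chunks is None:
        num_chunks = 2 * num_shards
    lo, hi = -2**63, 2**63
    step = (hi - lo) // num_chunks
    # Evenly spaced split points; the final upper bound absorbs any remainder.
    bounds = [lo + i * step for i in range(num_chunks)] + [hi]
    return [(bounds[i], bounds[i + 1], f"shard{i % num_shards}")
            for i in range(num_chunks)]


chunks = initial_hashed_chunks(num_shards=3)
print(len(chunks), Counter(shard for _, _, shard in chunks))
```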
The shard key affects write and query performance by determining how
MongoDB partitions data in the cluster and how effectively the
mongos instances can direct operations to the
cluster. Consider the following operational impacts of shard key
selection.
Some possible shard keys will allow your application to take advantage of
the increased write capacity that the cluster can provide, while
others do not. Consider the following example, in which you shard by the
values of the default _id field, which is an ObjectId.
MongoDB generates ObjectId values upon document creation to
produce a unique identifier for the object. However, the most
significant bits of data in this value represent a timestamp, which
means that they increment in a regular and predictable pattern. Even
though this value has high cardinality, when you use it, a date, or
any other monotonically increasing value as the shard key, all insert
operations store data in the chunk with the highest range of key
values, and therefore on a single shard. As a result, the write
capacity of this shard will define the effective write capacity of
the cluster.
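The effect can be reproduced with a small Python model. It builds ObjectId-like keys from the ObjectId layout (a 4-byte big-endian timestamp, 5 random bytes, and a 3-byte counter) and routes them through a hypothetical chunk table whose lower bounds are timestamp prefixes. The chunk boundaries and shard names are invented for the example.

```python
import bisect
import os
import struct


def ts_prefix(ts):
    # Hex encoding of a 4-byte big-endian timestamp, the leading part of an ObjectId.
    return struct.pack(">I", ts).hex()


def objectid_like(ts, counter):
    # 4-byte timestamp + 5 random bytes + 3-byte counter, as 24 hex characters.
    return (struct.pack(">I", ts) + os.urandom(5) + counter.to_bytes(3, "big")).hex()


# Hypothetical chunk table; "" sorts before every key and stands in for MinKey.
lower_bounds = ["", ts_prefix(1_600_000_000), ts_prefix(1_650_000_000),
                ts_prefix(1_700_000_000)]
chunk_shard = ["shardA", "shardB", "shardC", "shardA"]


def shard_for(key):
    return chunk_shard[bisect.bisect_right(lower_bounds, key) - 1]


# New documents created over the next 1000 seconds.
new_keys = [objectid_like(1_720_000_000 + i, i) for i in range(1_000)]
print({shard_for(k) for k in new_keys})  # {'shardA'}: every insert hits the last chunk
```

Because every new key is larger than every existing chunk boundary, all inserts go to the chunk whose range extends to the maximum key value, no matter how many shards the cluster has.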
A shard key that increases monotonically will not hinder performance
if you have a very low insert rate, or if most of your write
operations are update() operations
distributed through your entire data set. Generally, choose shard keys
that both have high cardinality and distribute write operations
across the entire cluster.
Typically, a computed shard key that has some amount of “randomness,”
such as ones that include a cryptographic hash (e.g. MD5 or SHA1) of
other content in the document, will allow the cluster to scale write
operations. However, random shard keys do not typically provide
query isolation, which is
another important characteristic of shard keys.
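The trade-off with query isolation can be sketched in Python by comparing two hypothetical layouts for a `user_id` field over four shards: contiguous `user_id` ranges versus a computed key derived from an MD5 hash of `user_id`. Reducing the hash to a shard with a modulo is a simplification for the example; a real cluster would still partition the computed key into ranges.

```python
import hashlib

NUM_SHARDS = 4
MAX_USER_ID = 10_000


def shard_by_range(user_id):
    # Contiguous user_id ranges: 0-2499 -> 0, 2500-4999 -> 1, and so on.
    return user_id * NUM_SHARDS // MAX_USER_ID


def shard_by_computed_hash(user_id):
    # Computed "random" key: MD5 of user_id, reduced to a shard number.
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# A range query over 100 consecutive users.
wanted = range(2_000, 2_100)
print(len({shard_by_range(u) for u in wanted}))          # 1 shard
print(len({shard_by_computed_hash(u) for u in wanted}))  # spread across the shards
```

The range layout sends the query to one shard, while the computed hash scatters the same users across the cluster, so the query must fan out. That is the sense in which random keys scale writes but give up query isolation.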
The mongos provides an interface for applications to
interact with sharded clusters that hides the complexity of data
partitioning. A mongos receives queries from
applications and uses metadata from the config server to route queries to the mongod
instances with the appropriate data. While the mongos
makes all queries work in sharded environments,
the shard key you select can have a profound effect on query
performance.
Generally, the fastest queries in a sharded environment are those that
mongos will route to a single shard, using the
shard key and the cluster metadata from the config server. For queries that don’t include the shard
key, mongos must query all shards, wait for their responses
and then return the result to the application. These “scatter/gather”
queries can be long running operations.
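A toy version of this routing decision in Python: given a hypothetical routing table for a shard key `customer_id`, an equality match on the shard key is targeted to the single shard that owns the matching chunk, while any other query falls back to scatter/gather across all shards. Real mongos routing also handles range predicates and compound keys; this sketch covers only equality on a single-field key.

```python
import bisect

SHARD_KEY = "customer_id"
# Hypothetical routing metadata: chunk lower bounds and the shard owning each chunk.
lower_bounds = [float("-inf"), 1_000, 2_000, 3_000]
chunk_shard = ["shard0", "shard1", "shard2", "shard0"]


def route(query):
    """Return the set of shards a query is sent to."""
    if SHARD_KEY in query:
        # Targeted: the chunk containing the key value lives on exactly one shard.
        i = bisect.bisect_right(lower_bounds, query[SHARD_KEY]) - 1
        return {chunk_shard[i]}
    # Scatter/gather: without the shard key, every shard must be asked.
    return set(chunk_shard)


print(route({"customer_id": 1_500}))  # {'shard1'}
print(route({"status": "open"}))      # all three shards
```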
If your query includes the first component of a compound shard
key, the mongos can route the
query directly to a single shard, or a small number of shards, which
provides better performance. Even if you query values of the shard
key that reside in different chunks, the mongos will route
queries directly to specific shards.
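For a compound shard key, the prefix property can be modeled by comparing tuples, since Python orders tuples field by field much as a compound index orders its entries. In this sketch the chunk table on a hypothetical `(region, user_id)` key is invented; a query on `region` alone is routed only to the chunks whose ranges can contain that region.

```python
import bisect

# Hypothetical chunk lower bounds on the compound key (region, user_id).
# ("", -inf) stands in for MinKey; -inf/inf bound the user_id component.
NEG, POS = float("-inf"), float("inf")
lower_bounds = [("", NEG), ("EU", NEG), ("EU", 5_000), ("US", NEG)]
chunk_shard = ["shard0", "shard1", "shard2", "shard3"]


def shards_for_region(region):
    """Shards owning chunks that may hold documents with this region."""
    first = bisect.bisect_right(lower_bounds, (region, NEG)) - 1
    last = bisect.bisect_right(lower_bounds, (region, POS)) - 1
    return {chunk_shard[i] for i in range(first, last + 1)}


print(sorted(shards_for_region("EU")))  # ['shard1', 'shard2']
print(sorted(shards_for_region("US")))  # ['shard3']
```

A query that supplies only `user_id` has no such prefix to narrow the search, so in this model it would have to visit every chunk.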
To select a shard key for a collection, determine the most commonly
included fields in queries for a given application, and find which of
these operations are most performance dependent.
If the chosen field has low cardinality (i.e. it is not sufficiently
selective), you should add a second field to the shard key, making a
compound shard key. The data may become more splittable with a
compound shard key.
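A quick way to reason about this is to count distinct values. The Python sketch below uses invented sample documents with a hypothetical low-cardinality `status` field and a `user_id` field, and compares the number of distinct shard key values for `status` alone and for the compound `(status, user_id)`.

```python
def cardinality(docs, fields):
    """Number of distinct shard key values formed by the given fields."""
    return len({tuple(doc[f] for f in fields) for doc in docs})


# Invented sample data: two status values, 500 users each.
docs = [{"status": s, "user_id": u}
        for s in ("active", "inactive") for u in range(500)]

print(cardinality(docs, ["status"]))             # 2
print(cardinality(docs, ["status", "user_id"]))  # 1000
```

With only two distinct values, a `status` key could not usefully be split into more than two chunks however large the collection grows; adding `user_id` gives MongoDB many more possible split points.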
In many ways, you can think of the shard key as a
cluster-wide index. However, be aware that sharded systems
cannot enforce cluster-wide unique indexes unless the unique
field is in the shard key. See the Index Concepts page
for more information on indexes and compound indexes.
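The reason the unique field must be part of the shard key can be shown with a toy Python model in which each shard enforces a unique index on a hypothetical `email` field only over its own documents. The two-shard cluster and the routing function are invented for illustration.

```python
import hashlib


def insert(cluster, shard, email):
    # Each shard checks uniqueness only against the documents it holds.
    if email in cluster[shard]:
        raise ValueError(f"duplicate key on {shard}: {email}")
    cluster[shard].add(email)


def shard_by_email(email):
    # Routing when email is the shard key: equal emails always map to one shard.
    return "shard0" if hashlib.md5(email.encode()).digest()[0] % 2 == 0 else "shard1"


# Case 1: shard key is user_id, so two documents may land on different shards.
by_user = {"shard0": set(), "shard1": set()}
insert(by_user, "shard0", "a@example.com")  # user 1 routed to shard0
insert(by_user, "shard1", "a@example.com")  # user 2 routed to shard1: accepted

# Case 2: shard key is email, so both inserts reach the same shard.
by_email = {"shard0": set(), "shard1": set()}
insert(by_email, shard_by_email("a@example.com"), "a@example.com")
try:
    insert(by_email, shard_by_email("a@example.com"), "a@example.com")
except ValueError as err:
    print(err)  # the duplicate is rejected
```

In the first case no single shard ever sees both documents, so the duplicate slips through; in the second, routing on the unique field sends both to one shard, whose local index rejects it.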