For many collections there may be no single, naturally occurring key that possesses all the qualities of a good shard key. The following strategies may help construct a useful shard key from existing data:
Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the _id field.
Use a compound shard key that uses two or three values from all documents that provide the right mix of cardinality with scalable write operations and query isolation.
Determine that the impact of using a less than ideal shard key is insignificant in your use case, given:
New in version 2.4: Use a hashed shard key. Choose a field that has high cardinality and create a hashed index on that field. MongoDB uses these hashed index values as shard key values, which ensures an even distribution of documents across the shards.
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. Appropriate shard key choice depends on the schema of your data and the way that your applications query and write data.
An easily divisible shard key makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values can result in chunks that are “unsplittable.”
A shard key with high degree of randomness prevents any single shard from becoming a bottleneck and will distribute write operations among the cluster.
A shard key that targets a single shard makes it possible for the mongos program to return most query operations directly from a single specific mongod instance. Your shard key should be the primary field used by your queries. Fields with a high degree of “randomness” make it difficult to target operations to specific shards.
Cardinality in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an “address book” that stores address records:
Consider the use of a state field as a shard key:
The state key’s value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state’s chunk exceeds the maximum chunk size.
Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:
Consider the use of a zipcode field as a shard key:
While this field has a large number of possible values, and thus has potentially higher cardinality, it’s possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.
In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. “Dry cleaning businesses in America,”) then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g “ice cream stores in Boston Massachusetts,”) then you may have a much lower cardinality.
Consider the use of a phone-number field as a shard key:
Phone number has a high cardinality, because users will generally have a unique value for this field, MongoDB will be able to split as many chunks as needed.