Considerations for Selecting Shard Keys

Choosing a Shard Key

For many collections there may be no single, naturally occurring key that possesses all the qualities of a good shard key. The following strategies may help construct a useful shard key from existing data:

  1. Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the _id field.

  2. Use a compound shard key built from two or three fields present in every document, chosen so that together they provide the right mix of cardinality, scalable write operations, and query isolation.

  3. Determine that the impact of using a less than ideal shard key is insignificant in your use case, given:

    • limited write volume,
    • expected data size, or
    • application query patterns.

  4. New in version 2.4: Use a hashed shard key. Choose a field that has high cardinality and create a hashed index on that field. MongoDB uses these hashed index values as shard key values, which ensures an even distribution of documents across the shards.

    Tip

    MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
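The effect of a hashed shard key on a steadily increasing field can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-shard cluster, the equal-width ranges, and the use of MD5 as the hash are all stand-ins chosen for the sketch, not a description of how MongoDB computes or places hashed values.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def hashed_key(value: int) -> int:
    """Stand-in hash: the first 8 bytes of an MD5 digest as an unsigned 64-bit integer."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_range(value: int, total: int) -> int:
    """Ranged placement: split [0, total) into NUM_SHARDS equal contiguous ranges."""
    return min(value * NUM_SHARDS // total, NUM_SHARDS - 1)


def shard_for_hash(value: int) -> int:
    """Hashed placement: split the 64-bit hash space into NUM_SHARDS equal ranges."""
    return hashed_key(value) * NUM_SHARDS // 2**64


total = 10_000
recent = range(total - 1_000, total)  # the 1,000 most recently inserted values

ranged = Counter(shard_for_range(v, total) for v in recent)
hashed = Counter(shard_for_hash(v) for v in recent)

print("ranged placement of recent inserts:", dict(ranged))
print("hashed placement of recent inserts:", dict(hashed))
```

With the ranged placement, every recent insert falls into the last range, so one shard absorbs all of the writes. With the hashed placement, the same inserts are spread over all four ranges.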

Considerations for Selecting a Shard Key

Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. Appropriate shard key choice depends on the schema of your data and the way that your applications query and write data.

Create a Shard Key that is Easily Divisible

An easily divisible shard key makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values can result in chunks that are “unsplittable”.

For instance, if a chunk represents a single shard key value, then MongoDB cannot split the chunk even when the chunk exceeds the size at which splits occur.
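A simplified model shows why such a chunk cannot be split. The function below is a hypothetical sketch, not MongoDB's split algorithm: it assumes a chunk is just a list of shard key values and that a split point divides it into keys below the point and keys at or above it.

```python
from typing import Optional


def find_split_point(shard_keys: list) -> Optional[object]:
    """Return a key that splits the chunk into two non-empty parts, or None.

    Keys strictly below the split point stay in the lower chunk; keys at or
    above it move to the upper chunk.
    """
    ordered = sorted(shard_keys)
    if not ordered or ordered[0] == ordered[-1]:
        # Empty chunk, or every document shares one key value: nothing to split on.
        return None
    median = ordered[len(ordered) // 2]
    if median > ordered[0]:
        return median
    # The median equals the lowest key, so use the first larger value instead.
    return next(k for k in ordered if k > ordered[0])


print(find_split_point(["NY", "NY", "NY", "NY", "NY"]))
print(find_split_point(["CA", "NY", "NY", "TX"]))
```

The first call returns None because every document carries the same key value, no matter how large the chunk grows. The second call finds a usable boundary.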

See also

Cardinality

Create a Shard Key that has a High Degree of Randomness

A shard key with a high degree of randomness prevents any single shard from becoming a bottleneck and distributes write operations across the cluster.

See also

Write Scaling

Create a Shard Key that Targets a Single Shard

A shard key that targets a single shard makes it possible for the mongos program to send most query operations to a single specific mongod instance and return the results directly from it. Your shard key should be the primary field used by your queries. Fields with a high degree of “randomness” make it difficult to target operations to specific shards.
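The difference between a targeted and a broadcast query can be sketched with a toy router. Everything here is an assumption made for illustration: the `username` shard key, the chunk boundaries, the shard names, and the restriction to equality queries. It is a model of the idea, not of how mongos is implemented.

```python
from bisect import bisect_right

# Hypothetical chunk layout: each entry is the inclusive lower bound of a
# chunk, and CHUNK_OWNERS[i] names the shard that holds chunk i.
CHUNK_LOWER_BOUNDS = ["", "g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]
SHARD_KEY = "username"  # assumed shard key field


def target_shards(query: dict) -> list:
    """Return the shards a router would have to contact for an equality query."""
    if SHARD_KEY in query:
        # The query names the shard key, so exactly one chunk can match.
        idx = bisect_right(CHUNK_LOWER_BOUNDS, query[SHARD_KEY]) - 1
        return [CHUNK_OWNERS[idx]]
    # Without the shard key, every shard holding a chunk must be asked.
    return sorted(set(CHUNK_OWNERS))


print(target_shards({"username": "mallory"}))
print(target_shards({"email": "mallory@example.com"}))
```

A query on the shard key resolves to one shard, while a query on any other field has to be sent to every shard that owns part of the collection.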

See also

Query Isolation

Shard Using a Compound Shard Key

The challenge when selecting a shard key is that there is not always an obvious choice. Often, an existing field in your collection may not be the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal.
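As a rough illustration of what a compound key adds, the sketch below compares a single-field key with a two-field key on a handful of made-up address records. The `state` and `phone` field names and all of the sample values are assumptions for the example.

```python
def key_of(doc: dict, fields: tuple) -> tuple:
    """Build the shard key value for a document from the named fields, in order."""
    return tuple(doc[f] for f in fields)


def distinct_keys(docs: list, fields: tuple) -> int:
    """Count how many different shard key values the documents produce."""
    return len({key_of(d, fields) for d in docs})


# Made-up sample documents.
docs = [
    {"state": "NY", "phone": "212-555-0101"},
    {"state": "CA", "phone": "415-555-0102"},
    {"state": "NY", "phone": "212-555-0103"},
    {"state": "NY", "phone": "718-555-0104"},
    {"state": "CA", "phone": "213-555-0105"},
]

single = distinct_keys(docs, ("state",))
compound = distinct_keys(docs, ("state", "phone"))

# Sorting by the compound key keeps each state's documents adjacent.
ordered = sorted(key_of(d, ("state", "phone")) for d in docs)
states_in_order = [k[0] for k in ordered]

print(single, compound)
print(states_in_order)
```

The compound key produces one distinct value per document here, so a large group of documents for one state can still be divided, yet documents for the same state remain next to each other in key order, which keeps queries on the leading field confined to a contiguous range.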

Cardinality

Cardinality, in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an “address book” that stores address records:

  • Consider the use of a state field as a shard key:

    The state key’s value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state’s chunk exceeds the maximum chunk size.

    Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:

    • If MongoDB cannot split a chunk because all of its documents have the same shard key, migrations involving these un-splittable chunks will take longer than other migrations, and it will be more difficult for your data to stay balanced.
    • If you have a fixed maximum number of chunks, you will never be able to use more than that number of shards for this collection.

  • Consider the use of a zipcode field as a shard key:

    While this field has a large number of possible values, and thus has potentially higher cardinality, it’s possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.

    In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. “dry cleaning businesses in America”), then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g. “ice cream stores in Boston, Massachusetts”), then you may have a much lower cardinality.

  • Consider the use of a phone-number field as a shard key:

    Phone number has a high cardinality. Because users will generally have a unique value for this field, MongoDB will be able to split as many chunks as needed.

While high cardinality is necessary for ensuring an even distribution of data, it does not guarantee sufficient query isolation or appropriate write scaling.
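One way to compare candidate fields before committing to a shard key is to profile a sample of the data. The helper below is a hypothetical sketch: it reports how many distinct values a field has and how many documents share the most common value, since that largest group can never be divided across chunks. The records are made up to resemble the geographically concentrated address book described above.

```python
from collections import Counter


def key_profile(docs: list, field: str) -> tuple:
    """Return (number of distinct values, size of the largest group sharing one value)."""
    counts = Counter(d[field] for d in docs)
    return len(counts), max(counts.values())


# Made-up sample: a geographically concentrated address book.
docs = [
    {"state": "MA", "zipcode": "02116", "phone": "617-555-0101"},
    {"state": "MA", "zipcode": "02116", "phone": "617-555-0102"},
    {"state": "MA", "zipcode": "02116", "phone": "617-555-0103"},
    {"state": "MA", "zipcode": "02134", "phone": "617-555-0104"},
]

for field in ("state", "zipcode", "phone"):
    distinct, largest_group = key_profile(docs, field)
    print(field, distinct, largest_group)
```

In this sample, state has a single value shared by every document, zipcode has two values with most documents in one of them, and phone is unique per document. On real data the same profile would need to be run against a representative sample.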

Shard Key Selection Strategy

When selecting a shard key, it is difficult to balance the qualities of an ideal shard key, which sometimes dictate opposing strategies. For instance, it’s difficult to produce a key that both has a high degree of randomness for even data distribution and allows your application to target specific shards. For some workloads, it’s more important to have an even data distribution, and for others targeted queries are essential.

Therefore, the selection of a shard key is about balancing both your data and the performance characteristics caused by different possible data distributions and system workloads.