- Sharding >
- Sharded Cluster Tutorials >
- Sharded Cluster Data Management >
- Enforce Unique Keys for Sharded Collections
Enforce Unique Keys for Sharded Collections¶
On this page
Overview¶
The unique
constraint on
indexes ensures that only one document can have a value for a field in
a collection. For sharded collections these unique indexes
cannot enforce uniqueness because
insert and indexing operations are local to each shard.
MongoDB does not support creating new unique indexes in sharded clusters
and will not allow you to shard collections with unique indexes on
fields other than the _id
field.
If you need to ensure that a field is always unique in all collections in a sharded environment, there are three options:
Enforce uniqueness of the shard key.
MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key.
You cannot specify a unique constraint on a hashed index.
Use a secondary collection to enforce uniqueness.
Create a minimal collection that only contains the unique field and a reference to a document in the main collection. If you always insert into a secondary collection before inserting to the main collection, MongoDB will produce an error if you attempt to use a duplicate key.
If you have a small data set, you may not need to shard this collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key.
Use guaranteed unique identifiers.
Universally unique identifiers (i.e. UUID) like the
ObjectId
are guaranteed to be unique.
Procedures¶
Unique Constraints on the Shard Key¶
Process¶
To shard a collection
using the unique
constraint, specify the shardCollection
command in the following form:
Remember that the _id
field index is always unique. By default, MongoDB
inserts an ObjectId
into the _id
field. However,
you can manually insert your own value into the _id
field and
use this as the shard key. To use the
_id
field as the shard key, use the following operation:
Limitations¶
- You can only enforce uniqueness on one single field in the collection using this method.
- If you use a compound shard key, you can only enforce uniqueness on the combination of component keys in the shard key.
In most cases, the best shard keys are compound keys that include elements that permit write scaling and query isolation, as well as high cardinality. These ideal shard keys are not often the same keys that require uniqueness and enforcing unique values in these collections requires a different approach.
Unique Constraints on Arbitrary Fields¶
If you cannot use a unique field as the shard key or if you need to
enforce uniqueness over multiple fields, you must create another
collection to act as a “proxy collection”. This collection
must contain both a reference to the original document (i.e. its
ObjectId
) and the unique key.
If you must shard this “proxy” collection, then shard on the unique key using the above procedure; otherwise, you can simply create multiple unique indexes on the collection.
Process¶
Consider the following for the “proxy collection:”
The _id
field holds the ObjectId
of the document
it reflects, and the email
field is the field on which you want to
ensure uniqueness.
To shard this collection, use the following operation
using the email
field as the shard key:
If you do not need to shard the proxy collection, use the following
command to create a unique index on the email
field:
You may create multiple unique indexes on this collection if you do
not plan to shard the proxy
collection.
To insert documents, use the following procedure in the JavaScript shell:
You must insert a document into the proxy
collection first. If
this operation succeeds, the email
field is unique, and you may
continue by inserting the actual document into the information
collection.
See
The full documentation of: ensureIndex()
and shardCollection
.
Considerations¶
- Your application must catch errors when inserting documents into the “proxy” collection and must enforce consistency between the two collections.
- If the proxy collection requires sharding, you must shard on the single field on which you want to enforce uniqueness.
- To enforce uniqueness on more than one field using sharded proxy collections, you must have one proxy collection for every field for which to enforce uniqueness. If you create multiple unique indexes on a single proxy collection, you will not be able to shard proxy collections.
Use Guaranteed Unique Identifier¶
The best way to ensure a field has unique values is to generate
universally unique identifiers (UUID,) such as MongoDB’s ‘ObjectId
values.
This approach is particularly useful for the``_id`` field, which
must be unique: for collections where you are not sharding by the
_id
field the application is responsible for ensuring that the
_id
field is unique.