OPTIONS

Enforce Unique Keys for Sharded Collections

Overview

The unique constraint on indexes ensures that only one document can have a value for a field in a collection. For sharded collections these unique indexes cannot enforce uniqueness because insert and indexing operations are local to each shard.

MongoDB does not support creating new unique indexes in sharded clusters and will not allow you to shard collections with unique indexes on fields other than the _id field.

If you need to ensure that a field is always unique in all collections in a sharded environment, there are three options:

  1. Enforce uniqueness of the shard key.

    MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key.

    You cannot specify a unique constraint on a hashed index.

  2. Use a secondary collection to enforce uniqueness.

    Create a minimal collection that only contains the unique field and a reference to a document in the main collection. If you always insert into a secondary collection before inserting to the main collection, MongoDB will produce an error if you attempt to use a duplicate key.

    If you have a small data set, you may not need to shard this collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key.

  3. Use guaranteed unique identifiers.

    Universally unique identifiers (i.e. UUID) like the ObjectId are guaranteed to be unique.

Procedures

Unique Constraints on the Shard Key

Process

To shard a collection using the unique constraint, specify the shardCollection command in the following form:

db.runCommand( { shardCollection : "test.users" , key : { email : 1 } , unique : true } );

Remember that the _id field index is always unique. By default, MongoDB inserts an ObjectId into the _id field. However, you can manually insert your own value into the _id field and use this as the shard key. To use the _id field as the shard key, use the following operation:

db.runCommand( { shardCollection : "test.users" } )

Limitations

  • You can only enforce uniqueness on one single field in the collection using this method.
  • If you use a compound shard key, you can only enforce uniqueness on the combination of component keys in the shard key.

In most cases, the best shard keys are compound keys that include elements that permit write scaling and query isolation, as well as high cardinality. These ideal shard keys are not often the same keys that require uniqueness and enforcing unique values in these collections requires a different approach.

Unique Constraints on Arbitrary Fields

If you cannot use a unique field as the shard key or if you need to enforce uniqueness over multiple fields, you must create another collection to act as a “proxy collection”. This collection must contain both a reference to the original document (i.e. its ObjectId) and the unique key.

If you must shard this “proxy” collection, then shard on the unique key using the above procedure; otherwise, you can simply create multiple unique indexes on the collection.

Process

Consider the following for the “proxy collection:”

{
  "_id" : ObjectId("...")
  "email" ": "..."
}

The _id field holds the ObjectId of the document it reflects, and the email field is the field on which you want to ensure uniqueness.

To shard this collection, use the following operation using the email field as the shard key:

db.runCommand( { shardCollection : "records.proxy" ,
                 key : { email : 1 } ,
                 unique : true } );

If you do not need to shard the proxy collection, use the following command to create a unique index on the email field:

db.proxy.ensureIndex( { "email" : 1 }, { unique : true } )

You may create multiple unique indexes on this collection if you do not plan to shard the proxy collection.

To insert documents, use the following procedure in the JavaScript shell:

db = db.getSiblingDB('records');

var primary_id = ObjectId();

db.proxy.insert({
   "_id" : primary_id
   "email" : "example@example.net"
})

// if: the above operation returns successfully,
// then continue:

db.information.insert({
   "_id" : primary_id
   "email": "example@example.net"
   // additional information...
})

You must insert a document into the proxy collection first. If this operation succeeds, the email field is unique, and you may continue by inserting the actual document into the information collection.

See

The full documentation of: ensureIndex() and shardCollection.

Considerations

  • Your application must catch errors when inserting documents into the “proxy” collection and must enforce consistency between the two collections.
  • If the proxy collection requires sharding, you must shard on the single field on which you want to enforce uniqueness.
  • To enforce uniqueness on more than one field using sharded proxy collections, you must have one proxy collection for every field for which to enforce uniqueness. If you create multiple unique indexes on a single proxy collection, you will not be able to shard proxy collections.

Use Guaranteed Unique Identifier

The best way to ensure a field has unique values is to generate universally unique identifiers (UUID,) such as MongoDB’s ‘ObjectId values.

This approach is particularly useful for the``_id`` field, which must be unique: for collections where you are not sharding by the _id field the application is responsible for ensuring that the _id field is unique.