Map-reduce operations can handle complex aggregation tasks. To perform map-reduce operations, MongoDB provides the mapReduce command and, in the mongo shell, the db.collection.mapReduce() wrapper method.
For examples of map-reduce, see
For many simple aggregation tasks, see the aggregation framework.
The map-reduce operation uses a temporary collection during processing. At completion, the map-reduce operation renames the temporary collection. As a result, you can perform a map-reduce operation periodically with the same target collection name without affecting the intermediate states. Use this mode when generating statistical output collections on a regular basis.
The map-reduce operation is composed of many tasks, including:
These various tasks take the following locks:
The read phase takes a read lock. It yields every 100 documents.
The insert into the temporary collection takes a write lock for a single write.
If the output collection does not exist, the creation of the output collection takes a write lock.
If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock.
Changed in version 2.4: The V8 JavaScript engine, which became the default in 2.4, allows multiple JavaScript operations to execute at the same time. Prior to 2.4, JavaScript code (i.e. map, reduce, finalize functions) executed in a single thread.
Note
The final write lock during post-processing makes the results appear atomically. However, output actions merge and reduce may take minutes to process. For the merge and reduce, the nonAtomic flag is available. See the db.collection.mapReduce() reference for more information.
When using sharded collection as the input for a map-reduce operation, mongos will automatically dispatch the map-reduce job to each shard in parallel. There is no special option required. mongos will wait for jobs on all shards to finish.
By default the output collection is not sharded. The process is:
mongos dispatches a map-reduce finish job to the shard that will store the target collection.
The target shard pulls results from all other shards, and runs a final reduce/finalize operation, and write to the output.
If using the sharded option to the out parameter, MongoDB shards the output using _id field as the shard key.
Changed in version 2.2.
If the output collection does not exist, MongoDB creates and shards the collection on the _id field. If the collection is empty, MongoDB creates chunks using the result of the first stage of the map-reduce operation.
mongos dispatches, in parallel, a map-reduce finish job to every shard that owns a chunk.
Each shard will pull the results it owns from all other shards, run a final reduce/finalize, and write to the output collection.
Note
In MongoDB 2.0:
Warning
For best results, only use the sharded output options for mapReduce in version 2.2 or later.