Aggregation Pipeline and Sharded Collections

The aggregation pipeline supports operations on sharded collections. This section describes behaviors specific to the aggregation pipeline and sharded collections.


Changed in version 2.6.

When operating on a sharded collection, the aggregation pipeline is split into two parts. The first pipeline runs on each shard, or if an early $match can exclude shards through the use of the shard key in the predicate, the pipeline runs on only the relevant shards.

The second pipeline consists of the remaining pipeline stages and runs on the primary shard. The primary shard merges the cursors from the other shards and runs the second pipeline on these results. The primary shard forwards the final results to the mongos. In previous versions, the second pipeline would run on the mongos. [1]


When splitting the aggregation pipeline into two parts, the pipeline is split to ensure that the shards perform as many stages as possible with consideration for optimization.

To see how the pipeline was split, include the explain option in the db.collection.aggregate() method.

Optimizations are subject to change between releases.

[1]Until all shards upgrade to v2.6, the second pipeline runs on the mongos if any shards are still running v2.4.