New in version 2.1.0.
The aggregation framework provides the ability to project, process, and/or control the output of the query, without using map-reduce. Aggregation uses a syntax that resembles the same syntax and form as “regular” MongoDB database queries.
These aggregation operations are all accessible by way of the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the database command aggregate. The following prototype aggregation operations are equivalent:
db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )
These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method.
This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.
See also
Aggregation Framework overview, the Aggregation Framework Documentation Index, and the Aggregation Framework Examples for more information on the aggregation functionality.
Aggregation Operators:
Warning
The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.
Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date() ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The current pipeline operators are:
Reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-objects. Use $project to:
Use $project to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.
db.article.aggregate(
{ $project : {
title : 1 ,
author : 1 ,
}}
);
This operation includes the title field and the author field in the document that returns from the aggregation pipeline.
Note
The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate(
{ $project : {
_id : 0 ,
title : 1 ,
author : 1
}}
);
Here, the projection excludes the _id field but includes the title and author fields.
Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1,
doctoredPageViews : { $add:["$pageViews", 10] }
}}
);
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add.
Note
You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.
You may also use $project to rename fields. Consider the following example:
db.article.aggregate(
{ $project : {
title : 1 ,
page_views : "$pageViews" ,
bar : "$other.foo"
}}
);
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.
Finally, you can use the $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate(
{ $project : {
title : 1 ,
stats : {
pv : "$pageViews",
foo : "$other.foo",
dpv : { $add:["$pageViews", 10] }
}
}}
);
This projection includes the title field and places $project into “inclusive” mode. Then, it creates the stats documents with the following fields:
$match pipes the documents that match its conditions to the next operator in the pipeline.
The $match query syntax is identical to the read operation query syntax.
Example
The following operation uses $match to perform a simple equality match:
db.articles.aggregate(
{ $match : { author : "dave" } }
);
The $match selects the documents where the author field equals dave, and the aggregation returns the following:
{ "result" : [
{
"_id" : ObjectId("512bc95fe835e68f199c8686"),
"author": "dave",
"score" : 80
},
{ "_id" : ObjectId("512bc962e835e68f199c8687"),
"author" : "dave",
"score" : 85
}
],
"ok" : 1 }
Example
The following example selects documents to process using the $match pipeline operator and then pipes the results to the $group pipeline operator to compute a count of the documents:
db.articles.aggregate( [
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] );
In the aggregation pipeline, $match selects the documents where the score is greater than 70 and less than or equal to 90. These documents are then piped to the $group to perform a count. The aggregation returns the following:
{
"result" : [
{
"_id" : null,
"count" : 3
}
],
"ok" : 1 }
Note
New in version 2.4: $match queries can support the geospatial $geoWithin operations.
Restricts the number of documents that pass through the $limit in the pipeline.
$limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate(
{ $limit : 5 }
);
This operation returns only the first 5 documents passed to it from by the pipeline. $limit has no effect on the content of the documents it passes.
Skips over the specified number of documents that pass through the $skip in the pipeline before passing all of the remaining input.
$skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate(
{ $skip : 5 }
);
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.
Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
Note
The dollar sign (i.e. $) must proceed the field specification handed to the $unwind operator.
In the above aggregation $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.
Note
$unwind has the following behaviors:
Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.
The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:
{ _id : { author: '$author', pageViews: '$pageViews', posted: '$posted' } }
With the exception of the _id field, $group cannot output nested documents.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
Consider the following example:
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}}
);
This groups by the author field and computes two fields, the first docsPerAuthor is a counter field that adds one for each document with a given author field using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.
Each field defined for the $group must use one of the group aggregation function listed below to generate its composite value:
Warning
The aggregation system currently stores $group operations in memory, which may cause problems when processing a larger number of groups.
The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate(
{ $sort : { <sort-key> } }
);
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.
Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate(
{ $sort : { age : -1, posts: 1 } }
);
This operation sorts the documents in the users collection, in descending order according by the age field and then in ascending order according to the value in the posts field.
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:
Note
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Note
The $sort cannot begin sorting documents until previous operators in the pipeline have returned all output.
$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:
Changed in version 2.4: When a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit. Before 2.4, $sort would sort all the results in memory, and then limit the results to n results.
Warning
Changed in version 2.4: Sorts immediately proceeded by a limit no longer need to fit into memory. Previously, all sorts had to fit into memory or use an index.
Unless the $sort operator can use an index, or immediately precedes a $limit, the $sort operation must fit within memory.
For $sort operations that immediately precede a $limit stage, MongoDB only needs to store the number of items specified by $limit in memory.
New in version 2.4.
$geoNear returns documents in order of nearest to farthest from a specified point and pass the documents through the aggregation pipeline.
Important
The $geoNear accept the following options:
| Fields: |
|
|---|
Example
The following aggregation finds at most 5 unique documents with a location at most .008 from the center [40.72, -73.99] and have type equal to public:
db.places.aggregate([
{
$geoNear: {
near: [40.724, -73.997],
distanceField: "dist.calculated",
maxDistance: 0.008,
query: { type: "public" },
includeLocs: "dist.location",
uniqueDocs: true,
num: 5
}
}
])
The aggregation returns the following:
{
"result" : [
{ "_id" : 7,
"name" : "Washington Square",
"type" : "public",
"location" : [
[ 40.731, -73.999 ],
[ 40.732, -73.998 ],
[ 40.730, -73.995 ],
[ 40.729, -73.996 ]
],
"dist" : {
"calculated" : 0.0050990195135962296,
"location" : [ 40.729, -73.996 ]
}
},
{ "_id" : 8,
"name" : "Sara D. Roosevelt Park",
"type" : "public",
"location" : [
[ 40.723, -73.991 ],
[ 40.723, -73.990 ],
[ 40.715, -73.994 ],
[ 40.715, -73.994 ]
],
"dist" : {
"calculated" : 0.006082762530298062,
"location" : [ 40.723, -73.991 ]
}
}
],
"ok" : 1
}
The matching documents in the result field contain two new fields:
These operators calculate values within the aggregation framework.
The $group pipeline stage provides the following operations:
Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.
Returns the first value it encounters for its group .
Returns the last value it encounters for its group.
Returns the highest value among all values of the field in all documents selected by this group.
The $min operator returns the lowest non-null value of a field in the documents for a $group operation.
Changed in version 2.4: If some, but not all, documents for the $min operation have either a null value for the field or are missing the field, the $min operator only considers the non-null and the non-missing values for the field. If all documents for the $min operation have null value for the field or are missing the field, the $min operator returns null for the minimum value.
Before 2.4, if any of the documents for the $min operation were missing the field, the $min operator would not return any value. If any of the documents for the $min had the value null, the $min operator would return a null.
Example
The users collection contains the following documents:
{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
To find the minimum value of the age field from all the documents, use the $min operator:
db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
The operation returns the value of the age field in the minAge field:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
To find the minimum value of the age field for only those documents with _id starting with the letter a, use the $min operator after a $match operation:
db.users.aggregate( [ { $match: { _id: /^a/ } },
{ $group: { _id: 0, minAge: { $min: "$age"} } }
] )
The operation returns the minimum value of the age field for the two documents with _id starting with the letter a:
{ "result" : [ { "_id" : 0, "minAge" : 25 } ], "ok" : 1 }
Example
The users collection contains the following documents where some of the documents are either missing the age field or the age field contains null:
{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
{ "_id" : "xxx001" }
{ "_id" : "zzz001", "age" : null }
The following operation finds the minimum value of the age field in all the documents:
db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
Because only some documents for the $min operation are missing the age field or have age field equal to null, $min only considers the non-null and the non-missing values and the operation returns the following document:
{ "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
The following operation finds the minimum value of the age field for only those documents where the _id equals "xxx001" or "zzz001":
db.users.aggregate( [ { $match: { _id: {$in: [ "xxx001", "zzz001" ] } } },
{ $group: { _id: 0, minAge: { $min: "$age"} } }
] )
The $min operation returns null for the minimum age since all documents for the $min operation have null value for the field age or are missing the field:
{ "result" : [ { "_id" : 0, "minAge" : null } ], "ok" : 1 }
Returns the average of all the values of the field in all documents selected by this group.
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.
Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.
The three boolean operators accept Booleans as arguments and return Booleans as results.
Note
These operators convert non-booleans to Boolean values according to the BSON standards. Here, null, undefined, and 0 values become false, while non-zero numeric values, and all other types, such as strings, dates, objects become true.
Takes an array one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.
Note
$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.
These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of that comparison.
All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except for $cmp, all comparison operators return a Boolean value. $cmp returns an integer.
Takes two values in an array and returns an integer. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array and returns a boolean. The returned value is:
Takes two values in an array returns a boolean. The returned value is:
These operators only support numbers.
Takes an array of one or more numbers and adds them together, returning the sum.
Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.
Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.
See also
Takes an array of one or more numbers and multiples them, returning the resulting product.
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.
These operators manipulate strings within projection expressions.
New in version 2.4.
Takes an array of strings, concatenates the strings, and returns the concatenated string. $concat can only accept an array of strings.
Use $concat with the following syntax:
{ $concat: [ <string>, <string>, ... ] }
If array element has a value of null or refers to a field that is missing, $concat will return null.
Takes in two strings. Returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical.
Note
$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.
$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case sensitive comparison.
$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.
Note
$substr is not encoding aware and if used improperly may produce a result string containing an invalid UTF-8 character sequence.
All date operators take a “Date” typed value as a single argument and return a number.
Takes a date and returns the day of the year as a number between 1 and 366.
Takes a date and returns the day of the month as a number between 1 and 31.
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday.)
Takes a date and returns the full year.
Takes a date and returns the month as a number between 1 and 12.
Takes a date and returns the week of the year as a number between 0 and 53.
Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” operator to the strftime standard library function.
Takes a date and returns the hour between 0 and 23.
Takes a date and returns the minute between 0 and 59.
Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.
Takes a date and returns the millisecond portion of the date as an integer between 0 and 999.
Use the $cond operator with the following syntax:
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.