Aggregation Framework Reference

New in version 2.1.0.

The aggregation framework provides the ability to project, process, and/or control the output of the query, without using map-reduce. Aggregation uses a syntax that resembles the same syntax and form as “regular” MongoDB database queries.

These aggregation operations are all accessible by way of the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the database command aggregate. The following prototype aggregation operations are equivalent:

db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )

These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method.

This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.

See also

Aggregation Framework overview, the Aggregation Framework Documentation Index, and the Aggregation Framework Examples for more information on the aggregation functionality.

Pipeline

Warning

The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:

{
  title : "this is my title" ,
  author : "bob" ,
  posted : new Date() ,
  pageViews : 5 ,
  tags : [ "fun" , "good" , "fun" ] ,
  comments : [
      { author :"joe" , text : "this is cool" } ,
      { author :"sam" , text : "this is bad" }
  ],
  other : { foo : 5 }
}

The current pipeline operators are:

$project

Reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-objects. Use $project to:

  • Include fields from the original document.
  • Insert computed fields.
  • Rename fields.
  • Create and populate fields that hold sub-documents.

Use $project to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.

db.article.aggregate(
    { $project : {
        title : 1 ,
        author : 1 ,
    }}
 );

This operation includes the title field and the author field in the document that returns from the aggregation pipeline.

Note

The _id field is always included by default. You may explicitly exclude _id as follows:

db.article.aggregate(
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
);

Here, the projection excludes the _id field but includes the title and author fields.

Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:

db.article.aggregate(
    { $project : {
        title : 1,
        doctoredPageViews : { $add:["$pageViews", 10] }
    }}
);

Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add.

Note

You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.

You may also use $project to rename fields. Consider the following example:

db.article.aggregate(
    { $project : {
        title : 1 ,
        page_views : "$pageViews" ,
        bar : "$other.foo"
    }}
);

This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.

Finally, you can use the $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:

db.article.aggregate(
    { $project : {
        title : 1 ,
        stats : {
            pv : "$pageViews",
            foo : "$other.foo",
            dpv : { $add:["$pageViews", 10] }
        }
    }}
);

This projection includes the title field and places $project into “inclusive” mode. Then, it creates the stats documents with the following fields:

  • pv which includes and renames the pageViews from the top level of the original documents.
  • foo which includes the value of other.foo from the original documents.
  • dpv which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add aggregation expression.
$match

$match pipes the documents that match its conditions to the next operator in the pipeline.

The $match query syntax is identical to the read operation query syntax.

Example

The following operation uses $match to perform a simple equality match:

db.articles.aggregate(
    { $match : { author : "dave" } }
);

The $match selects the documents where the author field equals dave, and the aggregation returns the following:

{ "result" : [
               {
                 "_id" : ObjectId("512bc95fe835e68f199c8686"),
                 "author": "dave",
                 "score" : 80
               },
               { "_id" : ObjectId("512bc962e835e68f199c8687"),
                 "author" : "dave",
                 "score" : 85
               }
             ],
  "ok" : 1 }

Example

The following example selects documents to process using the $match pipeline operator and then pipes the results to the $group pipeline operator to compute a count of the documents:

db.articles.aggregate( [
                        { $match : { score : { $gt : 70, $lte : 90 } } },
                        { $group: { _id: null, count: { $sum: 1 } } }
                       ] );

In the aggregation pipeline, $match selects the documents where the score is greater than 70 and less than or equal to 90. These documents are then piped to the $group to perform a count. The aggregation returns the following:

{
  "result" : [
               {
                 "_id" : null,
                 "count" : 3
               }
             ],
  "ok" : 1 }

Note

New in version 2.4: $match queries can support the geospatial $geoWithin operations.

Warning

You cannot use $where in $match queries as part of the aggregation pipeline.

$limit

Restricts the number of documents that pass through the $limit in the pipeline.

$limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:

db.article.aggregate(
    { $limit : 5 }
);

This operation returns only the first 5 documents passed to it from by the pipeline. $limit has no effect on the content of the documents it passes.

Note

Changed in version 2.4: When a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit. Before 2.4, $sort would sort all the results in memory, and then limit the results to n results.

$skip

Skips over the specified number of documents that pass through the $skip in the pipeline before passing all of the remaining input.

$skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:

db.article.aggregate(
    { $skip : 5 }
);

This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.

$unwind

Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:

db.article.aggregate(
    { $project : {
        author : 1 ,
        title : 1 ,
        tags : 1
    }},
    { $unwind : "$tags" }
);

Note

The dollar sign (i.e. $) must proceed the field specification handed to the $unwind operator.

In the above aggregation $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.

{
     "result" : [
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "good"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             }
     ],
     "OK" : 1
}

A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.

Note

$unwind has the following behaviors:

  • $unwind is most useful in combination with $group.
  • You may undo the effects of unwind operation with the $group pipeline operator.
  • If you specify a target field for $unwind that does not exist in an input document, the pipeline ignores the input document, and will generate no result documents.
  • If you specify a target field for $unwind that is not an array, db.collection.aggregate() generates an error.
  • If you specify a target field for $unwind that holds an empty array ([]) in an input document, the pipeline ignores the input document, and will generates no result documents.
$group

Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.

The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:

{ _id : { author: '$author', pageViews: '$pageViews', posted: '$posted' } }

With the exception of the _id field, $group cannot output nested documents.

Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.

Note

Use $project as needed to rename the grouped field after an $group operation, if necessary.

Consider the following example:

db.article.aggregate(
    { $group : {
        _id : "$author",
        docsPerAuthor : { $sum : 1 },
        viewsPerAuthor : { $sum : "$pageViews" }
    }}
);

This groups by the author field and computes two fields, the first docsPerAuthor is a counter field that adds one for each document with a given author field using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.

Each field defined for the $group must use one of the group aggregation function listed below to generate its composite value:

Warning

The aggregation system currently stores $group operations in memory, which may cause problems when processing a larger number of groups.

$sort

The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:

db.<collection-name>.aggregate(
    { $sort : { <sort-key> } }
);

This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.

Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:

db.users.aggregate(
    { $sort : { age : -1, posts: 1 } }
);

This operation sorts the documents in the users collection, in descending order according by the age field and then in ascending order according to the value in the posts field.

When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:

  1. MinKey (internal type)
  2. Null
  3. Numbers (ints, longs, doubles)
  4. Symbol, String
  5. Object
  6. Array
  7. BinData
  8. ObjectID
  9. Boolean
  10. Date, Timestamp
  11. Regular Expression
  12. MaxKey (internal type)

Note

MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

Note

The $sort cannot begin sorting documents until previous operators in the pipeline have returned all output.

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:

Changed in version 2.4: When a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit. Before 2.4, $sort would sort all the results in memory, and then limit the results to n results.

Warning

Changed in version 2.4: Sorts immediately proceeded by a limit no longer need to fit into memory. Previously, all sorts had to fit into memory or use an index.

Unless the $sort operator can use an index, or immediately precedes a $limit, the $sort operation must fit within memory.

For $sort operations that immediately precede a $limit stage, MongoDB only needs to store the number of items specified by $limit in memory.

$geoNear

New in version 2.4.

$geoNear returns documents in order of nearest to farthest from a specified point and pass the documents through the aggregation pipeline.

Important

  • You can only use $geoNear as the first stage of a pipeline.
  • You must include the distanceField option. The distanceField option specifies the field that will contain the calculated distance.
  • The collection must have a geospatial index.

The $geoNear accept the following options:

Fields:
  • near (coordinates) – Specifies the coordinates (e.g. [ x, y ]) to use as the center of a geospatial query.
  • distanceField (string) – Specifies the output field that will contain the calculated distance. You can use the dot notation to specify a field within a subdocument.
  • limit (number) – Optional. Specifies the maximum number of documents to return. The default value is 100. See also the num option.
  • num (number) – Optional. Synonym for the limit option. If both num and limit are included, the num value overrides the limit value.
  • maxDistance (number) – Optional. Limits the results to the documents within the specified distance from the center coordinates.
  • query (document) – Optional. Limits the results to the documents that match the query. The query syntax is identical to the read operation query syntax.
  • spherical (boolean) – Optional. Default value is false. When true, MongoDB performs calculation using spherical geometry.
  • distanceMultiplier (number) – Optional. Specifies a factor to multiply all distances returned by $geoNear. For example, use distanceMultiplier to convert from spherical queries returned in radians to linear units (i.e. miles or kilometers) by multiplying by the radius of the Earth.
  • includeLocs (string) – Optional. Specifies the output field that identifies the location used to calculate the distance. This option is useful when a location field contains multiple locations. You can use the dot notation to specify a field within a subdocument.
  • uniqueDocs (boolean) –

    Optional. Default value is false. If a location field contains multiple locations, the default settings will return the document multiple times if more than one location meets the criteria.

    When true, the document will only return once even if the document has multiple locations that meet the criteria.

Example

The following aggregation finds at most 5 unique documents with a location at most .008 from the center [40.72, -73.99] and have type equal to public:

db.places.aggregate([
                      {
                        $geoNear: {
                                    near: [40.724, -73.997],
                                    distanceField: "dist.calculated",
                                    maxDistance: 0.008,
                                    query: { type: "public" },
                                    includeLocs: "dist.location",
                                    uniqueDocs: true,
                                    num: 5
                                  }
                      }
                   ])

The aggregation returns the following:

{
  "result" : [
               { "_id" : 7,
                 "name" : "Washington Square",
                 "type" : "public",
                 "location" : [
                                [ 40.731, -73.999 ],
                                [ 40.732, -73.998 ],
                                [ 40.730, -73.995 ],
                                [ 40.729, -73.996 ]
                              ],
                 "dist" : {
                            "calculated" : 0.0050990195135962296,
                            "location" : [ 40.729, -73.996 ]
                          }
               },
               { "_id" : 8,
                 "name" : "Sara D. Roosevelt Park",
                 "type" : "public",
                 "location" : [
                                [ 40.723, -73.991 ],
                                [ 40.723, -73.990 ],
                                [ 40.715, -73.994 ],
                                [ 40.715, -73.994 ]
                              ],
                 "dist" : {
                            "calculated" : 0.006082762530298062,
                            "location" : [ 40.723, -73.991 ]
                          }
               }
             ],
  "ok" : 1
}

The matching documents in the result field contain two new fields:

  • dist.calculated field that contains the calculated distance, and
  • dist.location field that contains the location used in the calculation.

Note

The options for $geoNear are similar to the geoNear command with the following exceptions:

  • distanceField is a mandatory field for the $geoNear pipeline operator; the option does not exist in the geoNear command.
  • includeLocs accepts a string in the $geoNear pipeline operator and a boolean in the geoNear command.

Expressions

These operators calculate values within the aggregation framework.

$group Operators

The $group pipeline stage provides the following operations:

$addToSet

Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.

$first

Returns the first value it encounters for its group .

Note

Only use $first when the $group follows an $sort operation. Otherwise, the result of this operation is unpredictable.

$last

Returns the last value it encounters for its group.

Note

Only use $last when the $group follows an $sort operation. Otherwise, the result of this operation is unpredictable.

$max

Returns the highest value among all values of the field in all documents selected by this group.

$min

The $min operator returns the lowest non-null value of a field in the documents for a $group operation.

Changed in version 2.4: If some, but not all, documents for the $min operation have either a null value for the field or are missing the field, the $min operator only considers the non-null and the non-missing values for the field. If all documents for the $min operation have null value for the field or are missing the field, the $min operator returns null for the minimum value.

Before 2.4, if any of the documents for the $min operation were missing the field, the $min operator would not return any value. If any of the documents for the $min had the value null, the $min operator would return a null.

Example

The users collection contains the following documents:

{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
  • To find the minimum value of the age field from all the documents, use the $min operator:

    db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
    

    The operation returns the value of the age field in the minAge field:

    { "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
    
  • To find the minimum value of the age field for only those documents with _id starting with the letter a, use the $min operator after a $match operation:

    db.users.aggregate( [ { $match: { _id: /^a/ } },
                          { $group: { _id: 0, minAge: { $min: "$age"} } }
                        ] )
    

    The operation returns the minimum value of the age field for the two documents with _id starting with the letter a:

    { "result" : [ { "_id" : 0, "minAge" : 25 } ], "ok" : 1 }
    

Example

The users collection contains the following documents where some of the documents are either missing the age field or the age field contains null:

{ "_id" : "abc001", "age" : 25 }
{ "_id" : "abe001", "age" : 35 }
{ "_id" : "efg001", "age" : 20 }
{ "_id" : "xyz001", "age" : 15 }
{ "_id" : "xxx001" }
{ "_id" : "zzz001", "age" : null }
  • The following operation finds the minimum value of the age field in all the documents:

    db.users.aggregate( [ { $group: { _id:0, minAge: { $min: "$age"} } } ] )
    

    Because only some documents for the $min operation are missing the age field or have age field equal to null, $min only considers the non-null and the non-missing values and the operation returns the following document:

    { "result" : [ { "_id" : 0, "minAge" : 15 } ], "ok" : 1 }
    
  • The following operation finds the minimum value of the age field for only those documents where the _id equals "xxx001" or "zzz001":

    db.users.aggregate( [ { $match: { _id: {$in: [ "xxx001", "zzz001" ] } } },
                          { $group: { _id: 0, minAge: { $min: "$age"} } }
                        ] )
    

    The $min operation returns null for the minimum age since all documents for the $min operation have null value for the field age or are missing the field:

    { "result" : [ { "_id" : 0, "minAge" : null } ], "ok" : 1 }
    
$avg

Returns the average of all the values of the field in all documents selected by this group.

$push

Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.

$sum

Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.

Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.

Boolean Operators

The three boolean operators accept Booleans as arguments and return Booleans as results.

Note

These operators convert non-booleans to Boolean values according to the BSON standards. Here, null, undefined, and 0 values become false, while non-zero numeric values, and all other types, such as strings, dates, objects become true.

$and

Takes an array one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.

Note

$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.

$or

Takes an array of one or more values and returns true if any of the values in the array are true. Otherwise $or returns false.

Note

$or uses short-circuit logic: the operation stops evaluation after encountering the first true expression.

$not

Returns the boolean opposite value passed to it. When passed a true value, $not returns false; when passed a false value, $not returns true.

Comparison Operators

These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of that comparison.

All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except for $cmp, all comparison operators return a Boolean value. $cmp returns an integer.

$cmp

Takes two values in an array and returns an integer. The returned value is:

  • A negative number if the first value is less than the second.
  • A positive number if the first value is greater than the second.
  • 0 if the two values are equal.
$eq

Takes two values in an array and returns a boolean. The returned value is:

  • true when the values are equivalent.
  • false when the values are not equivalent.
$gt

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is greater than the second value.
  • false when the first value is less than or equal to the second value.
$gte

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is greater than or equal to the second value.
  • false when the first value is less than the second value.
$lt

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is less than the second value.
  • false when the first value is greater than or equal to the second value.
$lte

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is less than or equal to the second value.
  • false when the first value is greater than the second value.
$ne

Takes two values in an array returns a boolean. The returned value is:

  • true when the values are not equivalent.
  • false when the values are equivalent.

Arithmetic Operators

These operators only support numbers.

$add

Takes an array of one or more numbers and adds them together, returning the sum.

$divide

Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.

$mod

Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.

See also

$mod

$multiply

Takes an array of one or more numbers and multiples them, returning the resulting product.

$subtract

Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.

String Operators

These operators manipulate strings within projection expressions.

$concat

New in version 2.4.

Takes an array of strings, concatenates the strings, and returns the concatenated string. $concat can only accept an array of strings.

Use $concat with the following syntax:

{ $concat: [ <string>, <string>, ... ] }

If array element has a value of null or refers to a field that is missing, $concat will return null.

$strcasecmp

Takes in two strings. Returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical.

Note

$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.

$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case sensitive comparison.

$substr

$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.

Note

$substr is not encoding aware and if used improperly may produce a result string containing an invalid UTF-8 character sequence.

$toLower

Takes a single string and converts that string to lowercase, returning the result. All uppercase letters become lowercase.

Note

$toLower may not make sense when applied to glyphs outside the Roman alphabet.

$toUpper

Takes a single string and converts that string to uppercase, returning the result. All lowercase letters become uppercase.

Note

$toUpper may not make sense when applied to glyphs outside the Roman alphabet.

Date Operators

All date operators take a “Date” typed value as a single argument and return a number.

$dayOfYear

Takes a date and returns the day of the year as a number between 1 and 366.

$dayOfMonth

Takes a date and returns the day of the month as a number between 1 and 31.

$dayOfWeek

Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday.)

$year

Takes a date and returns the full year.

$month

Takes a date and returns the month as a number between 1 and 12.

$week

Takes a date and returns the week of the year as a number between 0 and 53.

Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” operator to the strftime standard library function.

$hour

Takes a date and returns the hour between 0 and 23.

$minute

Takes a date and returns the minute between 0 and 59.

$second

Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.

$millisecond

Takes a date and returns the millisecond portion of the date as an integer between 0 and 999.

Conditional Expressions

$cond

Use the $cond operator with the following syntax:

{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }

Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.

$ifNull

Use the $ifNull operator with the following syntax:

{ $ifNull: [ <expression>, <replacement-if-null> ] }

Takes an array with two expressions. $ifNull returns the first expression if it evaluates to a non-null value. Otherwise, $ifNull returns the second expression’s value.