- Aggregation >
- Map-Reduce >
- Perform Incremental Map-Reduce
Perform Incremental Map-Reduce¶
Map-reduce operations can handle complex aggregation tasks. To perform
map-reduce operations, MongoDB provides the mapReduce
command and, in the mongo
shell, the
db.collection.mapReduce()
wrapper method.
If the map-reduce data set is constantly growing, you may want to perform an incremental map-reduce rather than performing the map-reduce operation over the entire data set each time.
To perform incremental map-reduce:
- Run a map-reduce job over the current collection and output the result to a separate collection.
- When you have more data to process, run subsequent map-reduce job
with:
- the
query
parameter that specifies conditions that match only the new documents. - the
out
parameter that specifies thereduce
action to merge the new results into the existing output collection.
- the
Consider the following example where you schedule a map-reduce
operation on a sessions
collection to run at the end of each day.
Data Setup¶
The sessions
collection contains documents that log users’ sessions
each day, for example:
Initial Map-Reduce of Current Collection¶
Run the first map-reduce operation as follows:
Define the map function that maps the
userid
to an object that contains the fieldsuserid
,total_time
,count
, andavg_time
:Define the corresponding reduce function with two arguments
key
andvalues
to calculate the total time and the count. Thekey
corresponds to theuserid
, and thevalues
is an array whose elements corresponds to the individual objects mapped to theuserid
in themapFunction
.Define the finalize function with two arguments
key
andreducedValue
. The function modifies thereducedValue
document to add another fieldaverage
and returns the modified document.Perform map-reduce on the
session
collection using themapFunction
, thereduceFunction
, and thefinalizeFunction
functions. Output the results to a collectionsession_stat
. If thesession_stat
collection already exists, the operation will replace the contents:
Subsequent Incremental Map-Reduce¶
Later, as the sessions
collection grows, you can run additional
map-reduce operations. For example, add new documents to the
sessions
collection:
At the end of the day, perform incremental map-reduce on the
sessions
collection, but use the query
field to select only the
new documents. Output the results to the collection session_stat
,
but reduce
the contents with the results of the incremental
map-reduce: