- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: MapReduce
- Query Execution
MongoDB's mapReduce command takes two required functions, "map" and "reduce", and an optional "finalize" function. The "reduce" function must return data in the same format as the values emitted by "map".
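To make that contract concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB's implementation: `runMapReduce`, the sample `orders` data, and the convention of passing `emit` as an argument to `map` are all assumptions made for illustration (in MongoDB, `emit` is provided by the server). The sketch also skips "reduce" for keys with a single value, which is the behavior described in SERVER-5818.

```javascript
// Hypothetical driver that only imitates the map/reduce/finalize contract.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB map functions, `this` is the current document.
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // A key with a single emitted value skips reduce entirely.
    let value = values.length > 1 ? reduce(key, values) : values[0];
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical sample data.
const orders = [
  { customer: "a", amount: 10 },
  { customer: "a", amount: 30 },
  { customer: "b", amount: 5 },
];

// map emits values shaped like { total, count }.
function map(emit) {
  emit(this.customer, { total: this.amount, count: 1 });
}

// reduce returns that same shape, because single-value keys reach
// finalize without ever passing through reduce.
function reduce(key, values) {
  return values.reduce(
    (acc, v) => ({ total: acc.total + v.total, count: acc.count + v.count }),
    { total: 0, count: 0 }
  );
}

// finalize is the only step that changes the shape.
function finalize(key, value) {
  return value.total / value.count;
}

const averages = runMapReduce(orders, map, reduce, finalize);
console.log(averages);
```

Customer "b" has only one order, so its value goes straight from "map" to "finalize". That only works because "map" and "reduce" agree on the `{ total, count }` shape.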
Some other frameworks instead use "map", "shuffle" and "reduce". There, "shuffle" (also known as "local reduce") is the step that must output the same data format as "map", just like MongoDB's "reduce", but it is optional, while the required "reduce" behaves more like MongoDB's "finalize".
It would be great if MongoDB could work this way instead, with this nomenclature and the same choice of optional parameters, either by changing the mapReduce method or by adding a new method.
Another useful change would be to always deliver the data to the final step ("finalize"/"reduce") as a list, even when there is only one item. The function could then always assume it has a list to process, which makes it simpler to write.
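A rough sketch of what this request could look like, again in plain JavaScript rather than any existing MongoDB API. The runner `runWithListDelivery`, the sample `readings`, and passing `emit` as an argument are hypothetical. The point is only that the final step receives an array for every key, so it needs no special case for a lone value.

```javascript
// Hypothetical runner for the requested behavior: the final step
// always receives an array of the values emitted for a key.
function runWithListDelivery(docs, map, finalStep) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: finalStep(key, values), // an array even when it holds one item
  }));
}

// Hypothetical sample data.
const readings = [
  { sensor: "s1", temp: 20 },
  { sensor: "s1", temp: 24 },
  { sensor: "s2", temp: 18 },
];

function mapTemp(emit) {
  emit(this.sensor, this.temp);
}

// The same code path handles one value or many.
function average(key, values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const perSensor = runWithListDelivery(readings, mapTemp, average);
console.log(perSensor);
```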
It should also be easy to use an "identity reducer", which could be the default when no reducer is specified.
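One possible reading of the "identity reducer" idea, sketched in plain JavaScript. Both `identityReducer` and `groupAndReduce` are hypothetical names, and the input is assumed to be a list of already-emitted `[key, value]` pairs. When the caller omits the reducer, each key simply maps to the list of its values.

```javascript
// Hypothetical identity reducer: returns the grouped values unchanged.
function identityReducer(key, values) {
  return values;
}

// Hypothetical runner; the reducer falls back to identityReducer.
function groupAndReduce(pairs, reducer = identityReducer) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const out = {};
  for (const [key, values] of groups) out[key] = reducer(key, values);
  return out;
}

const emitted = [["x", 1], ["y", 2], ["x", 3]];

// No reducer given: values are only grouped by key.
const grouped = groupAndReduce(emitted);

// Explicit reducer: values are summed per key.
const summed = groupAndReduce(emitted, (key, values) =>
  values.reduce((a, b) => a + b, 0)
);

console.log(grouped, summed);
```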
Related tickets:
- related to
- SERVER-5818 reduce in map reduce doesn't run with only one input document (Closed)
- SERVER-2333 mapreduce optimization: do not execute reduce on unique keys (Closed)