- Type: New Feature
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 3.1.8
- Component/s: Usability
- Query
- Fully Compatible
With $eval removed, Mongo needs some equivalent to stored procedures or it will be unusable for many use cases. M/R and the aggregation pipeline are not alternatives: they are designed for grouping and aggregation, not individual document operations. A very common use for stored procedures is when changes to a document depend on existing data in the collection, and it is not efficient to make them outside the DB.
People against stored procedures typically do not want business logic split across separate languages and locations, or dislike the additional complexity. While this may be valid in their environments, it does not negate the fact that for many use cases pulling all of the data out of the database to the client and writing it back with minor changes is not feasible. The additional complexity does not need to be used if their applications do not require it.
A very common use case: you need to modify every document in a collection based on some attribute of each document. For example, each document has a rank score or other numerical attribute, and you want to normalize that score by the max value in the collection. Using client-side code, as you recommend in the $eval removal docs, means first selecting the max, then pulling down every document (or specific fields of each) in the collection, unmarshalling them into native objects, reading the score, dividing by the max, and writing each result back. We have many collections with tens to hundreds of millions of documents and perform many similar operations on them. Doing this client side is a non-starter given the latency and bandwidth needed to ship data back and forth between the client code and the server.
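As a rough sketch of the normalization described above (the collection name `scores` and field name `score` are hypothetical), the server-side $eval version is shown as a comment, and the client-side logic is written over an in-memory array so it is self-contained:

```javascript
// Server-side approach via the now-removed db.eval — runs entirely on the
// server, so no documents cross the network:
//
// db.eval(function () {
//   var max = db.scores.find().sort({ score: -1 }).limit(1).next().score;
//   db.scores.find().forEach(function (doc) {
//     db.scores.update({ _id: doc._id }, { $set: { score: doc.score / max } });
//   });
// });

// Client-side equivalent of the same logic: in a real deployment every
// document would be pulled over the network and every result written back.
function normalizeScores(docs) {
  // Find the maximum score in the collection.
  var max = docs.reduce(function (m, d) {
    return Math.max(m, d.score);
  }, -Infinity);
  // Divide each score by the max and emit the updated documents.
  return docs.map(function (d) {
    return { _id: d._id, score: d.score / max };
  });
}
```

The computation itself is trivial; the cost is entirely in moving the documents to where the code runs, which is the point of the paragraph above.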
In basic tests with only ~100k documents, this simple normalization operation is 10x-20x slower running client side with a co-located client than using $eval, not to mention the additional bandwidth used when client code cannot be co-located. This was on AWS with an r3.8xlarge DB instance and SSD volumes. We rewrote many of our applications to use $eval because of this very performance issue.
If Mongo wants to position itself as a scalable database for analytics, it has to provide some mechanism for executing arbitrary functions with document-level write support on the data within the server, and ideally one that works with shards. It doesn't need to be JS and it doesn't have to be embedded in the storage engine; even a streaming model like Hadoop's, where each node executes a script on its partition of the data using just stdin/stdout, would be a start. Pulling all of the data out of the database over the network to update it with minor changes is not a strategy that scales. One of the major wins of horizontal scaling is pushing the processing to where the data lives and having that processing power scale with the storage.
- duplicates: SERVER-1765 self referential updates? WAS: allow access to old row value during update (Closed)
- related to: SERVER-11345 Allow update to compute expressions using referenced fields like Aggregation Framework's $project (Closed)