-
Type: Improvement
-
Resolution: Won't Do
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Querying
-
None
-
Query Execution
Motivation. Application developers sometimes accidentally rely on the order of results returned by sort constructs such as $sort aggregation stage or sort() method in addition to what is guaranteed by the sort construct. That may happen because in some queries, circumstances, and versions of the server there could be an apparent order of the results that is consistent between queries. However, that order is not guaranteed by the server and can cause system correctness problems, such as HELP-20458, when the server configuration changes (such as when a single replica set is transformed to a sharded cluster) or the server evolves.
Proposed solution. To increase a probability of detecting this design mistake earlier in the lifecycle of the application systems, randomize the order of the results produced by sort constructs. That is, if the values of fields to sort by are equal then the relative order of such documents should be random (stochastic) and not persistent between queries.
Example
Given the collection coll:
db.coll.insert({a:1, b:1}, {a:2, b:2}, {a:1, b:3}, {a:2, b:4})
When issuing a command
db.coll.aggregate([{$sort: {a: 1}}])
ideally, a probability of getting Result A or Result B should be the same and equal to 0.25.
Result A:
{ "_id" : ObjectId("5fd0a1937a7cb559d4cff62b"), "a" : 1, "b" : 1 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62d"), "a" : 1, "b" : 3 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62c"), "a" : 2, "b" : 2 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62e"), "a" : 2, "b" : 4 }
Result B:
{ "_id" : ObjectId("5fd0a1937a7cb559d4cff62d"), "a" : 1, "b" : 3 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62b"), "a" : 1, "b" : 1 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62c"), "a" : 2, "b" : 2 } { "_id" : ObjectId("5fd0a1937a7cb559d4cff62e"), "a" : 2, "b" : 4 }
Design considerations
However, a perfect randomization (that produces any possible permutation of results with equal probability) is not required, just “random enough” to achieve the goal. Also, if necessary, trade-offs should be made to favour minimization of resource usage overhead of the feature instead of perfect randomization.
This feature could have a toggle for turning it off at the server/query level.
- duplicates
-
SERVER-51498 Sorting on duplicate values causes repeated results with limit and skip
- Closed