Old behavior (2.4.x)
We have the following collection of 4 documents and no indexes other than the default `_id` index:
> db.t.find()
{ "_id" : 1, "a" : 1 }
{ "_id" : 2, "a" : 2 }
{ "_id" : 3, "a" : 3 }
{ "_id" : 4, "a" : 4 }
If we set a batch size and request a sort that no index can provide, the server returns just a single batch. This is done so that the server can perform a top-k sort, keeping only the first k = batchSize documents in memory:
> db.t.find().sort({a: 1}).batchSize(2)
{ "_id" : 1, "a" : 1 }
{ "_id" : 2, "a" : 2 }
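The top-k idea can be sketched in JavaScript, the mongo shell's language. This is a minimal illustration, not the server's sort stage: the function name `topKSort`, its arguments, and the sorted-buffer approach are our own hypothetical choices. It shows why a top-k sort yields only one batch: documents that fall outside the k smallest are discarded during the scan, so nothing remains to serve a later batch.

```javascript
// Hypothetical sketch of a top-k sort (not MongoDB server code).
// Keeps at most k documents, ordered ascending by doc[key], while scanning.
function topKSort(docs, key, k) {
  const buffer = []; // always sorted ascending, never longer than k
  for (const doc of docs) {
    // Walk back from the end to find where doc belongs (stable for equal keys).
    let i = buffer.length;
    while (i > 0 && buffer[i - 1][key] > doc[key]) i--;
    if (i < k) {
      buffer.splice(i, 0, doc);
      // Drop the largest element so memory stays bounded by k.
      if (buffer.length > k) buffer.pop();
    }
    // Otherwise doc is larger than all k kept documents and is discarded.
  }
  return buffer;
}
```

For example, `topKSort(docs, "a", 2)` on the four documents above returns only the documents with `a` equal to 1 and 2. Inserting into a sorted array costs O(k) per document; a bounded max-heap would bring that down to O(log k), at the cost of a less readable sketch.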
If we set a batch size and there is an index that provides the sort, the server returns as many batches as required to fully answer the query:
> db.t.ensureIndex({a: 1})
> db.t.find().sort({a: 1}).batchSize(2)
{ "_id" : 1, "a" : 1 }
{ "_id" : 2, "a" : 2 }
{ "_id" : 3, "a" : 3 }
{ "_id" : 4, "a" : 4 }
If an index that can provide the sort is available, the server always selects the plan with the indexed sort. This is key to the old behavior: even if a plan with a blocking sort stage would be more efficient, the plan with the indexed sort is preferred.
New behavior (e.g. 2.5.4)
In the case that
- the batch size is set,
- a sort is requested, and
- there is an index that provides the sort,
the query planner may select either the indexed-sort plan or a plan with a blocking sort. As a consequence, there are some cases in which we expect to get back all results of the query, but a plan with a blocking sort is selected instead and we receive only a single batch.
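The difference between the two plan shapes can be captured in a small toy model. This is a hypothetical sketch, not how the server is implemented: `runSortedQuery` and the plan-shape labels `"indexedSort"` and `"blockingSort"` are invented for illustration, and keys are assumed to be numeric. An indexed sort streams results and can be resumed batch after batch, while a blocking top-k sort with k equal to the batch size produces one batch and then the cursor is exhausted.

```javascript
// Toy model (hypothetical, not server code) of a sorted query with a batchSize.
// Returns the list of batches a client would receive. Assumes numeric sort keys.
function runSortedQuery(docs, key, batchSize, planShape) {
  const sorted = [...docs].sort((x, y) => x[key] - y[key]);
  if (planShape === "indexedSort") {
    // Index order matches the sort: results stream out, one batch per round trip.
    const batches = [];
    for (let i = 0; i < sorted.length; i += batchSize) {
      batches.push(sorted.slice(i, i + batchSize));
    }
    return batches;
  }
  if (planShape === "blockingSort") {
    // Top-k blocking sort with k = batchSize: a single batch, nothing left afterwards.
    return [sorted.slice(0, batchSize)];
  }
  throw new Error("unknown plan shape: " + planShape);
}
```

With the four documents above and a batch size of 2, the `"indexedSort"` shape yields two batches covering all four documents, whereas `"blockingSort"` yields one batch with only `_id` 1 and 2, which matches the unexpected result described in this ticket.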
- is duplicated by: SERVER-14228 Setting batchSize and sort on a cursor in sharded collection causes fewer than all documents to be returned (Closed)
- is related to: SERVER-17011 Cursor can return objects out of order if updated during query ("legacy" readMode only) (Closed)
- related to: SERVER-13316 sorts with multiple batches with small collections can be slower in rc2 (Closed)