If a user issues a limit=N query that includes a sort against a sharded collection and all of the first N results come back from one shard, mongos will request an extra batch of results from that shard.
This bug has a particularly harmful interaction with the "top-K sort" logic introduced in version 2.6.0 of the server. If mongos requests >N results from a shard for a query that has a limit of N and uses an unindexed sort, then the shard will perform a full "unlimited" sort (and return "getMore runner error: Overflow sort stage buffered data usage..." to the user if the size of the data to be sorted exceeds the 32MB sort stage limit; see script attached to this ticket for an example that exhibits this failure mode).
Reproduce with the following:
var st = new ShardingTest({shards: 2, other: {shardOptions: {verbose: 1}}}); st.stopBalancer(); var coll = st.s0.getDB("test").foo; assert.commandWorked(coll.getDB().adminCommand({enableSharding: coll.getDB().getName()})); assert.commandWorked(coll.getDB().adminCommand({movePrimary: coll.getDB().getName(), to: "shard0000"})); assert.commandWorked(coll.getDB().adminCommand({shardCollection: coll.getFullName(), key: {_id: 1}})); assert.commandWorked(coll.getDB().adminCommand({split: coll.getFullName(), middle: {_id: 0}})); assert.commandWorked(coll.getDB().adminCommand({moveChunk: coll.getFullName(), find: {_id: 0}, to: "shard0001"})); // Write 10 documents to shard 0, and 10 documents to shard 1. for (var i=1; i<=10; ++i) { assert.writeOK(coll.insert({_id: i, x: i, y: i})); assert.writeOK(coll.insert({_id: -i, x: -i, y: i})); } coll.find().sort({x:1}).limit(9).itcount(); // Shard 0 contributes all 9 results. coll.find().sort({y:1}).limit(9).itcount(); // Both shards contribute to the 9 results.
The query on line 21 requests 9 documents from both shards, and then an unnecessary additional batch from shard 0:
m30000| 2014-06-18T17:36:14.380-0400 [conn6] query test.foo query: { query: {}, orderby: { x: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:70249920950 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:201 nreturned:9 reslen:380 0ms m30001| 2014-06-18T17:36:14.380-0400 [conn4] query test.foo query: { query: {}, orderby: { x: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:23467375332 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:214 nreturned:9 reslen:380 0ms m30000| 2014-06-18T17:36:14.380-0400 [conn4] getmore test.foo cursorid:70249920950 ntoreturn:9 keyUpdates:0 numYields:0 locks(micros) r:133 nreturned:1 reslen:60 0ms
Whereas the query on line 22 requests 9 documents from both shards and does not follow up with any getmore requests:
m30000| 2014-06-18T17:36:14.381-0400 [conn6] query test.foo query: { query: {}, orderby: { y: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:70393723465 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:162 nreturned:9 reslen:380 0ms m30001| 2014-06-18T17:36:14.382-0400 [conn4] query test.foo query: { query: {}, orderby: { y: 1.0 } } planSummary: COLLSCAN, COLLSCAN cursorid:22786735268 ntoreturn:9 ntoskip:0 keyUpdates:0 numYields:0 locks(micros) r:149 nreturned:9 reslen:380 0ms
- related to
-
SERVER-7676 find usings sort,skip,limit where sort is on an unindexed attribute produces an error when executed from mongos command prompt but works correctly from mongod command prompt.
- Closed
-
SERVER-14306 mongos can cause the in-memory sort limit to be hit on shards by requesting more results than needed
- Closed