ISSUE SUMMARY
For historical reasons, the wire protocol does not pass separate values for limit() and batchSize(), but rather passes the smaller of the two in a single ntoreturn field. The consequence is that the server does not know whether the ntoreturn value is a batchSize or a limit.
Usually this doesn't matter much, but it matters a lot if there is a sort. Suppose that the server receives an ntoreturn of 10 with a sort():
- If this is limit(10), then we can find the top 10 using a top-K sort. This requires buffering only 10 results in memory.
- If this is a batchSize(10), then we have to do a full sort. This requires buffering all matching documents in memory.
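For illustration only (the collection and field names below are hypothetical), both of the following shell calls put the same ntoreturn of 10 on the wire, even though the buffering they require differs greatly:

// Both requests send ntoreturn = 10, so the server cannot tell them apart.
db.coll.find().sort({ a: 1 }).limit(10)      // top-K sort: only 10 documents need to be buffered
db.coll.find().sort({ a: 1 }).batchSize(10)  // full sort: every matching document must be buffered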
In order to cope with this, the server creates query plans which behave as follows:
- First they do a top-K sort and return the first 10 results. If the client means limit(10), it will be happy and close the cursor.
- If the client means batchSize(10), then it will ask for new batches with a getmore. At this point, the query plan falls back to doing a full sort.
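Roughly illustrated (using the sortf collection from the reproduction below):

// The first batch of 10 documents comes from the top-K phase; the getmore issued
// once iteration moves past the 10th document makes the plan fall back to a full
// in-memory sort of all matching documents.
var cur = db.sortf.find({ x: 1 }).sort({ other: -1 }).batchSize(10);
cur.forEach(function (doc) { printjson(doc._id); });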
The problem with this strategy is that the plan ranker runs candidate plans until they either finish or produce roughly 100 results. During ranking, the plans therefore move past the top-K phase and into the "do a full sort" phase, buffering more data than necessary when the client means limit() and not batchSize().
USER IMPACT
Queries with an unindexed sort() and a limit() may buffer more documents than required, causing the query to hit the 32 MB blocking sort memory limit and fail.
WORKAROUNDS
Hinting an index that provides the sort() order, or adding such an index, may be a suitable workaround depending on the data and the query, but the indexed sort plan may not be efficient for every workload.
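For example (a sketch only; the collection and query are taken from the reproduction below, and the single-field index on the sort key is hypothetical):

// An index on { other: -1 } can provide the sort order, so no blocking in-memory sort is needed.
db.sortf.ensureIndex({ other: -1 })
db.sortf.find({ x: 1, y: 1, other: { $gte: 0 } }).sort({ other: -1 }).limit(10).hint({ other: -1 })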
AFFECTED VERSIONS
MongoDB production releases 2.6.0 and 2.6.1 are affected by this issue.
FIX VERSION
The fix is included in the 2.6.2 production release.
RESOLUTION DETAILS
Treat ntoreturn as though it is a limit during plan ranking to avoid buffering too much data during a limit() with a sort(). If either a sort() or limit() is specified, plan ranking now ceases as soon as any query plan returns min(ntoreturn, 101) results. This prevents the query from switching from the top-K phase to the in-memory sort phase.
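A minimal sketch of the new stopping condition (the actual change lives inside the server's plan ranker; the variable names here are purely illustrative):

// Ranking stops once any candidate plan has produced this many results
// (101 is the usual amount of ranking work allowed per plan).
var budget = 101;
if ((hasSort || hasLimit) && ntoreturn > 0) {
    budget = Math.min(ntoreturn, 101);  // treat ntoreturn as a limit during ranking
}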
Original description
Query plan selection seems to ignore the specified limit while evaluating the multiple plans that may be usable for a specific query. In this instance the query has both sort() and limit(); however, the issue may be broader than that. The diagnosis is based on logLevel:2 logs. Plan selection is not invoked in the following cases, and the query works fine:
- When a hint is specified explicitly - everyone is happy
- When there is only one matching index - leading to it being used
- When there are no matching indices - leading to a full collection scan
- When there is a perfect-match index - leading to it being used
Steps to reproduce the issue:
- Generate big sample records (even small ones would work, as long as the sorted query result exceeds the 32 MB limit)
bigStr = "ABCDEFGHIJKLMNBOPQRSTUVWXYZ012345687890"; // Generate big string to use in the object - 1MB+ String
while (bigStr.length < 1000000) {
    bigStr = bigStr + "::" + bigStr;
}
for (i = 0; i < 400; i++) {
    doc = {x: i % 2, y: i % 3, other: i % 3, xval: 1, yval: i % 10, name: bigStr};
    if (i % 3) {
        doc.tags = ["test" + i % 10];
    }
    db.sortf.insert(doc);
}
- Create two or more indices that can match the query / sorting criteria
db.sortf.ensureIndex({ "x" : 1, "y" : 1, "tags" : 1, "other" : -1 }) db.sortf.ensureIndex({ "x" : 1, "xval" : 1 })
- Perform any of the following queries without a hint and with a hint (any index or $natural). The queries without limit() should fail if the sort buffer limit is exceeded, but that should never be the case for limit() queries that stay within the bounds (i.e. here a limit of 10 means only about 13-14 MB of data).
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).limit(1).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).limit(10).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({other: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: 1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: -1})
db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({y: -1})
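To see which plan the server settles on for a given query, running explain() on the cursor can also help (shown here for one of the queries above; output details vary by version):

db.sortf.find({x: 1, y: 1, other: {$gte: 0}}, {x: 1, y: 1, other: 1}).sort({other: -1}).limit(10).explain()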
Failure log snippet for a query plan selection:
2014-06-04T15:07:16.200-0400 [conn5] Running query: query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } } sort: { other: -1 } projection: {}
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 0 is kp: { y: 1.0, ProductCode: 1.0 } io: { v: 1, key: { y: 1.0, ProductCode: 1.0 }, name: "y_1_ProductCode_1", ns: "test.sortf" }
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 1 is kp: { y: 1.0, x: 1.0, tags: 1.0, other: -1.0 } io: { v: 1, key: { y: 1.0, x: 1.0, tags: 1.0, other: -1.0 }, name: "y_1_x_1_tags_1_other_-1", ns: "test.sortf" }
2014-06-04T15:07:16.201-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.202-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.212-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.213-0400 [conn5] Relevant index 0 is kp: { other: -1 } multikey
2014-06-04T15:07:16.215-0400 [conn5] User Assertion: 17144:Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes
2014-06-04T15:07:16.216-0400 [conn5] assertion 17144 Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes ns:test.sortf query:{ query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } }, orderby: { other: -1.0 } }
2014-06-04T15:07:16.216-0400 [conn5] ntoskip:0 ntoreturn:1
2014-06-04T15:07:16.216-0400 [conn5] query test.sortf query: { query: { x: 1.0, y: 1.0, other: { $gte: 0.0 } }, orderby: { other: -1.0 } } ntoreturn:1 keyUpdates:0 exception: Runner error: Overflow sort stage buffered data usage of 33607400 bytes exceeds internal limit of 33554432 bytes code:17144 numYields:0 locks(micros) r:15266 reslen:158 15ms
2014-06-04T15:07:16.217-0400 [conn5] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
2014-06-04T15:07:16.218-0400 [conn5] command: { replSetGetStatus: 1.0, forShell: 1.0 }
Related to:
- SERVER-14228 Setting batchSize and sort on a cursor in sharded collection causes fewer than all documents to be returned (Closed)