If a move-update results in an error and is rolled back (for example, if a duplicate key exception is encountered), any open cursors that were pointing to the affected document could be advanced during invalidation to a record that is no longer valid. The query operations that own these cursors can subsequently return invalid results, or trip a fatal assertion in the storage layer.
This is a regression introduced in the 3.0.x series (reproduced with 3.0.7 and master) that affects mmapv1 deployments only. See the discussion in related ticket SERVER-21037 for an explanation of why 2.6.x and earlier are unaffected.
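The failure mode described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyRecordStore`, `failedMoveUpdate`, and the exact ordering of steps are hypothetical and do not correspond to the real mmapv1 code. It shows one plausible interleaving in which invalidation advances a cursor, the rollback undoes the record store changes, and the cursor is left pointing at a location that no longer holds a record.

```javascript
// Toy model only: none of these names exist in the server.
class ToyRecordStore {
    constructor() {
        this.records = new Map();  // location -> document
        this.nextLoc = 0;
    }

    insert(doc) {
        const loc = this.nextLoc++;
        this.records.set(loc, doc);
        return loc;
    }

    // Smallest live location greater than loc, or null at the end.
    next(loc) {
        const later = [...this.records.keys()].filter((l) => l > loc);
        return later.length ? Math.min(...later) : null;
    }
}

// Simulates a move-update that fails (e.g. duplicate key) and is rolled back.
function failedMoveUpdate(store, cursor, oldLoc, newDoc) {
    const oldDoc = store.records.get(oldLoc);

    // The grown document is written to a new location.
    const newLoc = store.insert(newDoc);

    // Invalidation: a cursor positioned on the old record is advanced.
    if (cursor.loc === oldLoc) {
        cursor.loc = store.next(oldLoc);
    }
    store.records.delete(oldLoc);

    // Assumed error path: the record store changes are undone,
    // but the cursor position is NOT restored.
    store.records.delete(newLoc);
    store.records.set(oldLoc, oldDoc);
    return newLoc;
}

const store = new ToyRecordStore();
const loc0 = store.insert({_id: 0, a: 0});
const cursor = {loc: loc0};

failedMoveUpdate(store, cursor, loc0, {_id: 0, a: 1, x: "padding"});

// The original record is back, but the cursor now refers to a dead location.
const cursorIsDangling = !store.records.has(cursor.loc);
console.log("cursor dangling:", cursorIsDangling);
```

In this model the cursor ends up on the location that briefly held the moved copy, which is the kind of stale position that could later yield invalid results.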
Reproduce with the following script:
assert.commandWorked(db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 3}));
assert.commandWorked(db.dropDatabase());
assert.commandWorked(db.foo.ensureIndex({a: 1}, {unique: true}));
assert.writeOK(db.foo.insert({_id: 0, a: 0}));
assert.writeOK(db.foo.insert({_id: 1, a: 1}));
assert.writeOK(db.foo.insert({_id: 2, a: 2}));
assert.writeOK(db.foo.insert({_id: 3, a: 3, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.insert({_id: 4, a: 4, x: new Array(1024).join("x")}));
assert.writeOK(db.foo.remove({_id: 3}));
assert.writeOK(db.foo.remove({_id: 4}));
startParallelShell(
    'while (true) { \
         for (var i=0; i<3; i++) { \
             db.foo.update({_id: i}, {$set: {x: new Array(1024).join("x"), a: (i + 1) % 3}}); \
         } \
         sleep(1000); \
     }');
db.foo.find().itcount();
When run against master (07168e08) with the patch below applied, the server trips fatal assertion 17441 on the last line of the script. When run against v3.0 with the patch below applied, the server returns the following error to the user on the last line of the script: "BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: 4.0".
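As a quick sanity check on the numbers in that error message, the arithmetic can be reproduced in plain JavaScript. The variable names are illustrative only. The snippet shows that the reported negative size is the signed 32-bit reading of 0xEEEEEEEE, and that the stated upper bound equals 16 MiB plus 16 KiB. The repeated-byte pattern suggests the cursor read bytes that are not a real document length, though this snippet does not establish where those bytes came from.

```javascript
// Size reported in the error message, read as a signed 32-bit integer.
const reportedSize = -286331154;

// Reinterpret the same 32 bits as unsigned and print them in hex.
const asUnsigned32 = reportedSize >>> 0;
const asHex = "0x" + asUnsigned32.toString(16).toUpperCase();

// Upper bound quoted in the message: 16 MiB plus 16 KiB.
const maxSize = 16 * 1024 * 1024 + 16 * 1024;

console.log(asHex, asUnsigned32, maxSize);
```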
Patch to greatly increase reproducibility:
diff --git a/src/mongo/db/query/query_yield.cpp b/src/mongo/db/query/query_yield.cpp
index 4e0d463..7edde6e 100644
--- a/src/mongo/db/query/query_yield.cpp
+++ b/src/mongo/db/query/query_yield.cpp
@@ -62,6 +62,10 @@ void QueryYield::yieldAllLocks(OperationContext* txn, RecordFetcher* fetcher) {
     // locks). If we are yielding, we are at a safe place to do so.
     txn->recoveryUnit()->abandonSnapshot();
 
+    if (txn->getNS() == "test.foo") {
+        sleepmillis(2000);
+    }
+
     // Track the number of yields in CurOp.
     CurOp::get(txn)->yielded();
Related to:
- SERVER-21037 Initial sync can miss documents if concurrent update results in error (mmapv1 only) (Closed)
- SERVER-42022 Attempt to remove initial sync missing document fetching (Closed)
- SERVER-21058 need fail point to stress yielding behavior (Closed)