When performing a write on a mongod with sharding enabled, the shard version of the message is checked twice: first in handlePossibleShardedMessage(), before the write context is acquired, and again when the write context (Client::Context) is constructed (see instance.cpp). If the version changes between these two checks, the second check throws an error instead of queuing the write for writeback, so the write fails.
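Below is a minimal, self-contained sketch of the race, written as a stand-alone C++ program rather than the actual server code: the names handlePossibleShardedMessage and Client::Context mirror instance.cpp, but the types, signatures, and version bookkeeping here are simplified stand-ins for illustration only.

{code:cpp}
// Illustrative model of the double version check described above.
// Not the real server code: simplified stand-in types and logic.
#include <atomic>
#include <iostream>
#include <stdexcept>

static std::atomic<int> shardMajorVersion{2};   // version installed on the shard
static const int receivedMajorVersion = 2;      // version stamped on the client's message

// Check #1 (instance.cpp): if the message is stale here, it is queued for
// writeback and the caller returns without writing. Returns true when the
// message was handled that way.
bool handlePossibleShardedMessage() {
    if (shardMajorVersion.load() != receivedMajorVersion) {
        std::cout << "stale version: queued for writeback\n";
        return true;
    }
    return false;   // versions matched at this instant; the write proceeds
}

// Check #2: performed again while constructing the write context
// (Client::Context). A mismatch here is fatal for the operation:
// it throws instead of queuing a writeback.
struct Context {
    Context() {
        if (shardMajorVersion.load() != receivedMajorVersion)
            throw std::runtime_error("shard version not ok in Client::Context");
    }
};

int main() {
    if (handlePossibleShardedMessage())
        return 0;

    // Window for the race: a concurrent setShardVersion (e.g. driven by the
    // WriteBackListener after a moveChunk commit) installs a new version
    // between check #1 and check #2.
    shardMajorVersion.store(3);

    try {
        Context ctx;                             // throws: version changed in between
    } catch (const std::exception& e) {
        std::cout << "insert fails with: " << e.what() << "\n";
    }
}
{code}

The error path printed by this sketch corresponds to the code:13388 exception in the log excerpt below.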
m30001| Wed Mar 6 00:14:13.584 [conn4] moveChunk migrate commit accepted by TO-shard: { active: true, ns: "test.foo", from: "localhost:30001", min: { a: 303.7011115811765 }, max: { a: 597.0397985074669 }, shardKeyPattern: { a: 1.0 }, state: "done", counts: { cloned: 759, clonedBytes: 816684, catchup: 0, steady: 0 }, ok: 1.0 }
m30001| Wed Mar 6 00:14:13.585 [conn4] moveChunk updating self version to: 3|1||5136d0911027824dad63777b through { a: 597.0397985074669 } -> { a: 998.8787742331624 } for collection 'test.foo'
m30000| Wed Mar 6 00:14:13.586 [conn13] info PageFaultRetryableSection will not yield, already locked upon reaching
m30999| Wed Mar 6 00:14:13.587 [WriteBackListener-localhost:30001] new version change detected, 1 writebacks processed previously
m30999| Wed Mar 6 00:14:13.587 [WriteBackListener-localhost:30001] writeback failed because of stale config, retrying attempts: 1
m30999| Wed Mar 6 00:14:13.587 [WriteBackListener-localhost:30001] writeback error : { singleShard: "localhost:30001", err: "cannot queue a writeback operation to the writeback queue", code: 9517, n: 0, connectionId: 7, ok: 1.0 }
m30001| Wed Mar 6 00:14:13.589 [conn4] about to log metadata event: { _id: "bs-mm2.local-2013-03-06T05:14:13-5136d0a5549d76650d9e1858", server: "bs-mm2.local", clientAddr: "127.0.0.1:64877", time: new Date(1362546853589), what: "moveChunk.commit", ns: "test.foo", details: { min: { a: 303.7011115811765 }, max: { a: 597.0397985074669 }, from: "shard0001", to: "shard0000" } }
m30001| Wed Mar 6 00:14:13.589 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
m30001| Wed Mar 6 00:14:13.589 [conn4] MigrateFromStatus::done Global lock acquired
m30001| Wed Mar 6 00:14:13.589 [conn4] forking for cleanup of chunk data
m30001| Wed Mar 6 00:14:13.589 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
m30001| Wed Mar 6 00:14:13.590 [conn4] MigrateFromStatus::done Global lock acquired
m30001| Wed Mar 6 00:14:13.590 [cleanupOldData-5136d0a5549d76650d9e1859] (start) waiting to cleanup test.foo from { a: 303.7011115811765 } -> { a: 597.0397985074669 }, # cursors remaining: 0
m30001| Wed Mar 6 00:14:13.591 [conn4] distributed lock 'test.foo/bs-mm2.local:30001:1362546833:2078045840' unlocked.
m30001| Wed Mar 6 00:14:13.591 [conn4] about to log metadata event: { _id: "bs-mm2.local-2013-03-06T05:14:13-5136d0a5549d76650d9e185a", server: "bs-mm2.local", clientAddr: "127.0.0.1:64877", time: new Date(1362546853591), what: "moveChunk.from", ns: "test.foo", details: { min: { a: 303.7011115811765 }, max: { a: 597.0397985074669 }, step1 of 6: 0, step2 of 6: 5, step3 of 6: 113, step4 of 6: 1044, step5 of 6: 64, step6 of 6: 0 } }
m30001| Wed Mar 6 00:14:13.592 [conn4] command admin.$cmd command: { moveChunk: "test.foo", from: "localhost:30001", to: "localhost:30000", fromShard: "shard0001", toShard: "shard0000", min: { a: 303.7011115811765 }, max: { a: 597.0397985074669 }, maxChunkSizeBytes: 1048576, shardId: "test.foo-a_303.7011115811765", configdb: "localhost:30000", secondaryThrottle: true, waitForDelete: false } ntoreturn:1 keyUpdates:0 numYields: 5 locks(micros) W:85 r:21604 w:87 reslen:37 1230ms
m30999| Wed Mar 6 00:14:13.592 [Balancer] moveChunk result: { ok: 1.0 }
m30001| Wed Mar 6 00:14:13.611 [cleanupOldData-5136d0a5549d76650d9e1859] waiting to remove documents for test.foo from { a: 303.7011115811765 } -> { a: 597.0397985074669 }
m30001| Wed Mar 6 00:14:13.611 [cleanupOldData-5136d0a5549d76650d9e1859] moveChunk starting delete for: test.foo from { a: 303.7011115811765 } -> { a: 597.0397985074669 }
m30999| Wed Mar 6 00:14:13.624 [WriteBackListener-localhost:30001] ChunkManager: time to load chunks for test.foo: 34ms sequenceNumber: 8 version: 3|1||5136d0911027824dad63777b based on: 2|5||5136d0911027824dad63777b
m30999| Wed Mar 6 00:14:13.625 [WriteBackListener-localhost:30001] setShardVersion shard0000 localhost:30000 test.foo { setShardVersion: "test.foo", configdb: "localhost:30000", version: Timestamp 3000|0, versionEpoch: ObjectId('5136d0911027824dad63777b'), serverID: ObjectId('5136d08f1027824dad637779'), shard: "shard0000", shardHost: "localhost:30000" } 0x106072fc0 8
m30999| Wed Mar 6 00:14:13.628 [conn1] setShardVersion shard0000 localhost:30000 test.foo { setShardVersion: "test.foo", configdb: "localhost:30000", version: Timestamp 3000|0, versionEpoch: ObjectId('5136d0911027824dad63777b'), serverID: ObjectId('5136d08f1027824dad637779'), shard: "shard0000", shardHost: "localhost:30000" } 0x106072620 8
========> Saved total of 3000 documents
m30000| Wed Mar 6 00:14:13.691 [conn6] insert test.foo keyUpdates:0 exception: [test.foo] shard version not ok in Client::Context: version mismatch detected for test.foo, stored major version 3 does not match received 2 ( ns : test.foo, received : 2|0||5136d0911027824dad63777b, wanted : 3|0||5136d0911027824dad63777b, send ) code:13388 locks(micros) w:64256 67ms
m30999| Wed Mar 6 00:14:13.693 [WriteBackListener-localhost:30001] setShardVersion success: { oldVersion: Timestamp 2000|0, oldVersionEpoch: ObjectId('5136d0911027824dad63777b'), ok: 1.0 }
The last two log lines show the result of a setShardVersion and an insert hitting m30000 (shard0000) at the same time.
Possible solution: take a read lock during ShardingState::trySetVersion()?
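A minimal sketch of that idea, assuming the write path holds the database write lock across both version checks (an assumption of this sketch, suggested by the read-lock proposal); the lock and version types are stand-ins, not the actual ShardingState implementation. If trySetVersion() also had to acquire the same lock in read mode, the installed version could not change between a single write's two checks:

{code:cpp}
// Illustrative only: stand-in types, not the real server locking code.
#include <atomic>
#include <iostream>
#include <shared_mutex>
#include <thread>

static std::shared_mutex dbLock;                 // stand-in for the per-database lock
static std::atomic<int> shardMajorVersion{2};

// Stand-in for ShardingState::trySetVersion(): installing a new version now
// takes the lock in shared ("read") mode, so it waits for in-flight writes.
void trySetVersion(int newMajor) {
    std::shared_lock<std::shared_mutex> readLock(dbLock);
    shardMajorVersion.store(newMajor);
}

// Stand-in for the write path: the write lock is held across both checks.
void shardedWrite(int receivedMajor) {
    std::unique_lock<std::shared_mutex> writeLock(dbLock);
    bool check1 = (shardMajorVersion.load() == receivedMajor);
    bool check2 = (shardMajorVersion.load() == receivedMajor);
    std::cout << "check1=" << check1 << " check2=" << check2 << "\n";  // always agree
}

int main() {
    std::thread w(shardedWrite, 2);
    std::thread v(trySetVersion, 3);             // may block while the write holds the lock
    w.join();
    v.join();
}
{code}

Under these assumptions the version install and the in-flight write serialize on the lock, so a stale version is either caught by the first check (and the write is queued for writeback) or does not appear at all; the Client::Context check can no longer see a version that changed mid-operation.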
Issue Links:
- is depended on by: SERVER-8741 mrShardedOutput.js failing on multiple platforms - ensure transferMods after commit (Closed)
- related to: SERVER-8741 mrShardedOutput.js failing on multiple platforms - ensure transferMods after commit (Closed)