- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Testing Infrastructure
- Backwards Compatibility: Fully Compatible
- Operating System: Windows
This affects the following test scripts:
- jstests/sharding/cleanup_orphaned_cmd_prereload.js
- jstests/sharding/features1.js
- jstests/sharding/key_many.js
When a moveChunk command is issued while the destination mongod process is still finishing a previous moveChunk request, the new moveChunk command fails with a "migrate already in progress" error.
Resolution:
When the moveChunk command is not the primary focus of the test, it is acceptable to wrap it in limited retry logic to accommodate slow hosts.
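The retry approach described above could be sketched as a small helper for the affected jstests. This is a hypothetical sketch, not code from the repository: `runMove` stands in for however the test issues the moveChunk (e.g. an adminCommand call), and `maxAttempts` is an arbitrary bound. Only a "migrate already in progress" failure is retried; any other error is surfaced immediately.

```javascript
// Hypothetical retry wrapper for tests where moveChunk is setup, not the
// subject under test. `runMove` is assumed to return a command result
// object of the shape { ok: <0|1>, errmsg: <string> }.
function retryOnMigrateInProgress(runMove, maxAttempts) {
    let lastErr = "";
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const res = runMove();
        if (res.ok) {
            return res;  // migration accepted; done
        }
        lastErr = res.errmsg || "";
        // Only retry the specific transient failure from this ticket;
        // anything else is a real test failure.
        if (!/migrate already in progress/.test(lastErr)) {
            throw new Error("moveChunk failed: " + lastErr);
        }
        // In the mongo shell a short sleep(...) here would give the
        // TO-shard time to finish draining the previous migration.
    }
    throw new Error("moveChunk still failing after " + maxAttempts +
                    " attempts: " + lastErr);
}
```

In a test such as features1.js, the existing moveChunk invocation would be passed in as `runMove`, so a slow host that has not yet released the previous migration gets a few extra chances before the test is declared failed.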
3b87e21f1151 Windows 32-bit sharding : features1.js (logfile)
m30999| 2013-10-22T04:17:42.826+0000 [conn1] CMD: shardcollection: { shardcollection: "test.foo6", key: { a: 1.0 } }
m30999| 2013-10-22T04:17:42.827+0000 [conn1] enable sharding on: test.foo6 with shard key: { a: 1.0 }
m30999| 2013-10-22T04:17:42.827+0000 [conn1] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-22T04:17:42-5265fc6639672851e14c5983", server: "EC2AMAZ-ZVRDQB5", clientAddr: "N/A", time: new Date(1382415462827), what: "shardCollection.start", ns: "test.foo6", details: { shardKey: { a: 1.0 }, collection: "test.foo6", primary: "shard0001:localhost:30001", initShards: [], numChunks: 1 } }
m30999| 2013-10-22T04:17:42.828+0000 [conn1] going to create 1 chunk(s) for: test.foo6 using new epoch 5265fc6639672851e14c5984
m30999| 2013-10-22T04:17:42.829+0000 [conn1] ChunkManager: time to load chunks for test.foo6: 0ms sequenceNumber: 10 version: 1|0||5265fc6639672851e14c5984 based on: (empty)
m30999| 2013-10-22T04:17:42.830+0000 [conn1] setShardVersion shard0001 localhost:30001 test.foo6 { setShardVersion: "test.foo6", configdb: "localhost:30000", version: Timestamp 1000|0, versionEpoch: ObjectId('5265fc6639672851e14c5984'), serverID: ObjectId('5265fc6039672851e14c5973'), shard: "shard0001", shardHost: "localhost:30001" } 00B2D5B0 10
m30999| 2013-10-22T04:17:42.830+0000 [conn1] setShardVersion failed!
m30999| { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ns: "test.foo6", need_authoritative: true, ok: 0.0, errmsg: "first time for collection 'test.foo6'" }
m30999| 2013-10-22T04:17:42.831+0000 [conn1] setShardVersion shard0001 localhost:30001 test.foo6 { setShardVersion: "test.foo6", configdb: "localhost:30000", version: Timestamp 1000|0, versionEpoch: ObjectId('5265fc6639672851e14c5984'), serverID: ObjectId('5265fc6039672851e14c5973'), authoritative: true, shard: "shard0001", shardHost: "localhost:30001" } 00B2D5B0 10
m30001| 2013-10-22T04:17:42.831+0000 [conn3] remotely refreshing metadata for test.foo6 with requested shard version 1|0||5265fc6639672851e14c5984, current shard version is 0|0||000000000000000000000000
m30001| 2013-10-22T04:17:42.833+0000 [conn3] loaded new metadata for test.foo6 (loaded version : 1|0||5265fc6639672851e14c5984, stored version : 0|0||000000000000000000000000, took 1ms)
m30999| 2013-10-22T04:17:42.833+0000 [conn1] setShardVersion success: { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ok: 1.0 }
m30999| 2013-10-22T04:17:42.833+0000 [conn1] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-22T04:17:42-5265fc6639672851e14c5985", server: "EC2AMAZ-ZVRDQB5", clientAddr: "N/A", time: new Date(1382415462833), what: "shardCollection", ns: "test.foo6", details: { version: "1|0||5265fc6639672851e14c5984" } }
m30999| 2013-10-22T04:17:42.835+0000 [conn1] splitting chunk [{ a: MinKey },{ a: MaxKey }) in collection test.foo6 on shard shard0001
m30001| 2013-10-22T04:17:42.836+0000 [conn4] received splitChunk request: { splitChunk: "test.foo6", keyPattern: { a: 1.0 }, min: { a: MinKey }, max: { a: MaxKey }, from: "shard0001", splitKeys: [ { a: 2.0 } ], shardId: "test.foo6-a_MinKey", configdb: "localhost:30000" }
m30001| 2013-10-22T04:17:42.838+0000 [conn4] distributed lock 'test.foo6/EC2AMAZ-ZVRDQB5:30001:1382415456:41' acquired, ts : 5265fc66ee4bb720b15bc871
m30001| 2013-10-22T04:17:42.839+0000 [conn4] remotely refreshing metadata for test.foo6 based on current shard version 1|0||5265fc6639672851e14c5984
m30001| 2013-10-22T04:17:42.841+0000 [conn4] newer metadata not found for test.foo6 (loaded version : 1|0||5265fc6639672851e14c5984, stored version : 1|0||5265fc6639672851e14c5984, took 1ms)
m30001| 2013-10-22T04:17:42.841+0000 [conn4] splitChunk accepted at version 1|0||5265fc6639672851e14c5984
m30001| 2013-10-22T04:17:42.842+0000 [conn4] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-22T04:17:42-5265fc66ee4bb720b15bc872", server: "EC2AMAZ-ZVRDQB5", clientAddr: "127.0.0.1:55273", time: new Date(1382415462842), what: "split", ns: "test.foo6", details: { before: { min: { a: MinKey }, max: { a: MaxKey }, lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('000000000000000000000000') }, left: { min: { a: MinKey }, max: { a: 2.0 }, lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('5265fc6639672851e14c5984') }, right: { min: { a: 2.0 }, max: { a: MaxKey }, lastmod: Timestamp 1000|2, lastmodEpoch: ObjectId('5265fc6639672851e14c5984') } } }
m30001| 2013-10-22T04:17:42.845+0000 [conn4] distributed lock 'test.foo6/EC2AMAZ-ZVRDQB5:30001:1382415456:41' unlocked.
m30999| 2013-10-22T04:17:42.846+0000 [conn1] ChunkManager: time to load chunks for test.foo6: 0ms sequenceNumber: 11 version: 1|2||5265fc6639672851e14c5984 based on: 1|0||5265fc6639672851e14c5984
m30999| 2013-10-22T04:17:42.849+0000 [conn1] CMD: movechunk: { movechunk: "test.foo6", find: { a: 3.0 }, to: "localhost:30000" }
m30999| 2013-10-22T04:17:42.849+0000 [conn1] moving chunk ns: test.foo6 moving ( ns: test.foo6, shard: shard0001:localhost:30001, lastmod: 1|2||000000000000000000000000, min: { a: 2.0 }, max: { a: MaxKey }) shard0001:localhost:30001 -> shard0000:localhost:30000
m30001| 2013-10-22T04:17:42.850+0000 [conn4] received moveChunk request: { moveChunk: "test.foo6", from: "localhost:30001", to: "localhost:30000", fromShard: "shard0001", toShard: "shard0000", min: { a: 2.0 }, max: { a: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "test.foo6-a_2.0", configdb: "localhost:30000", secondaryThrottle: false, waitForDelete: false }
m30001| 2013-10-22T04:17:42.853+0000 [conn4] distributed lock 'test.foo6/EC2AMAZ-ZVRDQB5:30001:1382415456:41' acquired, ts : 5265fc66ee4bb720b15bc873
m30001| 2013-10-22T04:17:42.853+0000 [conn4] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-22T04:17:42-5265fc66ee4bb720b15bc874", server: "EC2AMAZ-ZVRDQB5", clientAddr: "127.0.0.1:55273", time: new Date(1382415462853), what: "moveChunk.start", ns: "test.foo6", details: { min: { a: 2.0 }, max: { a: MaxKey }, from: "shard0001", to: "shard0000" } }
m30001| 2013-10-22T04:17:42.854+0000 [conn4] remotely refreshing metadata for test.foo6 based on current shard version 1|2||5265fc6639672851e14c5984
m30001| 2013-10-22T04:17:42.855+0000 [conn4] newer metadata not found for test.foo6 (loaded version : 1|2||5265fc6639672851e14c5984, stored version : 1|2||5265fc6639672851e14c5984, took 1ms)
m30001| 2013-10-22T04:17:42.856+0000 [conn4] moveChunk request accepted at version 1|2||5265fc6639672851e14c5984
m30001| 2013-10-22T04:17:42.856+0000 [conn4] moveChunk number of documents: 2
m30001| 2013-10-22T04:17:42.856+0000 [conn4] warning: moveChunk failed to engage TO-shard in the data transfer: migrate already in progress
m30001| 2013-10-22T04:17:42.856+0000 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
m30001| 2013-10-22T04:17:42.856+0000 [conn4] MigrateFromStatus::done Global lock acquired
m30001| 2013-10-22T04:17:42.857+0000 [conn4] distributed lock 'test.foo6/EC2AMAZ-ZVRDQB5:30001:1382415456:41' unlocked.
m30001| 2013-10-22T04:17:42.857+0000 [conn4] about to log metadata event: { _id: "EC2AMAZ-ZVRDQB5-2013-10-22T04:17:42-5265fc66ee4bb720b15bc875", server: "EC2AMAZ-ZVRDQB5", clientAddr: "127.0.0.1:55273", time: new Date(1382415462857), what: "moveChunk.from", ns: "test.foo6", details: { min: { a: 2.0 }, max: { a: MaxKey }, step 1 of 6: 0, step 2 of 6: 5, note: "aborted" } }
m30999| 2013-10-22T04:17:42.857+0000 [conn1] moveChunk result: { cause: { ok: 0.0, errmsg: "migrate already in progress" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" }
m30001| 2013-10-22T04:17:42.862+0000 [RangeDeleter] Deleter starting delete for: test.foo4 from { num: 10.0 } -> { num: MaxKey }, with opId: 126
m30001| 2013-10-22T04:17:42.862+0000 [RangeDeleter] rangeDeleter deleted 0 documents for test.foo4 from { num: 10.0 } -> { num: MaxKey }
2013-10-22T04:17:42.872+0000 command { "movechunk" : "test.foo6", "find" : { "a" : 3 }, "to" : "localhost:30000" } failed: { "cause" : { "cause" : { "ok" : 0, "errmsg" : "migrate already in progress" }, "ok" : 0, "errmsg" : "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" }, "ok" : 0, "errmsg" : "move failed" } at src/mongo/shell/shardingtest.js:558
failed to load: D:\data\mci\git@github.commongodb\mongo.git\master\jstests\sharding\features1.js
Is related to:
- SERVER-10232 better migration stats to detect TO-side readiness - shard3.js (Closed)
- SERVER-11122 moveChunk fails in sharding/hash_basic.js on slow hosts (Closed)