-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 2.8.0-rc4
-
Component/s: Internal Code, Sharding
-
None
-
Fully Compatible
-
ALL
-
2.8.0-rc4 does not time out migrations after they have been waiting for the pre-commit w:majority for 10 hours. This is a regression from 2.6, where such migrations are correctly aborted after 10 hours.
Full logs and config dumps are attached. Relevant bits + output are below:
2.8.0-rc4
2015-01-13T02:11:12.428+0000 I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard_0/kev1-0.kevincm1.8331.mongodbdns.com:28000,kev1-0.kevincm1.8331.mongodbdns.com:28001 at epoch 54b4704956127074c739b3f6 2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] Deleter starting delete for: test.test from { _id: MinKey } -> { _id: 0.0 }, with opId: 31603 2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] rangeDeleter deleted 0 documents for test.test from { _id: MinKey } -> { _id: 0.0 } 2015-01-13T02:11:12.429+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T02:11:13.430+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T02:11:14.430+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section ... 2015-01-13T12:11:04.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:05.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:05.816+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1 2015-01-13T12:11:06.816+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:07.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:08.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:09.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:10.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:11.817+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:12.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:13.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:14.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:15.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:16.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:17.818+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:18.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:19.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:20.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:20.819+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1 2015-01-13T12:11:21.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T12:11:22.819+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section ... 2015-01-14T02:47:43.070+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:44.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:45.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:46.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:46.071+0000 W SHARDING [migrateThread] migrate commit waiting for a majority of slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b47ec0:1 2015-01-14T02:47:47.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:48.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:49.071+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:50.072+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:47:51.072+0000 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find() ISODate("2015-01-13T04:11:44.821Z") { "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" } { "_id" : "balancer", "state" : 2, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b47ec056127074c739b40c"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T02:11:12.413Z"), "why" : "doing balance round" } { "_id" : "test.test", "state" : 2, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b47ec073e56bbf64caa124"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-13T02:11:12.422Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" } mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find() ISODate("2015-01-14T02:24:24.884Z") { "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:27017:1421111141:1804289383:mongosMain:846930886", "ts" : ObjectId("54b46f6556127074c739b3bf"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T01:05:41.265Z"), "why" : "upgrading config database to new format v6" } { "_id" : "balancer", "state" : 2, "who" : "kev1-0:27017:1421111141:1804289383:Balancer:1681692777", "ts" : ObjectId("54b47ec056127074c739b40c"), "process" : "kev1-0:27017:1421111141:1804289383", "when" : ISODate("2015-01-13T02:11:12.413Z"), "why" : "doing balance round" } { "_id" : "test.test", "state" : 2, "who" : "kev1-0:28000:1421111452:808384670:conn14:1257153344", "ts" : ObjectId("54b47ec073e56bbf64caa124"), "process" : "kev1-0:28000:1421111452:808384670", "when" : ISODate("2015-01-13T02:11:12.422Z"), "why" : "migrating chunk [{ : MinKey }, { : MaxKey }) in test.test" }
NOTE the when field for the balancer and collection locks hasn't changed in 24 hours.
2.6.6
2015-01-13T06:03:31.460+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02 2015-01-13T06:03:31.460+0000 [migrateThread] starting new replica set monitor for replica set shard26_0 with seeds kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 2015-01-13T06:03:31.462+0000 [migrateThread] allocating new ns file /data/balancer-majority-timeout-26_shard26_1_96/test.ns, filling with zeroes... 2015-01-13T06:03:31.575+0000 [migrateThread] build index on: test.test properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.test" } 2015-01-13T06:03:31.575+0000 [migrateThread] added index to empty collection 2015-01-13T06:03:31.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T06:03:32.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T06:03:33.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section ... 2015-01-13T16:03:22.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:23.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:23.105+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:03:24.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:25.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:26.105+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:27.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:28.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:29.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:30.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:31.106+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:03:32.903+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:03:32.905+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:03:32.906+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 ... 2015-01-13T16:04:02.894+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:04:02.895+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:04:02.897+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b4b533:2 2015-01-13T16:04:02.898+0000 [migrateThread] scoped connection to shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 not being returned to the pool 2015-01-13T16:04:02.898+0000 [migrateThread] about to log metadata event: { _id: "kev1-0-2015-01-13T16:04:02-54b541f29fcdf3ea989d3fae", server: "kev1-0", clientAddr: ":27017", time: new Date(1421165042898), what: "moveChunk.to", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 5: 115, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, note: "aborted", errmsg: "timed out waiting for commit" } } 2015-01-13T16:04:08.924+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02 2015-01-13T16:04:08.926+0000 [migrateThread] warning: moveChunkCmd deleted data already in chunk # objects: 1 2015-01-13T16:04:08.926+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:04:09.926+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:04:09.927+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-13T16:04:10.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:04:11.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-13T16:04:12.927+0000 [migrateThread] Waiting for replication to catch up before entering critical section ... 2015-01-14T02:04:00.576+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:01.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:01.577+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:02.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:03.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:04.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:05.577+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:06.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:07.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:08.578+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:09.733+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:09.734+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:09.735+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 ... 2015-01-14T02:04:39.723+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:39.725+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:39.727+0000 [migrateThread] warning: migrate commit waiting for 2 slaves for 'test.test' { _id: MinKey } -> { _id: 0.0 } waiting for: 54b541f8:3 2015-01-14T02:04:39.727+0000 [migrateThread] scoped connection to shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 not being returned to the pool 2015-01-14T02:04:39.727+0000 [migrateThread] about to log metadata event: { _id: "kev1-0-2015-01-14T02:04:39-54b5ceb79fcdf3ea989d3faf", server: "kev1-0", clientAddr: ":27017", time: new Date(1421201079727), what: "moveChunk.to", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 5: 1, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, note: "aborted", errmsg: "timed out waiting for commit" } } 2015-01-14T02:04:45.754+0000 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard26_0/kev1-0.kevincm1.8331.mongodbdns.com:38000,kev1-0.kevincm1.8331.mongodbdns.com:38001 at epoch 54b4b3437d941a7cc774cb02 2015-01-14T02:04:45.754+0000 [migrateThread] warning: moveChunkCmd deleted data already in chunk # objects: 1 2015-01-14T02:04:45.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:46.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section 2015-01-14T02:04:47.755+0000 [migrateThread] Waiting for replication to catch up before entering critical section ...
mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find() ISODate("2015-01-13T08:03:21.334Z") { "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" } { "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b4b5337d941a7cc774cb56"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T06:03:31.442Z"), "why" : "doing balance round" } { "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b4b53363588f4e90ed5a12"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-13T06:03:31.451Z"), "why" : "migrate-{ _id: MinKey }" } mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find() ISODate("2015-01-14T01:41:32.572Z") { "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" } { "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b541f87d941a7cc774cb57"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T16:04:08.907Z"), "why" : "doing balance round" } { "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b541f863588f4e90ed5a15"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-13T16:04:08.916Z"), "why" : "migrate-{ _id: MinKey }" } mongos> printjson(ISODate()); db.getSiblingDB("config").locks.find() ISODate("2015-01-14T02:53:25.690Z") { "_id" : "configUpgrade", "state" : 0, "who" : "kev1-0:37017:1421128191:1804289383:mongosMain:846930886", "ts" : ObjectId("54b4b1ff7d941a7cc774cac3"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-13T05:49:51.669Z"), "why" : "upgrading config database to new format v5" } { "_id" : "balancer", "state" : 2, "who" : "kev1-0:37017:1421128191:1804289383:Balancer:1681692777", "ts" : ObjectId("54b5cebd7d941a7cc774cb58"), "process" : "kev1-0:37017:1421128191:1804289383", "when" : ISODate("2015-01-14T02:04:45.736Z"), "why" : "doing balance round" } { "_id" : "test.test", "state" : 2, "who" : "kev1-0:38000:1421129010:1611849556:conn23:745631594", "ts" : ObjectId("54b5cebd63588f4e90ed5a18"), "process" : "kev1-0:38000:1421129010:1611849556", "when" : ISODate("2015-01-14T02:04:45.746Z"), "why" : "migrate-{ _id: MinKey }" }
NOTE the when field for the balancer and collection locks changes every 10 hours when the migration fails and restarts.
- depends on
-
SERVER-16880 Timer overflows at approximately 9223 seconds
- Closed