Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.0.3
Component/s: Sharding
Environment: Ubuntu on EC2
I recently dropped a sharded collection, recreated it, and re-sharded it. mongos does not seem to handle this sequence correctly. Restarting mongos reloads the config and fixes the problem, but this still looks like a bug to me.
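The sketch below lists the three steps described above as the database commands they roughly correspond to, in the order they were run. It is not taken from the report. The shard key `{"_id": 1}` is a hypothetical placeholder because the report does not say which key was used. The explicit `create` step is also an assumption, since the report does not say how the collection was recreated. `run` expects a `pymongo.MongoClient` connected to a mongos.

```python
# Hypothetical sketch of the drop / recreate / re-shard sequence from this report.
# Assumptions: the shard key passed in is a placeholder, and the collection is
# recreated with an explicit "create" command (the report does not say how).

def repro_commands(db_name, coll_name, shard_key):
    """Return (database, command, value, extra_fields) tuples in execution order."""
    ns = f"{db_name}.{coll_name}"
    return [
        # 1. Drop the sharded collection through mongos.
        (db_name, "drop", coll_name, {}),
        # 2. Recreate it as an empty, unsharded collection.
        (db_name, "create", coll_name, {}),
        # 3. Shard it again; shardCollection must be run against the admin database.
        ("admin", "shardCollection", ns, {"key": shard_key}),
    ]


def run(client, commands):
    """Send each command through a pymongo MongoClient connected to mongos."""
    for db_name, name, value, extra in commands:
        # Database.command(name, value, **fields) builds {name: value, **fields}.
        client[db_name].command(name, value, **extra)
```

Separating the command list from `run` lets the sequence be inspected without a live cluster. For example, `repro_commands("pb3", "hourly_stats", {"_id": 1})` yields the drop, create, and shardCollection steps for the namespace that appears in the logs.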
In the mongos logs, I see these messages flying by at a high rate:
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16678
Fri Jun 15 18:16:39 [conn4784] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54284 version: 1|0
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] writeback failed because of stale config, retrying attempts: 17967
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54285 version: 1|0
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 1ms sequenceNumber: 54286 version: 1|0
Fri Jun 15 18:16:39 [conn4776] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [conn4784] setShardVersion failed host: mongo2.foobar.com:27018
Fri Jun 15 18:16:39 [conn4784] Assertion: 10429:setShardVersion failed host: mongo2.foobar.com:27018
{ oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }0x5350c2 0x7f5f95 0x7f5790
mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5350c2]
mongos() [0x7f5f95]
mongos() [0x7f5790]
Fri Jun 15 18:16:39 [conn4784] ~ScopedDBConnection: _conn != null
Fri Jun 15 18:16:39 [conn4784] AssertionException while processing op type : 2002 to : pb3.hourly_stats :: caused by :: 10429 setShardVersion failed host: mongo2.foobar.com:27018
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16679
Fri Jun 15 18:16:39 [conn4776] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54287 version: 1|0
Fri Jun 15 18:16:39 [conn4783] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
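The helper below is hypothetical and not part of MongoDB. It pulls three observations out of the mongos log above, using sample lines copied verbatim from this report:

- The WriteBackListener retry counters keep climbing.
- Each ChunkManager reload reports version 1|0.
- The failing setShardVersion still sends version 448000|41 while the shard reports globalVersion 0|0.

That 448000|41 is left over from the collection before it was dropped is an inference, not something the logs state outright.

```python
import re

# Sample mongos log lines copied from this report.
MONGOS_LOG = """\
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16678
Fri Jun 15 18:16:39 [conn4784] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54284 version: 1|0
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] writeback failed because of stale config, retrying attempts: 17967
{ oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }0x5350c2 0x7f5f95 0x7f5790
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16679
"""

RETRY_RE = re.compile(
    r"\[WriteBackListener-([^\]]+)\] writeback failed because of stale config, "
    r"retrying attempts: (\d+)"
)
LOADED_RE = re.compile(r"ChunkManager: time to load chunks for (\S+): .* version: (\d+\|\d+)")
# The leading ", " keeps this from matching the "oldVersion" field.
SSV_RE = re.compile(r", version: Timestamp (\d+\|\d+), globalVersion: Timestamp (\d+\|\d+)")


def summarize(log):
    retries = {}    # WriteBackListener host -> highest retry counter seen
    loaded = set()  # chunk versions reported after each ChunkManager reload
    rejected = []   # (version mongos sent, shard's globalVersion) per failed setShardVersion
    for line in log.splitlines():
        if m := RETRY_RE.search(line):
            host, attempts = m.group(1), int(m.group(2))
            retries[host] = max(attempts, retries.get(host, 0))
        elif m := LOADED_RE.search(line):
            loaded.add(m.group(2))
        elif m := SSV_RE.search(line):
            rejected.append((m.group(1), m.group(2)))
    return retries, loaded, rejected


retries, loaded, rejected = summarize(MONGOS_LOG)
print(retries)
print(loaded)
print(rejected)
```

Run over a complete log, for example `summarize(open(path).read())` with `path` pointing at a mongos log file, a retry counter that only grows would suggest that the writebacks never succeed. The gap between the reloaded version and the version sent in setShardVersion is the mismatch the shard rejects with "client version differs from config's".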
On the non-primary shards, I see these messages flying by at a high rate:
Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27998] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27998] end connection xxx.xxx.xxx.xxx:48064
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51367 #28002
Fri Jun 15 18:31:17 [conn27999] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27999] end connection xxx.xxx.xxx.xxx:48065
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51368 #28003
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28000] end connection xxx.xxx.xxx.xxx:46821
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46825 #28004
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] end connection xxx.xxx.xxx.xxx:46823
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46826 #28005
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
On the primary shard, I see many connections being opened and closed, but nothing else.
Duplicates:
SERVER-4537 better protect all sharding admin operations from simultaneous commands (Closed)