mongos keeps crashing, and 5 mongos processes crashed at almost the same time. In each case the log shows "DBClientCursor::init call() failed" and "got not master for: 192.168.99.1" right before mongos receives signal 11 (SIGSEGV).
The mongos version is 2.2.0.
This bug is similar to SERVER-6539: https://jira.mongodb.org/browse/SERVER-6539
Log excerpts and backtraces from three of the crashes follow:
— 1 —
Thu Sep 13 13:57:24 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S023:2012
Thu Sep 13 13:57:34 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S021:2012
Thu Sep 13 13:57:34 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S023:2012
Thu Sep 13 13:57:44 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S021:2012
Thu Sep 13 13:57:44 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S023:2012
Thu Sep 13 13:57:54 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S021:2012
Thu Sep 13 13:57:54 [ReplicaSetMonitorWatcher] Primary for replica set shard01 changed to SNode_S023:2012
Thu Sep 13 13:57:54 [WriteBackListener-SNode_S023:2012] DBClientCursor::init call() failed
Thu Sep 13 13:57:54 [WriteBackListener-SNode_S023:2012] WriteBackListener exception : DBClientBase::findN: transport error: SNode_S023:2012 ns: admin.$cmd query:
Thu Sep 13 13:57:55 [conn584] ChunkManager: time to load chunks for infodb.docinfo: 132ms sequenceNumber: 3330 version: 3420|1||504836f4ed66ab254ec61a1e based on: 3419|5||504836f4ed66ab254ec61a1e
Thu Sep 13 13:57:55 [conn589] ChunkManager: time to load chunks for textdb.doctext: 212ms sequenceNumber: 3331 version: 2856|3||504836f4ed66ab254ec61a1f based on: 2856|1||504836f4ed66ab254ec61a1f
Thu Sep 13 13:57:55 [conn589] got not master for: SNode_S023:2012
Thu Sep 13 13:57:55 [conn458] ChunkManager: time to load chunks for infodb.docinfo: 109ms sequenceNumber: 3332 version: 3420|1||504836f4ed66ab254ec61a1e based on: 3419|5||504836f4ed66ab254ec61a1e
Received signal 11
Backtrace: 0x8386d5 0x361f632920
./mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x8386d5]
/lib64/libc.so.6[0x361f632920]
===
— /1 —
and
— 2 —
Thu Sep 13 14:27:01 [conn5] ChunkManager: time to load chunks for textdb.doctext: 181ms sequenceNumber: 529 version: 2856|185||504836f4ed66ab254ec61a1f based on: 2856|47||504836f4ed66ab254ec61a1f
Thu Sep 13 14:27:02 [conn8] Socket recv() errno:104 Connection reset by peer 10.9.0.23:2012
Thu Sep 13 14:27:02 [WriteBackListener-SNode_S023:2012] DBClientCursor::init call() failed
Thu Sep 13 14:27:02 [conn8] SocketException: remote: 10.9.0.23:2012 error: 9001 socket exception [1] server [10.9.0.23:2012]
Thu Sep 13 14:27:02 [conn8] DBClientCursor::init call() failed
Thu Sep 13 14:27:02 [WriteBackListener-SNode_S023:2012] WriteBackListener exception : DBClientBase::findN: transport error: SNode_S023:2012 ns: admin.$cmd query:
Thu Sep 13 14:27:02 [conn8] warning: db exception when initializing on shard01:shard01/SNode_S021:2012,SNode_S022:2012,SNode_S023:2012, current connection state is { state:
{ conn: "shard01/SNode_S021:2012,SNode_S022:2012,SNode_S023:2012", vinfo: "textdb.doctext @ 2856|185||504836f4ed66ab254ec61a1f", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 10276 DBClientBase::findN: transport error: SNode_S023:2012 ns: admin.$cmd query: { setShardVersion: "textdb.doctext", configdb: "SNode_S038:2020,SNode_S039:2020,SNode_S040:2020", version: Timestamp 2856000|173, versionEpoch: ObjectId('504836f4ed66ab254ec61a1f'), serverID: ObjectId('5051795ec69e943c6fb769f9'), shard: "shard01", shardHost: "shard01/SNode_S021:2012,SNode_S022:2012,SNode_S023:2012", $auth: {} }
Thu Sep 13 14:27:02 [conn29] got not master for: SNode_S023:2012
Thu Sep 13 14:27:02 [conn3] Primary for replica set shard01 changed to SNode_S021:2012
Received signal 11
Backtrace: 0x8386d5 0x361f632920 0xc61e30
./mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x8386d5]
/lib64/libc.so.6[0x361f632920]
./mongos(_ZTVN5mongo18DBClientConnectionE+0x10)[0xc61e30]
===
— /2 —
and
— 3 —
Tue Sep 18 14:36:38 [WriteBackListener-SNode_S029:2012] Socket recv() errno:104 Connection reset by peer 10.9.0.29:2012
Tue Sep 18 14:36:38 [WriteBackListener-SNode_S029:2012] SocketException: remote: 10.9.0.29:2012 error: 9001 socket exception [1] server [10.9.0.29:2012]
Tue Sep 18 14:36:38 [WriteBackListener-SNode_S029:2012] DBClientCursor::init call() failed
Tue Sep 18 14:36:38 [WriteBackListener-SNode_S029:2012] WriteBackListener exception : DBClientBase::findN: transport error: SNode_S029:2012 ns: admin.$cmd query:
Tue Sep 18 14:36:38 [mongosMain] connection accepted from 10.9.0.1:38044 #2861 (1062 connections now open)
Tue Sep 18 14:36:38 [conn2861] got not master for: SNode_S029:2012
Received signal 11
Backtrace: 0x8386d5 0x361f632920 0x7f665e428d80
./mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x8386d5]
/lib64/libc.so.6[0x361f632920]
[0x7f665e428d80]
===
— /3 —
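To check other mongos logs for the same pattern, something like the following could help. It is only a rough sketch: the function name `find_crashes` and the `window` parameter are made up for illustration, and the regular expressions are based solely on the message formats in the excerpts above, so they may need adjusting for other versions.

```python
import re

# Patterns taken from the log excerpts above; other mongos versions may differ.
NOT_MASTER = re.compile(r"\[(conn\d+)\] got not master for: (\S+)")
SIGNAL = re.compile(r"^Received signal (\d+)")


def find_crashes(lines, window=5):
    """Return one (signal, host) pair per 'Received signal' line.

    host is the target of the nearest 'got not master' message found in the
    `window` lines before the signal, or None if there is none.
    `window` is a hypothetical tuning knob, not a mongos setting.
    """
    crashes = []
    for i, line in enumerate(lines):
        m = SIGNAL.match(line)
        if not m:
            continue
        host = None
        # Walk backwards through the preceding lines, nearest first.
        for prev in reversed(lines[max(0, i - window):i]):
            nm = NOT_MASTER.search(prev)
            if nm:
                host = nm.group(2)
                break
        crashes.append((int(m.group(1)), host))
    return crashes
```

Feeding it the lines of a mongos log (for example `open(path).read().splitlines()`) gives one entry per crash, which makes it easy to see whether every signal 11 was preceded by a "got not master" message for the same shard member.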
Duplicates: SERVER-7061 "mongos can use invalid ptr to master conn when setShardVersion fails" (Closed)