-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 3.6.0
-
Component/s: Write Operations
-
None
On sharded cluster of MongoDB, version 3.6.1~rc1, we start to encounter connection drops from java services to mongos on bulk inserts. It happens on rare sets of data that we insert. At the moment we're trying to generate set of data to get drops, can't give work data because of NDA.
We use latest java driver 3.6.1 and sessions.
It happens independent on connection compression.
Looks like sometimes message length of batch is slightly bigger than max allowed - 48000000, and it leads to connection to server closing with error. Message size is different. Max that we've seen with these error was 48000021.
I'm not sure if it was on 3.6.0 version, because we coudn't use it because of SERVER-32372 bug.
Log from mongos (first batch was sent ok and on second it drops):
2017-12-25T19:14:49.608+0300 D SHARDING [conn5] Starting execution of write batch of size 44029 for database.collection 2017-12-25T19:14:49.763+0300 D SHARDING [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:49.877+0300 D EXECUTOR [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 100 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:49.991+0300 D ASIO [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 100 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:50.606+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 15136, opTime: { ts: Timestamp(1514218490, 15136), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') } 2017-12-25T19:14:50.738+0300 D SHARDING [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:50.853+0300 D EXECUTOR [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 104 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:50.969+0300 D ASIO [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 104 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:51.601+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 15053, opTime: { ts: Timestamp(1514218491, 15053), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') } 2017-12-25T19:14:51.728+0300 D SHARDING [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:51.826+0300 D EXECUTOR [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 108 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:51.923+0300 D ASIO [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 108 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } } 2017-12-25T19:14:52.573+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 13840, opTime: { ts: Timestamp(1514218492, 46224), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') } 2017-12-25T19:14:52.574+0300 D SHARDING [conn5] Finished execution of write batch for database.collection 2017-12-25T19:14:52.580+0300 D SHARDING [conn5] Command end db: database msg id: 182 2017-12-25T19:14:53.621+0300 I NETWORK [conn5] recv(): message msgLen 48000014 is invalid. Min 16 Max: 48000000 2017-12-25T19:14:53.621+0300 I NETWORK [conn5] Error receiving request from client: ProtocolError: recv(): message msgLen 48000014 is invalid. Min 16 Max: 48000000. Ending connection from 10.0.0.22:56414 (connection id: 5) 2017-12-25T19:14:53.622+0300 I NETWORK [conn5] end connection 10.0.0.22:56414 (4 connections now open)