Server - AWS EC2 t2.small, CentOS Linux release 7.2.1511 (Core)
DB Version - MongoDB community 3.2.6
Topology: Replica Set
When running mongodump command on the secondary node, the db crashes in 10% of the times.
I stressed the memory to reproduce, and looks like it happens due to low memory (though I have enough free memory when running/during the dump).
The line when dump fails:
2016-06-22T12:19:18.871+0300 Failed: error reading collection: EOF && Mux ending but selectCases still open 2
The last messages in mongod.log:
2016-06-22T12:19:13.162+0300 D STORAGE [conn10] WT begin_transaction 2016-06-22T12:19:13.187+0300 D QUERY [conn12] Running getMore, cursorid: 225913259574 2016-06-22T12:19:13.187+0300 D STORAGE [conn12] WT begin_transaction 2016-06-22T12:19:13.224+0300 D STORAGE [WTJournalFlusher] flushed journal 2016-06-22T12:19:13.324+0300 D STORAGE [WTJournalFlusher] flushed journal 2016-06-22T12:19:13.394+0300 D EXECUTOR [ReplicationExecutor] Scheduling remote request: RemoteCommand 115 -- target:production-app03:27017 db:admin cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.394+0300 D ASIO [ReplicationExecutor] startCommand: RemoteCommand 115 -- target:production-app03:27017 db:admin expDate:2016-06-22T12:19:23.393+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.394+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Initiating asynchronous command: RemoteCommand 115 -- target:production-app03:27017 db:admin expDate:2016-06-22T12:19:23.393+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.394+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Starting asynchronous command 115 on host production-app03:27017 2016-06-22T12:19:13.396+0300 D EXECUTOR [NetworkInterfaceASIO-Replication-0] Received remote response: RemoteResponse -- cmd:{ ok: 1.0, state: 7, v: 3, hbmsg: "", set: "prod", term: 5, primaryId: 0, durableOpTime: { ts: Timestamp 1466525772000|2012, t: 5 }, opTime: { ts: Timestamp 1466525772000|2012, t: 5 } } 2016-06-22T12:19:13.396+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Failed to time operation 115 out: Operation aborted. 2016-06-22T12:19:13.396+0300 D REPL [ReplicationExecutor] setUpValues: heartbeat response good for member _id:2, msg: 2016-06-22T12:19:13.396+0300 D REPL [ReplicationExecutor] Scheduling heartbeat to production-app03:27017 at 2016-06-22T09:19:18.396Z 2016-06-22T12:19:13.415+0300 D EXECUTOR [ReplicationExecutor] Scheduling remote request: RemoteCommand 117 -- target:production-db01:27017 db:admin cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.415+0300 D ASIO [ReplicationExecutor] startCommand: RemoteCommand 117 -- target:production-db01:27017 db:admin expDate:2016-06-22T12:19:23.415+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.415+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Initiating asynchronous command: RemoteCommand 117 -- target:production-db01:27017 db:admin expDate:2016-06-22T12:19:23.415+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 } 2016-06-22T12:19:13.415+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Starting asynchronous command 117 on host production-db01:27017 2016-06-22T12:19:13.417+0300 D EXECUTOR [NetworkInterfaceASIO-Replication-0] Received remote response: RemoteResponse -- cmd:{ ok: 1.0, electionTime: new Date(6293366302669012993), state: 1, v: 3, hbmsg: "", set: "prod", term: 5, primaryId: 0, durableOpTime: { ts: Timestamp 1466525772000|2012, t: 5 }, opTime: { ts: Timestamp 1466525772000|2012, t: 5 } } 2016-06-22T12:19:13.417+0300 D ASIO [NetworkInterfaceASIO-Replication-0] Failed to time operation 117 out: Operation aborted. 2016-06-22T12:19:13.417+0300 D REPL [ReplicationExecutor] Canceling election timeout callback at 2016-06-22T12:19:19.650+0300 2016-06-22T12:19:13.417+0300 D REPL [ReplicationExecutor] Scheduling election timeout callback at 2016-06-22T12:19:24.755+0300 2016-06-22T12:19:13.833+0300 D REPL [ReplicationExecutor] setUpValues: heartbeat response good for member _id:0, msg: 2016-06-22T12:19:13.833+0300 D REPL [ReplicationExecutor] Scheduling heartbeat to production-db01:27017 at 2016-06-22T09:19:18.417Z 2016-06-22T12:19:13.833+0300 D STORAGE [WTJournalFlusher] flushed journal 2016-06-22T12:19:13.833+0300 D STORAGE [conn13] WT rollback_transaction 2016-06-22T12:19:13.833+0300 D STORAGE [conn13] WT begin_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn10] WT rollback_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn10] WT begin_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn12] WT rollback_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn12] WT begin_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn12] WT rollback_transaction 2016-06-22T12:19:13.846+0300 D STORAGE [conn12] WT begin_transaction 2016-06-22T12:19:13.859+0300 D STORAGE [conn12] WT rollback_transaction 2016-06-22T12:19:13.859+0300 D STORAGE [conn12] WT begin_transaction 2016-06-22T12:19:13.859+0300 D STORAGE [conn13] WT rollback_transaction 2016-06-22T12:19:13.859+0300 D STORAGE [conn13] WT begin_transaction