Type: Bug
Resolution: Done
Priority: Critical - P2
None
Affects Version/s: None
Component/s: Replication
ALL
This causes leader election to lose its liveness property, i.e. a new master is never elected.
A relatively easy way to trigger it:
1) Spawn a replica set locally; replicate.py in https://github.com/dcci/mongo-replication-perf can be used for this.
2) Once a primary is elected, drop all incoming local connections directed to it. Assuming the primary is listening on port 30001, the following should be enough (on Linux, or any other *NIX flavour that supports iptables):
# iptables -A INPUT -j DROP -p tcp -i lo --destination-port 30001
# iptables -A INPUT -j DROP -p tcp --destination-port 30001
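The effect of these rules is a one-way partition. The sketch below is a simplified, hypothetical model (not part of replicate.py or mongod): a new connection is dropped only when its destination port is the primary's port. Connections the primary itself opens to other members target their ports and receive replies on an ephemeral local port, so they still go through. The other members' ports (30002, 30003) are assumptions for illustration.

```python
# Hypothetical model of the two iptables DROP rules: any TCP connection whose
# destination port is the primary's port is dropped. Connections opened BY the
# primary target other ports (replies come back to an ephemeral port), so they
# are unaffected.
PRIMARY_PORT = 30001  # as in the report; the primary listens here


def connection_allowed(dst_port: int, blocked_port: int = PRIMARY_PORT) -> bool:
    """Return True if a new connection to dst_port survives the DROP rules."""
    return dst_port != blocked_port


# Assumed member ports; only 30001 is the primary.
members = [30001, 30002, 30003]

# Directed reachability: (src, dst) -> can src open a connection to dst?
reachability = {
    (src, dst): connection_allowed(dst)
    for src in members
    for dst in members
    if src != dst
}

if __name__ == "__main__":
    for (src, dst), ok in sorted(reachability.items()):
        print(f"{src} -> {dst}: {'ok' if ok else 'dropped'}")
```

Every edge into 30001 is dropped while every edge out of it survives, which is why the primary can keep heartbeating the secondaries even though they cannot heartbeat it.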
The secondaries still receive heartbeats from the primary, so they do not change its state, as the log shows:
2014-06-09T11:45:47.792-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001 after 5000 milliseconds, giving up.
2014-06-09T11:45:47.792-0700 [rsHealthPoll] replset info localhost:30001 heartbeat failed, retrying
2014-06-09T11:45:50.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 5 more seconds
2014-06-09T11:45:50.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 5 more seconds
2014-06-09T11:45:52.793-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001, reason: errno:115 Operation now in progress
2014-06-09T11:45:52.793-0700 [rsHealthPoll] replset info localhost:30001 just heartbeated us, but our heartbeat failed: , not changing state
2014-06-09T11:45:55.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 0 more seconds
2014-06-09T11:45:55.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 0 more seconds
2014-06-09T11:45:59.069-0700 [conn46] end connection 127.0.0.1:52528 (1 connection now open)
2014-06-09T11:45:59.069-0700 [initandlisten] connection accepted from 127.0.0.1:52545 #48 (2 connections now open)
2014-06-09T11:45:59.835-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001 after 5000 milliseconds, giving up.
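Why no election happens can be sketched with a small model of the rule suggested by the "just heartbeated us, but our heartbeat failed ... not changing state" log line. This is a hypothetical simplification inferred from the log, not mongod's actual code: `MemberView`, `INBOUND_GRACE_SECS` and the 2-second heartbeat interval are assumptions; only the 5-second connect timeout comes from the log above.

```python
from dataclasses import dataclass

# Hypothetical: how recently an inbound heartbeat must have arrived for a failed
# outbound heartbeat to be ignored. Not the real mongod value.
INBOUND_GRACE_SECS = 10.0


@dataclass
class MemberView:
    """A secondary's view of the primary (hypothetical model)."""
    up: bool = True
    last_inbound_heartbeat: float = 0.0  # when the primary last heartbeated us


def on_outbound_heartbeat_result(view: MemberView, succeeded: bool, now: float) -> None:
    """Update the view once our own heartbeat to the primary completes."""
    if succeeded:
        view.up = True
    elif now - view.last_inbound_heartbeat > INBOUND_GRACE_SECS:
        view.up = False
    # Otherwise the primary reached us recently: keep the current state.


def should_start_election(view: MemberView) -> bool:
    # A secondary only tries to elect a new primary once it sees the primary as down.
    return not view.up


def election_ever_started(primary_can_send: bool, duration: float = 60.0,
                          interval: float = 2.0, connect_timeout: float = 5.0) -> bool:
    """Simulate `duration` seconds in which every outbound heartbeat is dropped."""
    view = MemberView(up=True, last_inbound_heartbeat=0.0)
    t = 0.0
    while t <= duration:
        if primary_can_send:
            view.last_inbound_heartbeat = t  # primary's heartbeat still reaches us
        # Our heartbeat to the primary is dropped and gives up after the timeout.
        on_outbound_heartbeat_result(view, succeeded=False, now=t + connect_timeout)
        if should_start_election(view):
            return True
        t += interval
    return False


if __name__ == "__main__":
    print(election_ever_started(primary_can_send=True))   # False: liveness lost
    print(election_ever_started(primary_can_send=False))  # True: primary fully down
```

With the one-way partition the primary keeps refreshing `last_inbound_heartbeat`, so the secondary never marks it down and never starts an election; only when the primary goes silent in both directions does the model elect.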