  1. Core Server
  2. SERVER-16818

Add socket timeout to isSelf replication check

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • Fix Version/s: 3.0.0-rc6
    • Affects Version/s: 2.6.6, 2.8.0-rc4
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:
      1. Spin up a 2 node replica set
      2. Send SIGSTOP to one node
      3. Make sure the other one steps down to SECONDARY
      4. rs.status() works and should show 1 SECONDARY, 1 "(not reachable/healthy)"
      5. Shut down the SECONDARY node, then restart its process
      6. Run rs.status() on the restarted node; the output is:
        > rs.status()
        {
        	"startupStatus" : 1,
        	"ok" : 0,
        	"errmsg" : "loading local.system.replset config (LOADINGCONFIG)"
        }
        
      7. The socket seems to never time out (3 hours and counting)

      When a mongod starts with --replSet and finds a config in local.system.replset, it tries to establish connections to the other replica set members. These initial connection attempts do not appear to time out, so the node can hang indefinitely waiting for a response from a down replica set member.
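To illustrate the difference a socket timeout makes, here is a minimal Python sketch. It is not the server's code: `probe_member` is a hypothetical stand-in for an isSelf-style check, and the "stopped" member is simulated by a listening socket that never accepts or replies. The assumption is that a SIGSTOPped process behaves similarly, since the kernel can still complete the TCP handshake while the process itself never answers.

```python
import socket
import time


def probe_member(host, port, timeout_secs):
    """Hypothetical probe: connect, send a request, wait for a reply.

    Returns the reply bytes, or None if no reply arrives within
    timeout_secs. With timeout_secs=None the recv() below could
    block indefinitely, which mirrors the hang described in this ticket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_secs) as sock:
            sock.sendall(b"ping")      # placeholder payload, not a real wire message
            return sock.recv(1024)     # blocks until data arrives or the timeout fires
    except socket.timeout:
        return None


def demo():
    # Simulated unresponsive member: listens, but never accept()s or replies.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
    silent.listen(1)
    host, port = silent.getsockname()

    start = time.monotonic()
    reply = probe_member(host, port, timeout_secs=0.5)
    elapsed = time.monotonic() - start
    silent.close()
    return reply, elapsed


if __name__ == "__main__":
    reply, elapsed = demo()
    print("reply:", reply, "elapsed: %.2fs" % elapsed)
```

With a timeout the probe gives up and the caller can move on; without one, the connect succeeds but the read never returns.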

      By contrast, when a running replica set member discovers a new member (via rs.add) that is actually unreachable, the running member times out the connection attempt. This ticket requests that the initial connection attempts at startup time out in the same way.

      In the repro given, this node is in SECONDARY before the mongod is restarted. It should be able to return to SECONDARY after the restart.

      Note: Adding a third node fixes this problem; it seems that only a majority of members needs to be contacted for the config load to succeed.
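The arithmetic behind this note can be sketched as follows. This assumes the rule is a strict majority of configured members and that the starting node counts itself as contacted; both are assumptions drawn from the observation above, not confirmed server behavior, and the function names are hypothetical.

```python
def majority(num_members):
    """Smallest number of members that is more than half of the set."""
    return num_members // 2 + 1


def can_reach_majority(num_members, num_unreachable):
    """True if the reachable members (including this node) form a majority."""
    reachable = num_members - num_unreachable
    return reachable >= majority(num_members)


if __name__ == "__main__":
    # 2-node set, 1 node stopped: majority is 2, only 1 reachable.
    print(can_reach_majority(2, 1))
    # 3-node set, 1 node stopped: majority is 2, 2 reachable.
    print(can_reach_majority(3, 1))
```

Under these assumptions, a 2-node set with one stopped member can never reach a majority, while a 3-node set with one stopped member can, which is consistent with the workaround described.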

            Assignee:
            scotthernandez Scott Hernandez (Inactive)
            Reporter:
            joanna.cheng@mongodb.com Joanna Cheng
            Votes:
            0
            Watchers:
            8