Core Server / SERVER-17019

HA setup doesn't work if member totally and quickly disappears

    • Type: Question
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.6.6
    • Component/s: Replication

      We have a problem with our replica set. It runs on three virtual servers, and if any one of the mongod processes goes down, the set normally continues working with the remaining members. However, if one of the servers disappears entirely, i.e. stops responding to network traffic at all (because the host is down, a firewall blocks all of its outgoing traffic, or the server is powered off suddenly), every query to the replica set takes an extra 15 seconds. Judging from the network traffic, the delay is due to TCP retransmits.
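
      To check from the application host whether a vanished member is simply swallowing connection attempts, a probe along the following lines could be used. This is only a sketch under assumptions: the host names are placeholders, port 27017 is assumed because it is the default mongod port, and the 2-second timeout is an arbitrary choice. It uses plain TCP and does not involve any MongoDB driver.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try a TCP connect and return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection gives up after `timeout` seconds instead of
        # waiting for the kernel's TCP retransmit schedule to run out.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical member addresses; replace with the real replica set hosts.
    members = [
        ("db1.example.com", 27017),
        ("db2.example.com", 27017),
        ("db3.example.com", 27017),
    ]
    for host, port in members:
        ok, elapsed = probe(host, port)
        print(f"{host}:{port} reachable={ok} after {elapsed:.2f}s")
```

      A member whose mongod is stopped but whose host is still up should be refused almost immediately, whereas a member that has dropped off the network should only fail once the timeout expires.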

      The extra 15 seconds on every query makes our load balancer conclude that all nodes are down, and it cuts off traffic to the whole setup.

      Since connecting to the other replica set members with the mongo console works fine, we originally reported this as a bug in the Node.js driver (https://jira.mongodb.org/browse/NODE-350), but we later tried the PHP driver and were able to reproduce similar (although not identical) behaviour.

      We also reproduced the problem in our secondary setup in another data center, so it shouldn't be data center specific. Both may be running the same virtualization platform, though; we haven't looked into that yet.

      Any ideas how to go forward with this?

            Assignee:
            schwerin@mongodb.com Andy Schwerin
            Reporter:
            kvirta Kalle Varisvirta
            Votes:
            1
            Watchers:
            10
