Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19176

Mongo instance dies while creating a large replica:

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.10, 3.0.4
    • Component/s: Networking, Replication
    • None
    • ALL
    • Hide

      Get a huge mongo collection (3Tb), put it under an intense create/update/delete load (100+ updates sec), try to add a replica to the given replicaset, see it fail every time.

      Show
      Get a huge mongo collection (3Tb), put it under an intense create/update/delete load (100+ updates sec), try to add a replica to the given replicaset, see it fail every time.

      I'm adding a new replica to a mongo replica set with a huge ~3Tb single collection in it. The collection is very active, which means that while Mongo is seeding the data from the primary and building its primary key index, huge amounts of changes occur in the table.

      The problem occurs when this new replica starts to catch up on oplog and performs checks against the primary. It looks like Mongo creates a new connection for each document (500+ new connections a second created from the replica to the primary according to our monitoring). After some time of that intense flood of connections we hit the limit on ephemeral source ports available for use because connections spend some time in TIME_WAIT before being recycled (even with tw_reuse and tw_recycle connection rate is high enough to exhaust all the ports).

      Each time we hit the issue, Mongo receives a connection error (errno 99: Cannot assign requested address) from the kernel. There is logic in SyncTail::getMissingDoc to perform up to 3 retries before giving up and in some cases it helps – kernel manages to free up some ports and connection succeeds upon retry. But since it takes a huge amount of time to seed the servers and build the primary index (up to 24 hours), the amounts of changes Mongo needs to apply to catch up with oplog is huge, so there are many chances for it to fail all 3 retries and die with an Assertion: 15916:Can no longer connect to initial sync source message.

      At the moment I'm trying to create the replica for the 5th time (each try taking 1-2 days) and so far I've managed to catch the moment mongo started hitting the limit of 60K TIME_WAIT sockets in time to be able to run cpulimit on the process to slow it down and reduce the rate of socket creation enough so that the kernel could recycle those sockets fast enough, but this is clearly an issue that needs fixing in the code.

      Some additional resources:

            Assignee:
            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            Reporter:
            kovyrin Oleksiy Kovyrin
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: