Core Server / SERVER-26780

SyncTail::getMissingDoc() should retry on SocketExceptions

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.2.12, 3.4.2, 3.5.2
    • Component/s: Replication
    • Environment: Ubuntu, MongoDB 3.2.10
    • Assigned Teams: Replication
    • Operating System: ALL

      A secondary fails to complete initial sync from another secondary when joining a replica set.

      It fails due to a socket receive timeout when talking to the other secondary during the initial sync.

      I have attached the final lines of the log from the secondary trying to join the replica set.

      NB: we never see any "network problem detected" lines in our logs, so it seems that no retries are ever attempted:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/db/repl/sync_tail.cpp#L968-L969
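      The retry behaviour the reporter expected can be sketched as a bounded loop that treats a SocketException as retryable. This is a minimal illustration, not the server's code: SocketException, FlakyTransport and fetchWithRetry are hypothetical stand-ins, and the transport is assumed to let the timeout escape as a SocketException rather than converting it.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the server's SocketException type.
struct SocketException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical transport that times out `failuresBeforeSuccess` times and
// then succeeds. The timeout escapes as a SocketException (not swallowed).
struct FlakyTransport {
    explicit FlakyTransport(int n) : failuresBeforeSuccess(n) {}
    int failuresBeforeSuccess;
    int calls = 0;

    void findOne() {
        ++calls;
        if (calls <= failuresBeforeSuccess)
            throw SocketException("recv timed out");
    }
};

// Bounded retry: only SocketException is treated as retryable. Returns the
// number of attempts needed; after maxAttempts failures the last
// SocketException is rethrown to the caller.
int fetchWithRetry(FlakyTransport& transport, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            transport.findOne();
            return attempt;
        } catch (const SocketException&) {
            if (attempt == maxAttempts) throw;
            // A real implementation would log "network problem detected" here.
        }
    }
}
```

      With a transport that times out twice, fetchWithRetry(t, 3) succeeds on the third attempt; with one that keeps timing out, the SocketException reaches the caller after three attempts.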

      I think the SocketException caused by the timeout is caught earlier, in message_port.cpp:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/util/net/message_port.cpp#L204-L210
      and the failure then resurfaces as an assertion exception in dbclient.cpp, so the retry code never sees a SocketException:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/client/dbclient.cpp#L811-L814
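      The failure mode described above can be reproduced in miniature. In this sketch (hypothetical stand-ins, not MongoDB's actual classes or functions), portRecv plays the role the reporter attributes to message_port.cpp by catching the SocketException and returning false, and clientFindOne plays the dbclient.cpp role by turning that false into a generic AssertionException. A retry loop that only catches SocketException therefore gives up on the first attempt.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the server's exception types.
struct SocketException : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct AssertionException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Low-level receive: a timeout surfaces as a SocketException.
void recvWithTimeout() {
    throw SocketException("recv timed out");
}

// Mid-level call: swallows the SocketException and reports a bool,
// so the original exception type is lost.
bool portRecv() {
    try {
        recvWithTimeout();
        return true;
    } catch (const SocketException&) {
        return false;
    }
}

// Client call: converts the false return into a generic assertion.
void clientFindOne() {
    if (!portRecv())
        throw AssertionException("error communicating with server");
}

// Caller whose retry loop only recognizes SocketException. attemptsMade
// records how many attempts were started before the call exited.
void fetchMissingDoc(int maxAttempts, int& attemptsMade) {
    assert(maxAttempts > 0);
    for (attemptsMade = 1; ; ++attemptsMade) {
        try {
            clientFindOne();
            return;
        } catch (const SocketException&) {
            if (attemptsMade == maxAttempts) throw;
            // Would log "network problem detected" and retry; never reached.
        }
        // An AssertionException is not caught above and escapes immediately.
    }
}
```

      Calling fetchMissingDoc(3, attempts) throws AssertionException with attempts == 1, which matches the absence of any "network problem detected" lines.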

      I do not believe the fix in https://jira.mongodb.org/browse/SERVER-9528 is correct, because the SocketException is swallowed before it reaches the retry logic.

        1. mms-mongo-1-110.log (1 kB)
        2. mms-mongo-1-106.log (5 kB)

            Assignee: Backlog - Replication Team (backlog-server-repl)
            Reporter: Rob Clancy (rob.clancy@intercom.io)
            Votes: 0
            Watchers: 8
