  1. Core Server
  2. SERVER-16396

Replication stall, then one secondary would not shut down (mmapv1)

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.8.0-rc1
    • Component/s: Replication

      Please see the attached graphs showing behavior on our 2.8.0-rc1 (mmapv1) replica set.

      We experienced the following series of events:

      • Rapidly climbing replication lag on both secondaries. Observed IOPS on the secondaries was very high.
      • Getmore counter dropped off to zero on the primary.
      • Restarted one secondary (onprem-2). On restart, its replication lag fell off immediately back down to zero.
      • Getmore counter on the primary started looking more normal.
      • Attempted to shut down the other secondary (onprem-3). It would not shut down. gdb dump attached.
      • After hard-killing the other secondary and restarting it, its replication lag also fell off to zero.
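
      For readers who want to quantify the lag described above, here is a minimal sketch that derives per-secondary lag from a `replSetGetStatus`-style document by comparing each secondary's `optimeDate` with the primary's. The helper name `replication_lag`, the primary's host name, the ports, and all timestamps are hypothetical and for illustration only; they are not taken from the attached logs or graphs.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag in seconds} for every SECONDARY member.

    `status` is expected to look like the output of replSetGetStatus:
    a dict with a "members" list whose entries carry "name",
    "stateStr" and "optimeDate".
    """
    members = status["members"]
    # The primary's last applied operation time is the reference point.
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data (assumed values, not from this ticket).
now = datetime(2014, 12, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "primary:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "onprem-2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=5)},
        {"name": "onprem-3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(minutes=30)},
    ]
}

lag = replication_lag(sample_status)
for name, seconds in sorted(lag.items()):
    print(f"{name}: {seconds:.0f}s behind primary")
```

      Against a live replica set, the same function could be fed the result of running the `replSetGetStatus` command through a driver; that wiring is left out here to keep the sketch self-contained.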

      Will link to logs for all nodes.

        1. mms-on-prem-3.backtrace (17 kB)
        2. onprem-2.png (134 kB)
        3. onprem.png (139 kB)

            Assignee: Andy Schwerin (schwerin@mongodb.com)
            Reporter: Cailin Nelson (cailin.nelson@mongodb.com, Inactive)
            Votes: 0
            Watchers: 8

              Created:
              Updated:
              Resolved: