Core Server / SERVER-68124

Primary replica set member hangs during chunk migration

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.2.17
    • Component/s: None
    • Environment:
      Ubuntu 16.04
      XFS
      Kernel - 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
      Transparent Huge Pages disabled
      AWS m5.large (2 vCPU / 8 GB)
      SSD gp3, 450 GB
      mongodb-org-server - 4.2.17
    • ALL

      We're using `sh.addTagRange` to set custom chunk ranges. Usually the ranges are fixed and chunks are not moved between shards. However, the other day we needed to change the ranges, which made the balancer move chunks. Within 15-20 minutes of the change, the primaries of some of the shards became unresponsive and the whole sharded cluster hung.
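
For illustration only, a range change of this kind might look roughly like the sketch below in the mongo shell. The helper name `replaceTagRange`, the namespace `app.events`, the shard key field `tenantId`, and the tag `shardA` are all hypothetical and not taken from our deployment; only `sh.addTagRange` and `sh.removeTagRange` are real shell helpers. The `sh` object is passed in as a parameter so the function can be tried without a live cluster.

```javascript
// Hypothetical helper: swap one tag range for another on a sharded collection.
// `sh` is the mongo shell sharding helper (or a stand-in with the same methods).
function replaceTagRange(sh, ns, oldRange, newRange, tag) {
  // Tag ranges may not overlap, so the old range is removed before the
  // new one is registered.
  sh.removeTagRange(ns, oldRange.min, oldRange.max, tag);
  // Once the new range exists, the balancer may start migrating chunks
  // that no longer sit on a shard carrying this tag.
  sh.addTagRange(ns, newRange.min, newRange.max, tag);
}

// Example call in the mongo shell (all values are made up):
// replaceTagRange(
//   sh,
//   "app.events",
//   { min: { tenantId: 0 }, max: { tenantId: 1000 } },
//   { min: { tenantId: 0 }, max: { tenantId: 2000 } },
//   "shardA"
// );
```

It is the second call that can trigger balancer activity, which matches the point at which chunk migrations started in our case.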

      diagnostic.data.zip from the hung primary is attached.

      Also, I found this message in the logs:

      STORAGE  [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.

       

      which led me to this issue. Might be related.

        1. image-2022-09-09-05-44-22-482.png
          303 kB
        2. diagnostic.data.zip
          48.37 MB

            Assignee:
            chris.kelly@mongodb.com Chris Kelly
            Reporter:
            vladimirred456@gmail.com Vladimir Beliakov
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: