  1. Core Server
  2. SERVER-94800

Unclean shutdown during resharding can lead to incorrect "oplogEntriesFetched" metric and ReshardingCriticalSectionTimeout error

    • Cluster Scalability
    • Fully Compatible
    • ALL
    • Cluster Scalability 2024-11-11

      If the primary of a recipient shard goes through an unclean shutdown during the "cloning" state and steps up again after restarting, the restored "oplogEntriesFetched" metric can be incorrect. Upon recovery, the metric is set to the sum of the fast counts on the config.localReshardingOplogBuffer.<collUUID>.<donorShardId> collections, and fast counts by design can be incorrect after an unclean shutdown. An incorrect "oplogEntriesFetched" metric leads to incorrect remaining time estimates, which can make resharding enter the critical section too early and hit a ReshardingCriticalSectionTimeout error.
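
      The mismatch described above can be illustrated with a small toy model. This is only a sketch, not the server's implementation: the class BufferCollection, its fast_count field, and the helper functions are all hypothetical names. It assumes one buffer collection per donor shard, and that some inserts made shortly before the crash are not reflected in the cached count. The real drift after an unclean shutdown may go in either direction.

```python
from dataclasses import dataclass, field


@dataclass
class BufferCollection:
    """Toy stand-in for one oplog buffer collection (hypothetical model)."""
    docs: list = field(default_factory=list)
    fast_count: int = 0  # cached count metadata; may drift after an unclean shutdown

    def insert(self, doc, count_persisted=True):
        # The document itself always lands in the collection in this model.
        self.docs.append(doc)
        # Assumption: the cached count update can be lost on an unclean shutdown.
        if count_persisted:
            self.fast_count += 1

    def exact_count(self):
        # Equivalent of actually scanning the collection.
        return len(self.docs)


def restore_from_fast_counts(buffers):
    """Restore the metric as the sum of cached counts (the behavior described above)."""
    return sum(b.fast_count for b in buffers)


def restore_from_exact_counts(buffers):
    """Restore the metric by counting documents (shown only for comparison)."""
    return sum(b.exact_count() for b in buffers)


def simulate_unclean_shutdown():
    """Two donors; donor B loses 30 cached-count updates before the crash."""
    donor_a, donor_b = BufferCollection(), BufferCollection()
    for i in range(100):
        donor_a.insert({"ts": i})
    for i in range(100):
        donor_b.insert({"ts": i}, count_persisted=(i < 70))
    return [donor_a, donor_b]


if __name__ == "__main__":
    buffers = simulate_unclean_shutdown()
    print("restored from fast counts:", restore_from_fast_counts(buffers))
    print("restored from exact counts:", restore_from_exact_counts(buffers))
```

      In this model the value restored from the cached counts differs from the number of buffered entries, so any estimate derived from the restored metric starts from a wrong baseline.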

            Assignee:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Reporter:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: