Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0
Affects Version/s: None
Component/s: None
Labels:
- resharding-critical-section-timeout
- resharding-improvements

Assigned Teams:
Cluster Scalability
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Cluster Scalability 2024-11-11
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

If the primary of a recipient shard goes through an unclean shutdown during the "cloning" state and steps up again after restarting, the restored "oplogEntriesFetched" metric can be incorrect. Upon recovery, the metric is set to the sum of the fast counts of the config.localReshardingOplogBuffer.<collUUID>.<donorShardId> collections, and fast counts can, by design, be inaccurate after an unclean shutdown. An incorrect "oplogEntriesFetched" metric produces incorrect remaining-time estimates, which can cause resharding to enter the critical section too early and hit a ReshardingCriticalSectionTimeout error.
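To illustrate how a drifted fast count can distort the estimate, here is a minimal Python sketch. It assumes a simplified model in which the remaining time is the backlog of fetched-but-not-yet-applied oplog entries multiplied by the observed per-entry apply time. The function names, the formula, and the numbers are hypothetical and are not MongoDB's actual implementation.

```python
from typing import Mapping

# HYPOTHETICAL, simplified model: remaining time = (fetched - applied) * (ms per applied entry).
# MongoDB's real remaining-time formula may differ.


def estimate_remaining_ms(entries_fetched: int, entries_applied: int,
                          elapsed_applying_ms: float) -> float:
    """Estimate the time left to apply the buffered oplog entries."""
    if entries_applied == 0:
        return float("inf")  # no apply rate observed yet, so no finite estimate
    ms_per_entry = elapsed_applying_ms / entries_applied
    backlog = max(entries_fetched - entries_applied, 0)  # clamp: a count can't go negative
    return backlog * ms_per_entry


def restored_fetched_count(fast_counts: Mapping[str, int]) -> int:
    """Mimic restoring oplogEntriesFetched as the sum of per-donor buffer fast counts."""
    return sum(fast_counts.values())


# Actual number of documents in each donor's oplog buffer collection (hypothetical).
true_fetched = restored_fetched_count({"shard0": 6000, "shard1": 4000})
# Fast counts that drifted low after an unclean shutdown (hypothetical).
drifted_fetched = restored_fetched_count({"shard0": 2500, "shard1": 1500})

print(estimate_remaining_ms(true_fetched, 4000, 8000.0))     # 12000.0
print(estimate_remaining_ms(drifted_fetched, 4000, 8000.0))  # 0.0
```

In this model, an undercounted oplogEntriesFetched makes the backlog look empty, so the estimate drops to zero even though 6000 entries remain to be applied. That is the kind of premature signal that can send resharding into the critical section too early.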

Depends on:
SERVER-96561 Make ReshardingOplogFetcher insert oplog entries in batch (Closed)

Assignee:: Cheahuychou Mao
Reporter:: Cheahuychou Mao
Participants:: Cheahuychou Mao, Githook User
Votes:: 0
Watchers:: 8

Created:: Sep 13 2024 05:30:08 PM UTC
Updated:: Dec 13 2024 09:44:28 PM UTC
Resolved:: Nov 07 2024 04:17:25 PM UTC
Confidence Status Last Update:: 21/Oct/24 3:54 PM