Core Server / SERVER-88603

Reduce reshardingTxnClonerProgressBatchSize in Stepdown Suites

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL
    • Cluster Scalability 2024-4-1, Cluster Scalability 2024-4-15
    • 8

      By default, the resharding transaction cloner persists its progress only once every 1000 entries. In stepdown suites, a failover is triggered every 8 seconds. On very slow variants (e.g. TSAN debug), the cloner may not process enough entries to reach a checkpoint where progress is persisted before the next failover occurs, so it resumes from the same point each time and never makes progress (see BF-32013 for an example of this in practice).

      We should reduce reshardingTxnClonerProgressBatchSize to 1 in stepdown suites to guarantee that the cloner can make progress, even if the system is very slow.
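
      The failure mode described above can be illustrated with a small simulation. This is a minimal sketch and not the server's implementation: the function simulate_cloner, its parameters, and the throughput figure of 800 entries per failover window are all hypothetical, and the model assumes that any work done since the last persisted checkpoint is lost on failover and that the cloner also persists once it reaches the final entry.

```python
def simulate_cloner(total_entries, batch_size, entries_per_window):
    """Return how many failover windows the cloner needs to finish,
    or None if it can never advance past its last checkpoint.

    Hypothetical model: progress is persisted every `batch_size` entries
    (and at the final entry); a failover at the end of each window discards
    all work done since the last persisted checkpoint.
    """
    persisted = 0  # last checkpoint that survives a failover
    window = 0
    while True:
        window += 1
        start = persisted
        processed = persisted  # resume from the last persisted checkpoint
        for _ in range(entries_per_window):
            processed += 1
            if processed % batch_size == 0 or processed == total_entries:
                persisted = processed
            if persisted == total_entries:
                return window
        if persisted == start:
            # No checkpoint was reached in this window, so every later
            # window starts from the same state: the cloner is stuck.
            return None


# Assumed slow variant: only 800 entries fit between two failovers.
print(simulate_cloner(5000, batch_size=1000, entries_per_window=800))
print(simulate_cloner(5000, batch_size=1, entries_per_window=800))
```

      In this model, a batch size of 1000 leaves the cloner stuck whenever fewer than 1000 entries fit into one failover window, while a batch size of 1 always advances as long as at least one entry is processed per window.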

            Assignee:
            brett.nawrocki@mongodb.com Brett Nawrocki
            Reporter:
            brett.nawrocki@mongodb.com Brett Nawrocki
            Votes:
            0
            Watchers:
            2