- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- None
- Cluster Scalability
- ALL
- Cluster Scalability 2024-4-1, Cluster Scalability 2024-4-15
- 8
By default, the resharding transaction cloner persists its progress only once every 1000 entries. In stepdown suites, a failover is triggered every 8 seconds. On very slow variants (e.g. tsan debug), the cloner may not process enough entries between failovers to reach a checkpoint where progress is persisted, so it is unable to make any progress (see BF-32013 for an example of this in practice).
We should reduce the batch size to 1 in stepdown suites so that progress is persisted after every entry, which guarantees that the cloner can make progress even if the system is very slow.
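
The failure mode can be illustrated with a small simulation. This is only a sketch of the reasoning, not the cloner's actual code: the function name, the per-window throughput figures, and the assumption that a restarted cloner resumes exactly from its last persisted checkpoint are all hypothetical.

```python
from typing import Optional


def windows_to_finish(total_entries: int,
                      batch_size: int,
                      entries_per_window: int,
                      max_windows: int = 10_000) -> Optional[int]:
    """Toy model of a cloner that is interrupted by a failover after each window.

    Assumptions (hypothetical, for illustration only):
      - progress is persisted only at multiples of `batch_size` entries;
      - after a failover the cloner resumes from the last persisted count;
      - `entries_per_window` is how many entries a slow variant can process
        between two failovers (e.g. in one 8-second interval).

    Returns the number of windows needed to clone everything, or None if the
    cloner is stuck and never advances its persisted checkpoint.
    """
    persisted = 0
    for window in range(1, max_windows + 1):
        # Entries reached in this window before the next failover hits.
        reached = min(persisted + entries_per_window, total_entries)
        if reached == total_entries:
            return window
        # Only whole batches survive the failover.
        new_persisted = (reached // batch_size) * batch_size
        if new_persisted == persisted:
            return None  # No checkpoint was reached: every window repeats the same work.
        persisted = new_persisted
    return None


if __name__ == "__main__":
    # 800 entries per window is a made-up figure for a very slow variant.
    print(windows_to_finish(5000, batch_size=1000, entries_per_window=800))
    print(windows_to_finish(5000, batch_size=1, entries_per_window=800))
```

Under these assumptions, a batch size of 1000 with fewer than 1000 entries processed per failover interval never advances, whereas a batch size of 1 advances by however many entries were processed, at the cost of one progress write per entry.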