- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Sprint: Storage Engines - 2022-10-31, 2023-05-30 - 7.0 Readiness, StorEng - 2023-06-13, 2023-06-27 Lord of the Sprints, 2023-07-11 WiredTractor, 2023-07-25 Absolute unit, StorEng - 2023-08-08, ASeasonTooMany-2023-08-22, BermudaTriangle- 2023-09-05
Resharding uses $sample internally; that is, it uses a WiredTiger (WT) random cursor. In a resharding performance test, the test occasionally fails when $sample repeatedly fails to find 100 unique documents.
In this ticket we should reproduce the failure, adding instrumentation to WT as needed, and, once we understand the issue, find a way to make random cursors behave better in the problem case.
The problem test is the ReshardCollection.yml genny workload. It inserts 100,000 10 KB documents, split evenly across two shards (50,000 per shard). It then reshards the cluster while 100 threads perform reads and writes (find and update commands). Resharding tries to get ~200 samples from each shard via $sample. Occasionally, the sample includes duplicate keys. We see an error when 100 consecutive attempts to get ~200 unique keys all fail.
$sample is allowed to return duplicate keys, but given the number of keys and the size of the sample, having this happen repeatedly is surprising and undesirable.
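A back-of-the-envelope calculation makes "surprising" concrete. It assumes, purely as an idealization that this ticket does not claim holds for WT, that the random cursor draws documents uniformly and independently (with replacement) from the 50,000 documents on a shard. Because "100 consecutive attempts" can be read two ways, the sketch computes both: (a) each of 100 independent ~200-draw samples contains a duplicate, and (b) 100 consecutive draws each return one of the (at most ~200) documents already seen. The function names and constants are hypothetical, chosen for illustration.

```python
import math


def prob_any_duplicate(n_docs: int, n_draws: int) -> float:
    """Birthday-problem probability that n_draws uniform draws (with
    replacement) from n_docs documents contain at least one repeat."""
    p_all_unique = 1.0
    for i in range(n_draws):
        # The i-th draw must avoid the i distinct documents already drawn.
        p_all_unique *= (n_docs - i) / n_docs
    return 1.0 - p_all_unique


def log10_prob_consecutive_dups(n_docs: int, n_seen: int, n_tries: int) -> float:
    """log10 of the probability that n_tries consecutive uniform draws all
    land on one of n_seen documents that were already returned."""
    return n_tries * math.log10(n_seen / n_docs)


# Workload numbers from the ticket; uniform sampling is an assumption.
DOCS_PER_SHARD = 100_000 // 2  # 100,000 documents split evenly across two shards
SAMPLES = 200                  # ~200 samples requested per shard (upper bound on n_seen)
TRIES = 100                    # consecutive failures before the error is raised

p_dup = prob_any_duplicate(DOCS_PER_SHARD, SAMPLES)
print(f"(a) P(one {SAMPLES}-draw sample has a duplicate) = {p_dup:.3f}")
print(f"(a) log10 P(all {TRIES} samples have a duplicate) = {TRIES * math.log10(p_dup):.1f}")
print(f"(b) log10 P({TRIES} consecutive repeat draws) = "
      f"{log10_prob_consecutive_dups(DOCS_PER_SHARD, SAMPLES, TRIES):.1f}")
```

Under this model a single 200-draw sample contains a duplicate about a third of the time, so occasional duplicates are expected. Repeated failure is another matter: reading (a) has probability on the order of 10^-48 and reading (b) on the order of 10^-240, which suggests that in the problem case the cursor's draws are far from uniform or independent.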
- is duplicated by
  - SERVER-29446 $sample stage could not find a non-duplicate document while using a random cursor (Closed)
- is related to
  - WT-11533 Investigate python reproducer showing weakness in random cursor with invisible records (Open)
  - WT-11547 Investigate mongosync and mongosMerge random cursor frequent duplicate keys failure (Open)
  - SERVER-29446 $sample stage could not find a non-duplicate document while using a random cursor (Closed)
  - WT-11532 Fix session reset RNG by using cursor RNG (Closed)
  - SERVER-78841 Make the number of samples per chunk in the SamplingBasedInitialSplitPolicy configurable (Closed)
  - WT-11534 Document WT on random cursor functionality (Closed)
- related to
  - WT-11385 Investigate how a page with a few entries can be created despite of the existence of pages with lots of entries (Open)