WiredTiger / WT-8003

Fix frequent duplicate keys returned by random cursor in resharding test

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • 8
    • Storage Engines - 2022-10-31, 2023-05-30 - 7.0 Readiness, StorEng - 2023-06-13, 2023-06-27 Lord of the Sprints, 2023-07-11 WiredTractor, 2023-07-25 Absolute unit, StorEng - 2023-08-08, ASeasonTooMany-2023-08-22, BermudaTriangle- 2023-09-05

      Resharding uses $sample internally, which in turn uses a WT random cursor. In a resharding performance test, the test occasionally fails when $sample repeatedly fails to find 100 unique documents.

      In this ticket we should reproduce the failure, adding instrumentation to WT as needed, and, once we understand the issue, find a way to make random cursors behave better in the problem case.

      The problem test is the ReshardCollection.yml genny workload. It inserts 100,000 10KB documents split evenly across two shards. It then reshards the cluster while 100 threads perform reads and writes (find and update commands). Resharding tries to get ~200 samples from each shard via $sample. Occasionally, the sample includes duplicate keys. We see an error when 100 consecutive attempts to get ~200 unique keys all fail.

      $sample is allowed to return duplicate keys. But given the number of keys and size of the sample, having this happen repeatedly is surprising and undesirable.
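
      To put a rough number on "surprising", here is a minimal sketch of the birthday-problem estimate for the workload above. It assumes each shard holds 50,000 keys and that each of the 200 draws is independent and uniform with replacement. That is an idealization, not a description of how the WT random cursor actually samples, and the function and constant names are illustrative only.

```python
import math


def p_duplicate(n_keys: int, n_samples: int) -> float:
    """Probability that n_samples independent uniform draws (with
    replacement) from n_keys distinct keys contain at least one repeat."""
    if n_samples > n_keys:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # P(all unique) = prod_{i=0}^{k-1} (1 - i/n); sum logs for stability.
    log_p_unique = sum(math.log1p(-i / n_keys) for i in range(n_samples))
    return 1.0 - math.exp(log_p_unique)


# Figures taken from the ticket description; the even split is assumed exact.
KEYS_PER_SHARD = 100_000 // 2
SAMPLES_PER_SHARD = 200
ATTEMPTS = 100

# Chance that a single 200-key sample from one shard contains a duplicate.
p_one = p_duplicate(KEYS_PER_SHARD, SAMPLES_PER_SHARD)

# Chance that ATTEMPTS independent samples all contain a duplicate.
p_all_fail = p_one ** ATTEMPTS

print(f"P(duplicate in one sample)     = {p_one:.4f}")
print(f"P(duplicate in all {ATTEMPTS} samples) = {p_all_fail:.3e}")
```

      Under these assumptions, roughly one sample in three would contain a duplicate, so an occasional duplicate is expected. One hundred consecutive failures would be astronomically unlikely, which suggests that the keys returned in the failing case are far from uniformly distributed.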

        1. image-2023-07-06-16-57-09-560.png
          213 kB
          Jie Chen
        2. image-2023-07-20-16-59-31-389.png
          131 kB
          Jie Chen
        3. image-2023-07-20-16-59-36-750.png
          131 kB
          Jie Chen
        4. image-2023-08-04-13-27-16-408.png
          185 kB
          Jie Chen
        5. image-2023-08-07-09-50-53-626.png
          59 kB
          Jie Chen
        6. image-2023-08-09-13-51-28-356.png
          164 kB
          Jie Chen
        7. reproducer.txt
          5 kB
          Etienne Petrel
        8. results_1.txt
          15 kB
          Etienne Petrel
        9. results_last_key.txt
          465 kB
          Etienne Petrel
        10. results_random_sample_size.rtf
          21 kB
          Etienne Petrel
        11. results_reset_reader.rtf
          20 kB
          Etienne Petrel
        12. tree_struct.txt
          9.91 MB
          Etienne Petrel

            Assignee:
            jie.chen@mongodb.com Jie Chen
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Etienne Petrel
            Votes:
            0
            Watchers:
            22
