  Core Server / SERVER-76873

Range deletions should never wait on metadata objects that have no usages

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.0-rc1
    • Affects Version/s: None
    • Component/s: Sharding
    • None
    • Fully Compatible
    • ALL
    • v7.0, v6.3
    • Sharding EMEA 2023-05-15
    • 14

      With the flow outlined below and in the attached repro, it is possible for a range deletion to be scheduled to wait on a stale metadata object that no queries are referencing. This happens because, for the most recent metadata object, we do not check the usage counter when returning the completion future. Since this function is only used by range deletions, and range deletions should never wait for metadata that has no active queries, we should change it to always check the usage counter.
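
      To illustrate the proposed change, here is a minimal, self-contained sketch. It is not the server's actual code: `Range`, `MetadataTracker`, `MetadataList`, `mustWaitBeforeFix` and `mustWaitAfterFix` are hypothetical names, and a plain boolean stands in for the completion future. It only assumes what is described above: each metadata object carries a usage counter, and the most recent object is currently considered without checking that counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; not the server's actual types.
struct Range {
    int min;  // inclusive lower bound
    int max;  // exclusive upper bound
};

inline bool overlaps(const Range& a, const Range& b) {
    assert(a.min < a.max && b.min < b.max);
    return a.min < b.max && b.min < a.max;
}

// One snapshot of collection metadata plus the number of queries using it.
struct MetadataTracker {
    std::vector<Range> ownedRanges;
    int usageCounter = 0;

    bool overlapsRange(const Range& r) const {
        for (const auto& owned : ownedRanges) {
            if (overlaps(owned, r)) return true;
        }
        return false;
    }
};

// Oldest snapshot first; the last element is the most recent metadata.
using MetadataList = std::vector<MetadataTracker>;

// Behaviour described as buggy: the usage counter is skipped for the
// most recent snapshot, so an unused snapshot can still block deletion.
inline bool mustWaitBeforeFix(const MetadataList& list, const Range& r) {
    for (std::size_t i = 0; i < list.size(); ++i) {
        const bool isMostRecent = (i + 1 == list.size());
        if (!isMostRecent && list[i].usageCounter == 0) continue;
        if (list[i].overlapsRange(r)) return true;
    }
    return false;
}

// Proposed behaviour: the usage counter is checked for every snapshot,
// so only snapshots with active queries can delay the range deletion.
inline bool mustWaitAfterFix(const MetadataList& list, const Range& r) {
    for (const auto& tracker : list) {
        if (tracker.usageCounter == 0) continue;
        if (tracker.overlapsRange(r)) return true;
    }
    return false;
}
```

      With a single stale snapshot that overlaps the range and has a usage counter of zero, the first function reports that the deletion must wait, while the second lets it proceed.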

      The scenario to hit the issue:

      A chunk is being migrated back to a shard that previously owned it. After the migration begins and the range deletion task is persisted on the recipient, the primary of the recipient shard steps down. The migration is aborted because of the stepdown, and the donor marks the range deletion task as non-pending on the new primary of the recipient.

      However, the new primary has not yet run onCollectionPlacementVersionMismatch, so the only metadata in the metadata manager is stale information from when the chunk was previously owned by the shard. This is not a huge problem for queries, because the CSR has its metadata marked as unknown. But for the range deleter, it means that the only metadata in the metadata manager overlaps with the range yet is not actually being used by any queries. Because it is the only metadata, we still make the range deleter wait for this metadata to be destroyed, even though no queries are using it.

      The test ends right after this migration fails, so there are no later queries to trigger onCollectionPlacementVersionMismatch. If onCollectionPlacementVersionMismatch were called, the new metadata would be recovered and set as the filtering information. This would allow the previous unused metadata to be cleaned up, thus releasing the range deletion. But in this test nothing else happens after the migration, so the metadata exists forever and the range deletion cannot proceed.
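
      The release path described above can be sketched as follows. This is only an illustrative model under assumed names (`Snapshot`, `MetadataManagerSketch`, `setFilteringMetadata`, `retireUnused` are all hypothetical), with a callback standing in for the notification the range deleter waits on. It shows why the deletion stays blocked until newer filtering metadata is installed.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of one metadata snapshot.
struct Snapshot {
    int usageCounter = 0;               // queries currently using this snapshot
    std::function<void()> onDestroyed;  // stands in for releasing a waiting range deletion
};

class MetadataManagerSketch {
public:
    // Installs newer filtering metadata, then retires older unused snapshots.
    void setFilteringMetadata(Snapshot fresh) {
        snapshots_.push_back(std::move(fresh));
        retireUnused();
    }

    std::size_t size() const { return snapshots_.size(); }

private:
    // Drops snapshots from the oldest end while they have no usages.
    // The most recent snapshot is always kept, which is why a lone stale
    // snapshot is never destroyed if no newer metadata arrives.
    void retireUnused() {
        while (snapshots_.size() > 1 && snapshots_.front().usageCounter == 0) {
            auto notify = std::move(snapshots_.front().onDestroyed);
            snapshots_.pop_front();
            if (notify) notify();
        }
    }

    std::deque<Snapshot> snapshots_;  // oldest first, most recent last
};
```

      In this model, a manager holding only the stale snapshot never fires its callback. Installing a second snapshot, as would happen after a refresh, retires the stale one and releases the waiter.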

            Assignee:
            allison.easton@mongodb.com Allison Easton
            Reporter:
            allison.easton@mongodb.com Allison Easton