Core Server / SERVER-83530

Handle QueryPlanKilled on shard_server_catalog_cache_loader_test unit test

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • Fully Compatible
    • CAR Team 2023-12-25
    • 14
    • 1

      ShardServerCatalogCacheLoader::getChunksSince can throw QueryPlanKilled under some interleavings between reading the cache and the background thread that persists the materialized cache. In practice this is harmless, because the CatalogCache handles it by retrying.
      However, this race can cause failures in the shard_server_catalog_cache_loader_test unit test (e.g. here). We can address this by making the test expect this failure and retry. Alternatively, we could make ShardServerCatalogCacheLoader itself retry.
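      A minimal C++ sketch of the first option (the test expects the error and retries), assuming a hypothetical `QueryPlanKilledError` exception type and a hypothetical `retryOnQueryPlanKilled` helper. The real code base reports QueryPlanKilled through its own error types, so this only illustrates the shape of a bounded retry, not the actual fix.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the server's QueryPlanKilled error; the real
// code base reports errors through its own Status/DBException types instead.
struct QueryPlanKilledError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls fn() and, if it throws QueryPlanKilledError, calls it again, up to
// maxAttempts calls in total. Any other exception propagates immediately; if
// every attempt is killed, the last QueryPlanKilledError is rethrown.
template <typename Fn>
auto retryOnQueryPlanKilled(Fn&& fn, int maxAttempts) -> decltype(fn()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return fn();
        } catch (const QueryPlanKilledError&) {
            if (attempt >= maxAttempts) {
                throw;  // give up and surface the original error
            }
            // Otherwise loop and re-issue the whole read from scratch.
        }
    }
}
```

      In the test, the call to `getChunksSince` would go inside the lambda. Bounding the number of attempts keeps a genuine, persistent failure from looping forever. Moving the same loop into ShardServerCatalogCacheLoader (the alternative above) would hide the race from every caller rather than only from the test.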

      The interleaving that can cause this is:
      1. The SSCCL (ShardServerCatalogCacheLoader) discovers the new epoch.
      2. Next, it schedules an asynchronous task to update the persisted metadata.
      3. Next, it calls `_getLoaderMetadata`, which calls `getIncompletePersistedMetadataSinceVersion`, which calls `getPersistedMetadataSinceVersion`, which finally calls `readShardChunks`. `readShardChunks` reads from the config.cache.xxxx collection.
      4. Concurrently with the read (3), the task scheduled at (2) proceeds to drop the config.cache.xxxx collection (because the epoch has changed).
      5. The read started at (3) yields, and on restore it discovers that the collection no longer exists, so it fails with QueryPlanKilled.
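      The interleaving above can be replayed deterministically on a single thread. This is a toy model, not MongoDB code: `ToyCatalog`, `ToyReadPlan`, `replayInterleaving` and `QueryPlanKilledError` are hypothetical names, and the "incarnation" id is an assumed stand-in for whatever identity check the real yield/restore path performs. It only shows why the read fails at step (5): the plan remembers which collection it started on, and on restore finds it gone.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the QueryPlanKilled error.
struct QueryPlanKilledError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy catalog: collection name -> incarnation id. Dropping a collection
// removes its entry; creating it again assigns a fresh incarnation id.
class ToyCatalog {
public:
    void create(const std::string& name) { _colls[name] = ++_nextId; }
    void drop(const std::string& name) { _colls.erase(name); }
    std::optional<long> incarnation(const std::string& name) const {
        auto it = _colls.find(name);
        if (it == _colls.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, long> _colls;
    long _nextId = 0;
};

// Toy read plan that remembers the incarnation it started on. restore()
// mimics the check made when a yielded read resumes: if the collection is
// gone (or was replaced), the plan is killed.
class ToyReadPlan {
public:
    ToyReadPlan(const ToyCatalog& catalog, std::string ns)
        : _catalog(catalog), _ns(std::move(ns)), _startedOn(catalog.incarnation(_ns)) {}

    void yield() {}  // a real read would release its locks here

    void restore() const {
        if (!_startedOn || _catalog.incarnation(_ns) != _startedOn) {
            throw QueryPlanKilledError("collection " + _ns + " dropped during yield");
        }
    }

private:
    const ToyCatalog& _catalog;
    std::string _ns;
    std::optional<long> _startedOn;
};

// Replays steps (3)-(5) in order; returns true if the read was killed.
bool replayInterleaving(bool dropDuringYield) {
    ToyCatalog catalog;
    const std::string ns = "config.cache.xxxx";  // placeholder name, as in the ticket
    catalog.create(ns);

    ToyReadPlan read(catalog, ns);  // step 3: the read starts
    read.yield();                   // ...and yields
    if (dropDuringYield) {
        catalog.drop(ns);           // step 4: the persistence task drops the collection
    }
    try {
        read.restore();             // step 5: restore notices the drop
    } catch (const QueryPlanKilledError&) {
        return true;
    }
    return false;
}
```

      Without the drop in step (4) the same read restores cleanly, which is why the failure only shows up under this particular interleaving.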

            Assignee:
            david.dominguez@mongodb.com David Dominguez Sal (Inactive)
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Votes:
            0
            Watchers:
            3
