Core Server / SERVER-46395

Resumable range deleter can submit duplicate range deletion tasks on step up

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc0, 4.7.0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4

      Apply the following patch, which introduces an invariant into MetadataManager::_submitRangeForDeletion() to check for an already-scheduled range deletion task on the same chunk range, and then run the following resmoke.py invocation.

      python buildscripts/resmoke.py --suite=sharding jstests/sharding/resumable_range_delete_step_down_then_up.js
      
      diff --git a/jstests/sharding/resumable_range_delete_step_down_then_up.js b/jstests/sharding/resumable_range_delete_step_down_then_up.js
      new file mode 100644
      index 0000000000..5bda43b509
      --- /dev/null
      +++ b/jstests/sharding/resumable_range_delete_step_down_then_up.js
      @@ -0,0 +1,51 @@
      +(function() {
      +"use strict";
      +
      +load("jstests/libs/fail_point_util.js");
      +
      +const st = new ShardingTest({mongos: 1, config: 1, shards: 2, rs: {nodes: 1}});
      +const dbName = "test";
      +const collName = "mycoll";
      +const collection = st.s.getDB(dbName).getCollection(collName);
      +
      +// Create a sharded collection with two chunks on shard0, split at the key {x: -1}.
      +assert.commandWorked(st.s.adminCommand({enableSharding: dbName}));
      +assert.commandWorked(st.s.adminCommand({movePrimary: dbName, to: st.shard0.shardName}));
      +assert.commandWorked(st.s.adminCommand({shardCollection: collection.getFullName(), key: {x: 1}}));
      +assert.commandWorked(st.s.adminCommand({split: collection.getFullName(), middle: {x: -1}}));
      +
      +// Insert some documents into the collection, but only into the higher of the two chunks.
      +assert.commandWorked(collection.insert(Array.from({length: 100}, (_, i) => ({_id: i, x: i}))));
      +
      +// Enable the failpoint to cause range deletion to hang indefinitely.
      +const suspendRangeDeletionFailpoint = configureFailPoint(st.shard0, "suspendRangeDeletion");
      +
      +// Note we're using _waitForDelete=false to ensure the command completes because we're intentionally
      +// causing range deletion to hang.
      +assert.commandWorked(st.s.adminCommand({
      +    moveChunk: collection.getFullName(),
      +    find: {x: 1},
      +    to: st.shard1.shardName,
      +    _waitForDelete: false,
      +}));
      +
      +jsTestLog("Waiting for the suspendRangeDeletion failpoint to be hit");
      +suspendRangeDeletionFailpoint.wait();
      +
      +// We step the shard0 primary down and then back up again. On stepping up, the shard0 primary will
      +// create a new range deletion task.
      +const shard0Primary = st.rs0.getPrimary();
      +assert.commandWorked(shard0Primary.adminCommand({
      +    replSetStepDown: ReplSetTest.kForeverSecs,
      +    force: true,
      +}));
      +
      +st.rs0.waitForState(shard0Primary, ReplSetTest.State.SECONDARY);
      +assert.commandWorked(shard0Primary.adminCommand({replSetFreeze: 0}));
      +assert.commandWorked(shard0Primary.adminCommand({replSetStepUp: 1}));
      +st.rs0.waitForState(shard0Primary, ReplSetTest.State.PRIMARY);
      +
      +suspendRangeDeletionFailpoint.off();
      +
      +st.stop();
      +})();
      diff --git a/src/mongo/db/s/metadata_manager.cpp b/src/mongo/db/s/metadata_manager.cpp
      index 4523d1761d..4ae566e7ba 100644
      --- a/src/mongo/db/s/metadata_manager.cpp
      +++ b/src/mongo/db/s/metadata_manager.cpp
      @@ -450,6 +450,11 @@ SharedSemiFuture<void> MetadataManager::_submitRangeForDeletion(
                                      delayForActiveQueriesOnSecondariesToComplete,
                                      Milliseconds(rangeDeleterBatchDelayMS.load()));
      
      +    for (auto const& [alreadyScheduledRange, _] : _rangesScheduledForDeletion) {
      +        invariant(range != alreadyScheduledRange,
      +                  "attempted to schedule duplicate range deletion task");
      +    }
      +
           _rangesScheduledForDeletion.emplace_front(range, cleanupComplete);
           // Attach a continuation so that once the range has been deleted, we will remove the deletion
           // from the _rangesScheduledForDeletion.  std::list iterators are never invalidated, which
      
    • Sprint: Sharding 2020-03-09, Sharding 2020-03-23

      If the primary of a replica set shard steps down before it has processed a range deletion task for a chunk that moved off the shard, and then steps back up very shortly afterwards, it is possible for multiple range deletion tasks for the same range to be scheduled and eventually run. This is problematic because the first task to complete removes the corresponding document from the config.rangeDeletions collection. Subsequent checks for whether the deletion of a range is scheduled are based on the contents of config.rangeDeletions (rather than on the in-memory state), so a chunk migration for an overlapping range could be accepted by the shard. The still-scheduled duplicate range deletion task would then be able to remove documents that are actually owned by the shard.
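
      As an illustration of that collection-based check, the snippet below (hypothetical, not part of the original report or patch) could be run from the test above against the shard0 primary while the suspendRangeDeletion failpoint is still active; the "nss" field name is an assumption based on the 4.4-era config.rangeDeletions document format.

      // Connect directly to the shard0 primary; config.rangeDeletions is a shard-local
      // collection, not a collection on the config server.
      const shard0RangeDeletions =
          st.rs0.getPrimary().getDB("config").getCollection("rangeDeletions");

      // Only one task document exists for the range that moved off the shard, even though
      // two tasks may be scheduled in memory after the step up. Once the first task
      // completes and deletes this document, the query below returns nothing, so the shard
      // would accept a migration for an overlapping range while the duplicate task is
      // still pending.
      printjson(shard0RangeDeletions.find({nss: "test.mycoll"}).toArray());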

      Credit to esha.maharishi for finding this issue.

            Assignee:
            Max Hirschhorn (max.hirschhorn@mongodb.com)
            Reporter:
            Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes:
            0
            Watchers:
            3
