  1. Core Server
  2. SERVER-68541

Concurrent removeShard and movePrimary may delete unsharded collections

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.1.1, 6.0.3, 6.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.1, v6.0, v5.0, v4.4, v4.2
    • Steps To Reproduce:

      repro-undesired-unsharded-collections-remove.patch
      Apply the provided patch on top of commit r6.1.0-alpha-1938-gfe099ee11c9 and run jstests/sharding/remove_shard_and_move_primary.js in the sharding suite.
    • Sprint: Sharding EMEA 2022-08-22, Sharding EMEA 2022-09-05

      Running removeShard and movePrimary concurrently may unintentionally delete unsharded collections.

      Bug description
      Imagine the following scenario:

      • There are 2 shards: 'shard0', 'shard1'
      • Database 'myDB' primary shard is 'shard0'
      • Collection 'myDB.collA' is unsharded, so it is located on 'shard0'

      At some point, someone runs these two commands concurrently:

      • { removeShard:'shard1' }
      • { movePrimary:'myDB', to: 'shard1'}


      If the internal steps then execute in the order below, the cluster ends up unintentionally deleting all the unsharded collections of 'myDB'.

      1. The removeShard command is sent to the config server.
      2. The config server, on the removeShard thread, checks whether the count of unsharded databases on the shard is zero. Because it is, the process continues.
      3. After that check, movePrimary runs, so all the unsharded collections of 'myDB' are moved to 'shard1'.
      4. The removeShard commit phase starts and 'shard1' is removed from the topology of the cluster, together with the collections that were just moved to it.
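
The interleaving above can be sketched with a small in-memory model. This is only an illustration of the check-then-commit race, not MongoDB's implementation: the `Cluster` class and its methods (`remove_shard_check`, `move_primary`, `remove_shard_commit`) are hypothetical names invented for this sketch, and the model assumes the commit phase does not repeat the check from step 2.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical): known shards, each database's primary shard,
    and the unsharded collections physically held by each shard."""
    shards: set = field(default_factory=lambda: {"shard0", "shard1"})
    db_primary: dict = field(default_factory=lambda: {"myDB": "shard0"})
    unsharded: dict = field(
        default_factory=lambda: {"shard0": {"myDB.collA"}, "shard1": set()}
    )

    def dbs_on(self, shard):
        # Databases whose primary shard is `shard`.
        return [db for db, primary in self.db_primary.items() if primary == shard]

    def remove_shard_check(self, shard):
        # Step 2: removal may proceed only if no database lives on the shard.
        return len(self.dbs_on(shard)) == 0

    def move_primary(self, db, to):
        # Step 3: move every unsharded collection of `db` to the new primary.
        src = self.db_primary[db]
        moved = {c for c in self.unsharded[src] if c.startswith(db + ".")}
        self.unsharded[src] -= moved
        self.unsharded[to] |= moved
        self.db_primary[db] = to

    def remove_shard_commit(self, shard):
        # Step 4: drop the shard from the topology without re-checking.
        # Returns the collections that become unreachable.
        self.shards.discard(shard)
        return self.unsharded.pop(shard)


def racy_interleaving():
    cluster = Cluster()
    may_proceed = cluster.remove_shard_check("shard1")  # steps 1-2: check passes
    cluster.move_primary("myDB", to="shard1")           # step 3: runs in between
    lost = cluster.remove_shard_commit("shard1") if may_proceed else set()  # step 4
    return cluster, lost
```

In this model, running the two operations serially is harmless: if `move_primary` completes first, `remove_shard_check("shard1")` returns `False` and the removal does not commit. The loss only occurs when the move lands between the check and the commit.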

      Note on step 2: the removeShard command returns a non-completed status if the shard still has unsharded databases, and notifies the user that they must be moved explicitly using movePrimary. A fuller explanation can be found here.

            Assignee:
            antonio.fuschetto@mongodb.com Antonio Fuschetto
            Reporter:
            silvia.surroca@mongodb.com Silvia Surroca
            Votes:
            0
            Watchers:
            9
