Concurrent removeShard and movePrimary may result in an undesired deletion of unsharded collections.
Bug description
Imagine the following scenario:
- There are 2 shards: 'shard0', 'shard1'
- Database 'myDB' primary shard is 'shard0'
- Collection 'myDB.collA' is unsharded, so it's located in 'shard0'
At some point, someone concurrently issues these two commands (a minimal reproduction sketch follows this list):
- { removeShard:'shard1' }
- { movePrimary:'myDB', to: 'shard1'}
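As an illustrative sketch only (names taken from the scenario above; both commands are sent through mongos, e.g. from two separate shells):

// Shell 1: start draining 'shard1'.
db.adminCommand({ removeShard: 'shard1' })

// Shell 2, at roughly the same time: move the primary of 'myDB'
// to the shard that is being drained.
db.adminCommand({ movePrimary: 'myDB', to: 'shard1' })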
If the internal execution is interleaved as described below, the cluster ends up with an undesired deletion of all the unsharded collections of 'myDB'.
1. The removeShard command is sent to the config server.
2. On the removeShard thread, the config server checks that the number of databases whose primary shard is 'shard1' is zero. As that is true, the drain process continues.
3. At this point the movePrimary is executed, which moves all the unsharded collections of 'myDB' to 'shard1'.
4. The removeShard commit phase starts and 'shard1' is removed from the cluster topology, so the unsharded collections that were just moved there are lost to the cluster.
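As a hypothetical way to observe the outcome of this interleaving (standard commands; the exact output depends on the server version):

// 'shard1' is no longer part of the topology.
db.adminCommand({ listShards: 1 })

// The unsharded collections of 'myDB' (e.g. 'collA') are no longer visible,
// because their only copy lived on the removed shard.
db.getSiblingDB('myDB').getCollectionNames()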
Small note to better understand step 2: the removeShard command returns a non-completed status if the shard is still the primary shard of one or more databases, and notifies the user that those should be moved explicitly using movePrimary (see the example response below). A better explanation can be found here.
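For reference, that non-completed removeShard response looks roughly like this (field names as documented for the command; the values shown are illustrative):

{
  "msg" : "draining ongoing",
  "state" : "ongoing",
  "remaining" : { "chunks" : NumberLong(0), "dbs" : NumberLong(1) },
  "note" : "you need to drop or movePrimary these databases",
  "dbsToMove" : [ "myDB" ],
  "ok" : 1
}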
Is related to: SERVER-69890 - Concurrent movePrimary and removeShard can move database to a no-longer existent shard (Closed)