- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Sharding EMEA
- Fully Compatible
- Sharding EMEA 2023-04-03, Sharding EMEA 2023-04-17, Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15
- 135
- 3.33
Several DDL coordinators, such as rename, collmod and drop collection, currently use the configsvrSetAllowMigrations command to stop migrations while the coordinator runs. They do this because the coordinator eventually changes metadata, and a migration to a shard that previously had no metadata might not find out about the change.
However, the command has no replay protection, which allows the following scenario:
1. A DDL coordinator sends a configsvrSetAllowMigrations command that is held up in a router because of network slowness.
2. A stepdown occurs and the new primary executes the DDL to completion, unblocking migrations at the end of the coordinator.
3. The command delayed in step 1 arrives and blocks migrations for the collection.
We can prevent this by adding replay protection to configsvrSetAllowMigrations, as configsvrRemoveChunk already has.
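The scenario and the proposed protection can be sketched as a small simulation. This is only an illustration, not the server's implementation: it assumes replay protection means tagging each request with a session id and an increasing transaction number, and rejecting any request older than the newest one already seen for that session. `ConfigServerSketch`, `StaleRequest`, the session id and the namespace are all hypothetical names.

```python
from dataclasses import dataclass, field


class StaleRequest(Exception):
    """Raised when a request is older than one already accepted on the same session."""


@dataclass
class ConfigServerSketch:
    # Hypothetical stand-in for per-collection migration state.
    allow_migrations: dict = field(default_factory=dict)
    # Highest transaction number accepted so far, per session id.
    last_txn: dict = field(default_factory=dict)

    def set_allow_migrations(self, ns, allow, session_id, txn_number):
        latest = self.last_txn.get(session_id, -1)
        if txn_number < latest:
            # A delayed or replayed request must not overwrite newer state.
            raise StaleRequest(f"txn {txn_number} is older than {latest}")
        self.last_txn[session_id] = txn_number
        self.allow_migrations[ns] = allow


server = ConfigServerSketch()
session = "ddl-coordinator-session"  # hypothetical session id

# Step 1: the old primary's request (txn 0) is delayed in the network,
# so nothing has reached the server yet.

# Step 2: the new primary runs the coordinator to completion with newer numbers.
server.set_allow_migrations("db.coll", False, session, 1)  # block migrations
server.set_allow_migrations("db.coll", True, session, 2)   # unblock at the end

# Step 3: the delayed request finally arrives and is rejected.
try:
    server.set_allow_migrations("db.coll", False, session, 0)
    delayed_rejected = False
except StaleRequest:
    delayed_rejected = True
```

Without the transaction number check, the last call would leave migrations blocked for `db.coll` after the coordinator has already finished.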
causes:
- SERVER-76836 setAllowMigrations is executing remote calls with a session checked out (Closed)
- SERVER-77304 stopMigrations is not idempotent anymore (Closed)
related to:
- SERVER-78021 Retrying setAllowMigrations command may end up in a deadlock (Closed)
- SERVER-79026 Failing to cancel the JournalFlusher thread might lead to 3-way deadlock (Closed)