- Type: Task
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Sharding EMEA
- Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18
The cloning phase of the movePrimary command conflicts with some operations, e.g. createIndexes and dropIndexes, which must either fail or be serialized with it. For this purpose, when the cloning phase of the movePrimary command runs, it sets the in-memory MovePrimaryInProgress flag, which is checked by potentially conflicting operations. Conversely, when the cloning phase is completed, the flag is unset.
The status of this flag is not persisted, which exposes the cluster to the following potential scenario:
- Node_1 is the primary and runs a movePrimary operation
- Node_1 sets the MovePrimaryInProgress flag
- Node_1 steps down while the flag is still set
- Node_2 is elected as a primary and recovers the operation (so, sets the flag)
- Node_2 completes the operation and unsets the flag (locally)
- Sooner or later, Node_1 steps up again ==> its MovePrimaryInProgress flag is still set, although no movePrimary operation is running
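The scenario above can be reproduced with a minimal Python sketch. This is a toy model, not server code: the `Node` class, its `move_primary_in_progress` attribute, and `create_indexes_allowed` are hypothetical names that only mimic a per-node, in-memory flag that is neither persisted nor replicated.

```python
class Node:
    """Hypothetical replica set member; the flag lives only in this node's memory."""

    def __init__(self, name):
        self.name = name
        self.move_primary_in_progress = False  # assumption: not persisted, not replicated


def create_indexes_allowed(node):
    """A conflicting operation is rejected while the node's local flag is set."""
    return not node.move_primary_in_progress


node_1, node_2 = Node("Node_1"), Node("Node_2")

# Node_1 is the primary, starts the cloning phase and sets the flag.
node_1.move_primary_in_progress = True

# Node_1 steps down here; nothing clears its in-memory flag.

# Node_2 is elected, recovers the operation (sets the flag), completes it
# and unsets the flag, but only in its own memory.
node_2.move_primary_in_progress = True
node_2.move_primary_in_progress = False

# Node_1 steps up again: its flag is stale, so createIndexes is still rejected.
stale_on_node_1 = not create_indexes_allowed(node_1)
ok_on_node_2 = create_indexes_allowed(node_2)
```

In this model `stale_on_node_1` ends up true even though no movePrimary operation is running anywhere, which is the condition described in the last step.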
The strategic solution is to reimplement the createIndexes and dropIndexes commands on top of the DDL coordinator. They would then be serialized automatically with movePrimary operations, and the MovePrimaryInProgress flag would no longer be needed.
However, a short-term (tactical) solution might be to have these commands acquire the DDL locks. This would avoid the DDL coordinator (which is expensive to implement) while still serializing these operations with movePrimary, making the MovePrimaryInProgress flag redundant.
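As a rough illustration of the tactical idea, the sketch below serializes two operations on the same database through a shared per-database lock instead of a flag. It is a Python toy, not the server's DDL lock manager: `ddl_lock`, `move_primary`, `create_indexes`, and the `events` list are hypothetical names introduced only for this example.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

_ddl_locks = defaultdict(threading.Lock)  # hypothetical: one lock per database name
_registry_guard = threading.Lock()        # protects the lock registry itself
events = []                               # records the order in which operations run


@contextmanager
def ddl_lock(db_name):
    """Hold the per-database DDL lock for the duration of the block."""
    with _registry_guard:
        lock = _ddl_locks[db_name]
    with lock:
        yield


def move_primary(db_name):
    with ddl_lock(db_name):
        events.append("movePrimary:start")
        time.sleep(0.05)  # stands in for the cloning phase
        events.append("movePrimary:end")


def create_indexes(db_name):
    with ddl_lock(db_name):
        events.append("createIndexes:start")
        events.append("createIndexes:end")


threads = [
    threading.Thread(target=move_primary, args=("db",)),
    threading.Thread(target=create_indexes, args=("db",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread takes the lock first, the start and end events of one operation are never interleaved with those of the other. Because the lock is released when the holder finishes, there is no leftover state comparable to a stale flag within a single process; how the real DDL locks behave across step-down and step-up is outside what this sketch models.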
- duplicates: SERVER-75675 Ensure indexes are created in all shards (Open)
- related to: SERVER-90609 Current use of DatabaseShardingState::isMovePrimaryInProgress in createIndex is not sufficient to prevent running together with movePrimary (Backlog)