- Type: Bug
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- Fully Compatible
- ALL
- CAR Team 2024-03-04, CAR Team 2024-03-18
Suppose a shard is attempting to commit a DDL operation. Before doing so, it may refresh data from the config shard to check whether a previous primary already committed the operation on the config shard and then failed.
This check is unreliable if it depends on the gossiped Vector Clock: the refresh can return pre-commit data, the check then wrongly concludes that the operation was never committed, and the same operation is performed twice.
This can occur in the following scenario:
- Shard S1 has three nodes.
- Config Shard CS has three nodes.
- S1's Primary commits the DDL operation on CS with majority writeConcern and performs a stepdown before it persists the new vector clock.
- S1's newly elected primary still has the previous Vector Clock.
- S1's new primary refreshes its catalog metadata by contacting a stale CS node that is still observing the old Vector Clock and is at a stale majority timestamp. This can happen because we do not have a PrimaryOnly readPreference for this read.
- S1's new primary fails the check, since from its perspective we're still in the old pre-commit world.
- S1's new primary then re-commits the DDL operation.
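
The scenario above can be sketched as a toy Python model. This is an illustration only, not the server's code: `ConfigNode`, `ShardNode`, `commit_ddl`, `already_committed`, and `run_scenario` are hypothetical names, the Vector Clock is reduced to a single integer configTime, and a read that would normally wait for a lagging node to catch up is modelled as an exception so the reader moves on to another node.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigNode:
    """One CS node: the DDL commits and configTime visible in its majority snapshot."""
    visible_ops: set = field(default_factory=set)
    config_time: int = 0


@dataclass
class ShardNode:
    """One S1 node: only the configTime it has persisted survives a failover."""
    persisted_config_time: int = 0


def commit_ddl(up_to_date_nodes, op_id, commit_log):
    """Commit op_id on CS; only the nodes passed in have advanced their snapshot."""
    commit_log.append(op_id)
    new_time = len(commit_log)
    for node in up_to_date_nodes:
        node.visible_ops.add(op_id)
        node.config_time = new_time
    return new_time


def already_committed(reader, target, op_id):
    """Refresh from `target`, using the reader's persisted configTime as a causal floor."""
    if target.config_time < reader.persisted_config_time:
        # Assumption: stands in for a read that would block until the node catches up.
        raise RuntimeError("target is behind the reader's clock")
    return op_id in target.visible_ops


def run_scenario(persist_clock_before_stepdown):
    cs = [ConfigNode(), ConfigNode(), ConfigNode()]
    new_primary = ShardNode()
    commit_log = []

    # Old S1 primary commits on a majority of CS (nodes 0 and 1); node 2 lags.
    commit_time = commit_ddl(cs[:2], "ddl-1", commit_log)
    if persist_clock_before_stepdown:
        # The new clock reaches the node that later becomes primary.
        new_primary.persisted_config_time = commit_time

    # Old primary steps down; the new primary runs the pre-commit check.
    done = False
    for target in (cs[2], cs[0]):  # no primary-only read: the lagging node may answer
        try:
            done = already_committed(new_primary, target, "ddl-1")
            break
        except RuntimeError:
            continue  # stale node rejected by the causal floor; try another node
    if not done:
        commit_ddl(cs, "ddl-1", commit_log)  # the duplicate commit
    return commit_log


print(run_scenario(persist_clock_before_stepdown=False))
print(run_scenario(persist_clock_before_stepdown=True))
```

In this model the first run commits the operation twice because the lagging CS node satisfies the new primary's old clock. The second run commits once because the persisted clock rules that node out.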
- is related to: SERVER-87977 Add the explicit replay protection to the commit phase of the sharding ConvertToCappedCoordinator (Closed)
- related to: SERVER-85534 Checkpoint the vector clock after committing shard collection (Closed)