The resharding coordinator forces a refresh of the collection routing info cache and then extracts the database primary shard from it.
While this ensures that the collection metadata retrieved is causally consistent with the latest DDL operation executed on the collection itself, it does not guarantee that the database metadata is causally consistent with the latest DDL operations executed on the database.
In fact, forcing a refresh of the collection routing info does not force a refresh of the database info cache. This means that the database primary shard exposed through the collection routing info cache could be stale.
If the resharding coordinator uses stale database primary shard information, it may fail to include the current database primary shard in the set of recipient shards of the resharding operation. As a result, the resharding operation will not update the state of the target collection on the database primary shard, leaving the local catalog on that shard in an inconsistent state. In particular, if the database primary shard doesn't own any chunk for the resharded collection, it may not have the collection in its local catalog after the resharding operation has finished.
This is particularly problematic because DDL operations rely on the assumption that the database primary shard always has correct and up-to-date information about the collections in the databases it is primary for.
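The failure sequence described above can be sketched with a small toy model. This is not MongoDB server code: `ConfigServer`, `RoutingCaches`, the shard names, and the rule that recipients are the chunk owners plus the database primary are all simplifying assumptions made for illustration. The change of database primary stands in for any database-level DDL operation that happens before the coordinator's refresh.

```python
from dataclasses import dataclass


@dataclass
class ConfigServer:
    """Hypothetical stand-in for the authoritative sharding metadata."""
    db_primary: str
    chunk_owners: set


@dataclass
class RoutingCaches:
    """Hypothetical model of two caches that are refreshed independently."""
    cached_db_primary: str
    cached_chunk_owners: set

    def refresh_collection(self, config):
        # Only the collection routing info is reloaded; the cached
        # database entry is left untouched.
        self.cached_chunk_owners = set(config.chunk_owners)

    def refresh_database(self, config):
        self.cached_db_primary = config.db_primary

    def recipient_shards(self):
        # Assumption: recipients are the shards that will own chunks
        # plus the database primary shard.
        return self.cached_chunk_owners | {self.cached_db_primary}


config = ConfigServer(db_primary="shardA", chunk_owners={"shardB", "shardC"})
caches = RoutingCaches(config.db_primary, set(config.chunk_owners))

# A database-level DDL operation changes the primary shard.
config.db_primary = "shardD"

# The coordinator refreshes only the collection routing info.
caches.refresh_collection(config)
stale_recipients = caches.recipient_shards()

# Refreshing the database cache as well yields the current primary.
caches.refresh_database(config)
fresh_recipients = caches.recipient_shards()
```

In this model `stale_recipients` still contains the old primary `shardA` and omits `shardD`, so the current database primary would never be told about the resharded collection. Refreshing the database cache too is shown only for contrast; it is not necessarily how the server addresses the problem.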
- is related to SERVER-86671 CollectionRoutingInfo could contain stale database information even after refresh (Closed)
- related to SERVER-88417 processReshardingFieldsForRecipientCollection can use stale db info and incorrectly creates a recipient (Closed)