The donor writes the enterCriticalSectionCounter flag
-> which causes secondaries to clear their filtering metadata
-> which causes the next versioned request on the secondary to throw StaleConfig and trigger the secondary to refresh
-> which causes the secondary to send flushRoutingTableCacheUpdates to the primary
-> which blocks behind the critical section only if reads are being blocked
In 4.4 and earlier versions, if reads are not yet being blocked, the secondary will finish the refresh with the pre-migration metadata and keep serving reads from stale mongoses even after the migration commits.
For example:
- Donor writes the enterCriticalSectionCounter flag at T90
- Secondary sees the flag, invalidates its filtering metadata
- Secondary gets a versioned read, sends flushRoutingTableCacheUpdates, gets back success
- Donor starts blocking reads
- Donor commits the migration, which succeeds at T100
- Client does a write from mongos1, which contacts donor and gets back StaleConfig, then retries write on recipient, which succeeds at T101
- Client does afterClusterTime: T101 read from mongos2, which is stale and contacts the donor secondary. >>> That secondary will wait until T101, then serve the read <<<
In 4.5, this happens not to be an issue, because the refresh is done by calling onShardVersionMismatch, which waits for the critical section as long as writes are already being blocked.
Despite that, we want to change flushRoutingTableCacheUpdates in all versions to block behind the critical section with kWrite instead of kRead, which is what it uses today.
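
The difference between waiting with kRead and waiting with kWrite can be illustrated with a small toy model. This is only a sketch, not server code: the names Phase, CriticalSection, must_wait and flush_returns_immediately are hypothetical, and it assumes the donor is already blocking writes (but not yet reads) when the secondary's flushRoutingTableCacheUpdates arrives.

```python
from enum import Enum


class Op(Enum):
    READ = "kRead"
    WRITE = "kWrite"


class Phase(Enum):
    NONE = 0            # no migration critical section is active
    WRITES_BLOCKED = 1  # critical section entered, reads still allowed
    READS_BLOCKED = 2   # commit phase, reads and writes both blocked


class CriticalSection:
    """Toy stand-in for the donor's migration critical section."""

    def __init__(self):
        self.phase = Phase.NONE

    def must_wait(self, op):
        # Waiting with kWrite blocks as soon as the critical section exists;
        # waiting with kRead blocks only once reads are blocked too.
        if self.phase is Phase.NONE:
            return False
        if op is Op.WRITE:
            return True
        return self.phase is Phase.READS_BLOCKED


def flush_returns_immediately(crit_sec, wait_mode):
    """True if the flush would complete without waiting for the migration."""
    return not crit_sec.must_wait(wait_mode)


# Assumed state: donor has blocked writes and written the flag, reads not yet blocked.
crit_sec = CriticalSection()
crit_sec.phase = Phase.WRITES_BLOCKED

current_behavior = flush_returns_immediately(crit_sec, Op.READ)    # kRead
proposed_behavior = flush_returns_immediately(crit_sec, Op.WRITE)  # kWrite

print("kRead flush returns before commit:", current_behavior)
print("kWrite flush returns before commit:", proposed_behavior)
```

In this model, the kRead flush returns while the migration is still in flight, which is the window that lets the secondary refresh to pre-migration metadata. The kWrite flush waits, so the secondary can only finish its refresh after the migration outcome is known.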
Related to: SERVER-50898 safe_secondary_reads_causal_consistency.js must wait for effects of _configsvrCommitChunkMigration to be majority-committed snapshot on all CSRS members (Closed)