Our critical section framework lets a thread enter a catch-up phase (only write ops are blocked) and then a commit phase (both read and write ops are blocked).
The getCriticalSectionSignal function accepts an argument, kWrite or kRead. With kWrite it returns the signal when a thread has entered either the catch-up or the commit phase; with kRead it returns the signal only when a thread has entered the commit phase.
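The phase and signal semantics described above can be modelled in a few lines. This is only a minimal sketch, not the server code: Phase, Op, Signal and CriticalSection are hypothetical stand-ins, and std::optional is used in place of boost::optional so the example is self-contained.

```cpp
#include <optional>

// Hypothetical stand-ins for illustration; not the real server types.
enum class Phase { kNone, kCatchUp, kCommit };
enum class Op { kRead, kWrite };

struct Signal {};  // placeholder for whatever a blocked operation would wait on

class CriticalSection {
public:
    void enterCatchUpPhase() { _phase = Phase::kCatchUp; }
    void enterCommitPhase() { _phase = Phase::kCommit; }
    void exit() { _phase = Phase::kNone; }

    // kWrite: a signal is returned in the catch-up and commit phases.
    // kRead: a signal is returned only in the commit phase.
    std::optional<Signal> getCriticalSectionSignal(Op op) const {
        if (_phase == Phase::kCommit) {
            return Signal{};
        }
        if (_phase == Phase::kCatchUp && op == Op::kWrite) {
            return Signal{};
        }
        return std::nullopt;  // nothing to wait on, the caller may proceed
    }

private:
    Phase _phase = Phase::kNone;
};
```

In this model, a caller that asks with kRead during the catch-up phase gets an empty result and proceeds, while a caller that asks with kWrite gets a signal and has to wait.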
When the database version needs to be refreshed, the current logic gets the critical section signal with kRead, that is, only when another thread has entered the commit phase. This makes sense, BUT there is a problem. Consider the following sequence of events:
- Thread A enters the catch-up phase of the movePrimary
- Thread B refreshes the version of the same database, gets the signal with kRead, and the result is boost::none, so it continues
- Thread A enters the commit phase and the ongoing refresh is not cancelled! In this case we have a race: movePrimary and the version refresh run in parallel.
There are two possible solutions to this problem:
- Get the signal with kRead, and cancel ongoing refreshes when the commit phase is entered.
- Get the signal with kWrite, and cancel ongoing refreshes when the catch-up phase is entered.
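The second option can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: DatabaseState, RefreshToken, tryStartRefresh and cancelOngoingRefresh are hypothetical names invented for illustration, and a real implementation would need the appropriate locking around the shared state.

```cpp
#include <memory>
#include <optional>

// Hypothetical stand-ins for illustration; not the real server types.
enum class Phase { kNone, kCatchUp, kCommit };
enum class Op { kRead, kWrite };
struct Signal {};

// Hypothetical cancellation flag shared by a refresh and the critical section.
struct RefreshToken {
    bool cancelled = false;
};

class DatabaseState {
public:
    // Entering the catch-up phase cancels any refresh already in flight.
    void enterCatchUpPhase() {
        _phase = Phase::kCatchUp;
        cancelOngoingRefresh();
    }
    void enterCommitPhase() { _phase = Phase::kCommit; }
    void exitCriticalSection() { _phase = Phase::kNone; }

    std::optional<Signal> getCriticalSectionSignal(Op op) const {
        if (_phase == Phase::kCommit) return Signal{};
        if (_phase == Phase::kCatchUp && op == Op::kWrite) return Signal{};
        return std::nullopt;
    }

    // The refresh asks with kWrite, so it cannot start once catch-up has begun.
    // Returns a token if the refresh may start, nullptr if it must wait.
    std::shared_ptr<RefreshToken> tryStartRefresh() {
        if (getCriticalSectionSignal(Op::kWrite)) {
            return nullptr;
        }
        _ongoing = std::make_shared<RefreshToken>();
        return _ongoing;
    }

private:
    void cancelOngoingRefresh() {
        if (_ongoing) {
            _ongoing->cancelled = true;  // the refresh is expected to poll this flag
            _ongoing.reset();
        }
    }

    Phase _phase = Phase::kNone;
    std::shared_ptr<RefreshToken> _ongoing;
};
```

With this shape, a refresh that started before the catch-up phase observes its token as cancelled, and a refresh attempted during either phase is refused until the critical section is exited, so the refresh and the movePrimary never overlap in the model.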
What about secondaries? Currently, secondary nodes are notified only when the primary enters the catch-up phase. Consequently, only the second solution above is viable, and implementing it is the target of this ticket.