In SERVER-50486, we added a flag on the opCtx of transaction operations so that these operations are interrupted on stepdown. After setting the flag, we check that we are still primary: the commandCanRunHere function returns true if the node can accept non-local writes.
In the stepDown code path, we first acquire the RSTL, and it is during this acquisition that the killOps thread runs and kills the opCtx of every command that has the flag set. Only after that do we update whether the node can accept non-local writes. As a result, the following interleaving appears possible:
- In user thread t1, a user command has been added to the _clients vector in ServiceContext, but ExecCommandDatabase::_initiateCommand() has not run yet, so the flag is not set.
- In stepDown thread t2, we attempt to acquire the RSTL and loop through all commands. Because the flag is not yet set on t1's command, it is not killed.
- In t1, we now set the flag and check whether we can still accept non-local writes. Since we still can, the command proceeds.
- In t2, we finish acquiring the RSTL and record that we can no longer accept non-local writes. The command in t1 keeps running on a node that has stepped down, even though the flag was meant to guarantee that it would be interrupted.
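The interleaving above can be replayed deterministically with a toy model. This is a minimal sketch under a single-threaded simplification: `Node`, `OpCtx`, `register_client`, `initiate_command`, `kill_flagged_ops`, and `finish_step_down` are hypothetical stand-ins, not MongoDB code, and the RSTL itself is not modeled. The steps simply run in the order t1, t2, t1, t2 from the list.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the race. None of these names are MongoDB
# APIs; they stand in for ServiceContext::_clients, the interrupt-on-stepdown
# flag on the opCtx, and the node's "can accept non-local writes" state.


@dataclass
class OpCtx:
    name: str
    interrupt_at_stepdown: bool = False  # the flag added in SERVER-50486
    killed: bool = False


@dataclass
class Node:
    can_accept_non_local_writes: bool = True
    clients: list = field(default_factory=list)  # stands in for ServiceContext::_clients

    # --- user thread t1 ---
    def register_client(self, op: OpCtx) -> None:
        self.clients.append(op)

    def initiate_command(self, op: OpCtx) -> bool:
        # Set the flag, then check that we are still primary
        # (the role commandCanRunHere plays in the real code).
        op.interrupt_at_stepdown = True
        return self.can_accept_non_local_writes

    # --- stepDown thread t2 ---
    def kill_flagged_ops(self) -> None:
        # Runs while the RSTL is being acquired: only ops whose flag is
        # already set are killed.
        for op in self.clients:
            if op.interrupt_at_stepdown:
                op.killed = True

    def finish_step_down(self) -> None:
        # Runs after the RSTL is acquired: stop accepting non-local writes.
        self.can_accept_non_local_writes = False


def replay_race():
    node = Node()
    op = OpCtx("txn-command")
    node.register_client(op)              # t1: registered, flag not yet set
    node.kill_flagged_ops()               # t2: sweep misses op (flag unset)
    proceeds = node.initiate_command(op)  # t1: sets flag, node still writable
    node.finish_step_down()               # t2: RSTL held, writes disabled
    return proceeds, op.killed, node.can_accept_non_local_writes


proceeds, killed, can_write = replay_race()
print(f"command proceeds={proceeds}, killed={killed}, node accepts writes={can_write}")
# prints: command proceeds=True, killed=False, node accepts writes=False
```

The end state is a command that was never interrupted, running on a node that no longer accepts non-local writes. In this model, the sweep in `kill_flagged_ops` only sees flags that are already set, so any op that sets its flag after the sweep but checks write acceptance before `finish_step_down` slips through.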
Related to:
- SERVER-50486 invokeWithSessionCheckedOut being called on prepared transactions on secondaries (Closed)
- SERVER-66351 Audit uses of OperationContext::setAlwaysInterruptAtStepDownOrUp (Open)