- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- Replication
- ALL
- Repl 2018-05-07
- 6
ReplicationCoordinatorImpl::stepDown() calls ReplicationCoordinatorExternalStateImpl::killAllUserOperations() prior to calling TopologyCoordinator::prepareForStepDownAttempt().
    auto globalLock = stdx::make_unique<Lock::GlobalLock>(
        opCtx, MODE_X, stepDownUntil, Lock::GlobalLock::EnqueueOnly());

    // We've requested the global exclusive lock which will stop new operations from coming in,
    // but existing operations could take a long time to finish, so kill all user operations
    // to help us get the global lock faster.
    _externalState->killAllUserOperations(opCtx);
    ...
    status = _topCoord->prepareForStepDownAttempt();
The implication of the current behavior for retryable writes is that server selection may choose to retry the write operation against the primary while it is still stepping down. Because the global X lock is held for the duration of the primary's stepdown attempt, the retry blocks on the server until ReplicationCoordinatorExternalStateImpl::closeConnections() (and thus ServiceEntryPoint::endAllSessions()) has been called. The driver then sees a network error but does not retry the operation again, because it has already used its single permitted retry.
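The failure sequence described above can be modelled with a small, self-contained sketch. This is not driver or server code: `AttemptResult`, `runRetryableWrite`, and `steppingDownPrimary` are hypothetical names introduced only for illustration, and the assumption that the first attempt fails because it is interrupted by killAllUserOperations() is mine rather than something the ticket states explicitly.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of one write attempt; not a real driver type.
enum class AttemptResult { Ok, Interrupted, NetworkError };

// Hypothetical model of a "retry at most maxRetries times" policy.
// `attempt` stands in for sending the write to whichever node server
// selection picks; it receives the zero-based attempt number.
inline AttemptResult runRetryableWrite(
    const std::function<AttemptResult(int)>& attempt, int maxRetries = 1) {
    assert(maxRetries >= 0);
    AttemptResult result = attempt(0);
    for (int retry = 1; retry <= maxRetries && result != AttemptResult::Ok; ++retry) {
        result = attempt(retry);
    }
    return result;
}

// Assumed behavior of a primary in the middle of stepping down: the first
// attempt is interrupted, and the retry (routed to the same node) ends in a
// network error once the node closes its connections.
inline AttemptResult steppingDownPrimary(int attemptNumber) {
    return attemptNumber == 0 ? AttemptResult::Interrupted
                              : AttemptResult::NetworkError;
}
```

With the default of one retry, `runRetryableWrite(steppingDownPrimary)` ends in `AttemptResult::NetworkError`, which corresponds to the error the application would see after the retry quota is spent.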
For comparison, the reconnect() function in jstests/replsets/rslib.js works around this issue by retrying until the "collStats" command succeeds. Unlike the "isMaster" command, the "collStats" command must acquire the global lock and therefore cannot complete until the stepdown has finished.
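The retry-until-success idea behind that workaround can be sketched as follows. This is not the code in rslib.js: `retryUntilSuccess` and its `probe` callback are hypothetical, and the `maxAttempts` cap is an assumption added so the sketch always terminates.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: calls `probe` until it reports success and returns the
// number of attempts used, or -1 if `maxAttempts` is exhausted. `probe` stands
// in for running a command that must acquire the global lock, so it can only
// succeed once the stepdown has finished.
inline int retryUntilSuccess(const std::function<bool()>& probe, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (probe()) {
            return attempt;
        }
    }
    return -1;
}
```

A probe that needs no lock could succeed while the node is still stepping down, which is why the choice of probe command matters more here than the loop itself.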
is related to
- SERVER-74409 StreamableReplicaSetMonitor::getHostsOrRefresh Can Return Out of Date Information (Closed)
related to
- SERVER-57167 Prevent throwing on session creation due to stepdown before stepdown completes (Closed)
- SERVER-34666 Reduce the number of retries needed for running the retryable_writes_jscore_stepdown_passthrough.yml test suite (Backlog)