- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- v4.4
- Repl 2020-04-06, Repl 2020-04-20
- 42
There are currently two problems.
1) We do not check whether the node is still primary before writing the new config document locally. Consider the following scenario:
- Node1 receives a reconfig command
- Node1 begins stepping down because it hears of a new term
- Node1 starts killing two categories of operations: writes (and some system ops) that hold the global lock in X, IX, or S mode, and reads that encounter prepare conflicts. The replSetReconfig command falls into neither category, so it is not killed.
- Node1 finishes killing ops and steps down, transitioning to secondary
- Node1 writes the new config document. The write takes the DB lock in X mode, but it is not killed because the stepdown has already finished killing ops.
Node1's new config then continues to propagate via heartbeats even though Node1 has already stepped down.
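The scenario above can be reproduced in a small toy model. This is only a sketch under stated assumptions: `ToyNode`, `store_config_unchecked`, `store_config_checked`, and `NotPrimaryError` are hypothetical names for illustration, not server code, and a single `threading.Lock` stands in for the lock taken by the config write.

```python
import threading


class NotPrimaryError(Exception):
    """Raised when a primary-only operation runs on a non-primary node."""


class ToyNode:
    """Hypothetical stand-in for a replica set member (not server code)."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the lock taken by the config write
        self.is_primary = True
        self.config_version = 1

    def step_down(self):
        # Stepdown completes and the node becomes a secondary.
        with self.lock:
            self.is_primary = False

    def store_config_unchecked(self, new_version):
        # Mirrors the reported behavior: no primary check before the local write.
        with self.lock:
            self.config_version = new_version

    def store_config_checked(self, new_version):
        # Possible fix: re-check primary state while holding the lock.
        with self.lock:
            if not self.is_primary:
                raise NotPrimaryError("stepped down before the config write")
            self.config_version = new_version
```

If `step_down()` runs before `store_config_unchecked(2)`, the toy node ends up as a secondary holding config version 2, which is the state that heartbeats would then spread. With `store_config_checked(2)` the write is rejected and the version stays at 1.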
2) The replSetReconfig command does a no-op write but does not check that the node is still primary before doing so (for a similar no-op write, see readConcern: linearizable).
The no-op write ends up calling onInternalOpMessage, which passes in an empty namespace. Because the namespace is empty, the primary check in _logOpsInner is skipped, so the reconfig no-op write can occur on a secondary.
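The effect of the empty namespace can be illustrated with a toy logging function. This is a simplified sketch that assumes the check is guarded by the namespace being non-empty, as the description implies; `log_op_toy`, `log_op_toy_fixed`, and `NotWritablePrimaryError` are hypothetical names and do not reproduce the real `_logOpsInner`.

```python
class NotWritablePrimaryError(Exception):
    """Raised when a write is attempted on a node that is not primary."""


def log_op_toy(oplog, can_accept_writes, namespace, entry):
    # Toy stand-in for the described behavior: the primary check is only
    # reached when a namespace is supplied.
    if namespace and not can_accept_writes:
        raise NotWritablePrimaryError(namespace)
    oplog.append(entry)


def log_op_toy_fixed(oplog, can_accept_writes, namespace, entry):
    # Hypothetical fix: check primary state regardless of the namespace.
    if not can_accept_writes:
        raise NotWritablePrimaryError(namespace or "<no namespace>")
    oplog.append(entry)


# A secondary (can_accept_writes=False) logging a no-op with an empty namespace.
secondary_oplog = []
log_op_toy(secondary_oplog, can_accept_writes=False, namespace="", entry={"op": "n"})
```

In this model the no-op entry lands in `secondary_oplog` even though the node cannot accept writes, while `log_op_toy_fixed` raises for the same arguments.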
Because the config write and the no-op write should happen together to avoid inconsistent states, we should consider refactoring the code so that a single primary check covers both.
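One possible shape for such a refactor is sketched below. It assumes that both writes can be performed under one lock that stepdown also takes; `ToyReplCoordinator`, `reconfig`, and `ReconfigNotPrimaryError` are hypothetical names, and the order of the two writes is chosen for illustration only.

```python
import threading


class ReconfigNotPrimaryError(Exception):
    """Raised when reconfig runs on a node that is no longer primary."""


class ToyReplCoordinator:
    """Hypothetical model: one primary check guards both reconfig writes."""

    def __init__(self):
        self._mutex = threading.Lock()  # assumed to be taken by stepdown as well
        self.is_primary = True
        self.config_version = 1
        self.oplog = []

    def step_down(self):
        with self._mutex:
            self.is_primary = False

    def reconfig(self, new_version):
        with self._mutex:
            # Single primary check covering both writes below.
            if not self.is_primary:
                raise ReconfigNotPrimaryError("not primary; reconfig rejected")
            # Stepdown cannot interleave here because it needs the same lock.
            self.config_version = new_version
            self.oplog.append({"op": "n", "version": new_version})
```

Because stepdown and reconfig serialize on the same lock in this model, either both writes happen while the node is primary or neither happens.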
- depends on
  - SERVER-47205 Stopping dropping snapshots after safe reconfig that does not change writeConcernMajorityJournalDefault (Closed)
- is duplicated by
  - SERVER-46516 Majority write concern is blocked by dropping snapshot on reconfig (Closed)
- is related to
  - SERVER-47206 Majority commit point is not set backward after force reconfig or reconfig that changes writeConcernMajorityJournalDefault (Backlog)
  - SERVER-46516 Majority write concern is blocked by dropping snapshot on reconfig (Closed)
  - SERVER-47636 Force reconfig running concurrently with step up can cause reconfig in drain mode to fail (Closed)
  - SERVER-47205 Stopping dropping snapshots after safe reconfig that does not change writeConcernMajorityJournalDefault (Closed)
- related to
  - SERVER-47184 replSetReconfig command should check if the node is primary before no-op write (Closed)
  - SERVER-47369 doReplSetReconfig should fail during primary drain mode (Closed)
  - SERVER-47973 Address TODOs in SERVER-47142 (Closed)