Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.0-rc4, 4.7.0
Affects Version/s: 4.5.1, 4.4.0-rc1
Component/s: Replication
Labels:
- safe-reconfig-related

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.4
Steps To Reproduce:

Applying this diff (force_reconfig_drain_mode_repro.diff) on this commit and running the following commands should reproduce the bug:

ninja -j400 build/ninja/mongo/db/repl/db_repl_coordinator_test
build/ninja/mongo/db/repl/db_repl_coordinator_test --suite ReplCoordTest --filter NodeReturnsNotMasterWhenRunningForceReconfigWhileInDrainMode
Sprint:
Repl 2020-05-04
Linked BF Score:
42

After a node has been elected primary and has drained the ops from its buffer, it checks whether it needs to run a reconfig to increment its config term. It performs this check while holding the replication coordinator mutex, but releases the mutex before running the reconfig. If a force reconfig is running concurrently, it may install a new config with term -1 after the check completes and the mutex is released, but before the term-bump reconfig runs. In that case, the node attempts a reconfig that sets the config version to the version installed by the force reconfig and the config term to the node's current term. If the force reconfig installed version 'version' and the node's current term is 'term', the node attempts a reconfig to (version, term) while its current config is (version, -1). Since terms are ignored for config comparison if either term is -1, only the versions are compared, and they are equal, so the new config fails the validation check that it must be newer than the current config. The reconfig returns this error and the node then fasserts.
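
The comparison that trips the validation check can be illustrated with a small sketch. This is not the server's code: `ConfigId` and `is_newer` are hypothetical names, and the ordering used when both terms are set (term first, then version) is an assumption. Only the rule that terms are ignored when either term is -1 comes from the description above.

```python
from typing import NamedTuple

UNINITIALIZED_TERM = -1  # term installed by a force reconfig, per the description


class ConfigId(NamedTuple):
    """Hypothetical (version, term) pair identifying a replica set config."""
    version: int
    term: int


def is_newer(new: ConfigId, current: ConfigId) -> bool:
    """Return True if `new` would pass a 'must be newer' validation check."""
    if UNINITIALIZED_TERM in (new.term, current.term):
        # Terms are ignored if either one is -1, so only versions are compared.
        return new.version > current.version
    # Assumption for illustration: otherwise compare term first, then version.
    return (new.term, new.version) > (current.term, current.version)


# Config installed by the concurrent force reconfig: (version, -1).
current = ConfigId(version=5, term=UNINITIALIZED_TERM)
# Term-bump reconfig attempted by the new primary: (version, term).
proposed = ConfigId(version=5, term=3)

# Same version and one term is -1, so the proposed config is not newer.
rejected = not is_newer(proposed, current)
```

With these values `rejected` is true, which corresponds to the validation failure that leads to the fassert.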

To address this, we may want to consider preventing force reconfigs from running on a node while it is in drain mode. Non-force reconfigs should already be prevented in this state, since they check canAcceptNonLocalWrites, but force reconfigs bypass that check because they can run on a secondary.
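
A minimal sketch of the proposed guard, under stated assumptions: `ToyCoordinator`, `NotMasterError`, and the `draining` flag are hypothetical stand-ins, not the replication coordinator's real interface. It only shows the shape of the idea, which is to reject a force reconfig under the mutex while the node is draining.

```python
import threading


class NotMasterError(Exception):
    """Hypothetical error type standing in for a NotMaster response."""


class ToyCoordinator:
    """Toy model of a node; all fields and methods are illustrative only."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self.draining = False                     # True while in drain mode
        self.can_accept_non_local_writes = False  # False until drain completes
        self.term = 0
        self.config = (1, -1)                     # (version, term)

    def reconfig(self, new_version: int, force: bool) -> None:
        with self._mutex:
            # Existing behaviour per the description: non-force reconfigs
            # are rejected unless the node can accept non-local writes.
            if not force and not self.can_accept_non_local_writes:
                raise NotMasterError("not primary")
            # Proposed guard (assumption): also reject force reconfigs
            # while the node is in drain mode.
            if force and self.draining:
                raise NotMasterError("force reconfig rejected in drain mode")
            # A force reconfig installs term -1, as in the description.
            self.config = (new_version, -1 if force else self.term)


node = ToyCoordinator()
node.draining = True
try:
    node.reconfig(new_version=2, force=True)
    force_rejected = False
except NotMasterError:
    force_rejected = True
```

In this sketch a force reconfig still succeeds on a secondary that is not draining, which preserves the reason force reconfigs bypass the canAcceptNonLocalWrites check in the first place.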


force_reconfig_drain_mode_repro.diff
Apr 17 2020 06:19:49 PM UTC
3 kB
William Schultz

related to

SERVER-47142 Check primary before writing replset config and no-op

Closed

Assignee: William Schultz (Inactive)

Reporter: William Schultz (Inactive)

Participants: Githook User, Siyuan Zhou, William Schultz

Votes: 0

Watchers: 3

Created: Apr 17 2020 06:22:05 PM UTC

Updated: Oct 29 2023 10:09:19 PM UTC

Resolved: Apr 22 2020 03:07:58 PM UTC
