Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.19, 4.2.7, 4.4.0-rc3, 4.7.0
Affects Version/s: None
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4, v4.2, v4.0
Linked BF Score:
16
Confidence Status:
None
Work Order:
3
Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

If a node receives a heartbeat reconfig and can't find itself in the new config because of a network issue, it sets TopologyCoordinator::_selfIndex to -1. It logs a message like:

Cannot find self in new replica set configuration; I must be removed{"error":{"code":74,"codeName":"NodeNotFound","errmsg":"No host described in new configuration with {version: 3, term: 1} for replica set server7781-configRS maps to this node"}}

If TopologyCoordinator::processReplSetRequestVotes then receives a request with the correct config term and version, the request passes the check added in SERVER-46387, and the method goes on to check _selfConfig().isArbiter(). The node then crashes on an invariant in _selfConfig(), because _selfIndex is -1.
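A heavily simplified C++ model of this crash path may help. Every type and function below (`TopologyCoordinatorSketch`, `ReplSetConfig`, `processRequestVotes`, and so on) is hypothetical, not MongoDB's actual code, and the staleness check after the arbiter test is only a placeholder for whatever the real method does there. The point of the sketch is the ordering: once the config term/version check passes, a guard on `_selfIndex` has to run before anything calls `_selfConfig()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the replication types named
// in this ticket; these are NOT MongoDB's real classes or signatures.
struct MemberConfig {
    std::string host;
    bool arbiter = false;
    bool isArbiter() const { return arbiter; }
};

struct ReplSetConfig {
    long long version = 0;
    long long term = 0;
    std::vector<MemberConfig> members;
};

struct VoteResponse {
    bool voteGranted = false;
    std::string reason;
};

class TopologyCoordinatorSketch {
public:
    // selfIndex == -1 models "Cannot find self in new replica set configuration".
    void installConfig(ReplSetConfig config, int selfIndex, long long myLastApplied) {
        _config = std::move(config);
        _selfIndex = selfIndex;
        _myLastApplied = myLastApplied;
    }

    VoteResponse processRequestVotes(long long configVersion,
                                     long long configTerm,
                                     long long candidateLastApplied) const {
        VoteResponse response;
        // Stands in for the config term/version check the request passes.
        if (configVersion != _config.version || configTerm != _config.term) {
            response.reason = "candidate's config does not match mine";
            return response;
        }
        // Guard: refuse the vote instead of calling _selfConfig() with index -1.
        if (_selfIndex < 0) {
            response.reason = "I am not a member of my current config";
            return response;
        }
        // Placeholder logic: non-arbiters refuse candidates with staler data.
        if (!_selfConfig().isArbiter() && candidateLastApplied < _myLastApplied) {
            response.reason = "candidate's data is staler than mine";
            return response;
        }
        response.voteGranted = true;
        return response;
    }

private:
    const MemberConfig& _selfConfig() const {
        // Models the invariant that crashed the node: it must never see -1.
        assert(_selfIndex >= 0 && _selfIndex < static_cast<int>(_config.members.size()));
        return _config.members[static_cast<std::size_t>(_selfIndex)];
    }

    ReplSetConfig _config;
    int _selfIndex = -1;
    long long _myLastApplied = 0;
};
```

With the `_selfIndex < 0` guard removed, a node whose `_selfIndex` is -1 would reach the `assert` in `_selfConfig()`, mirroring the invariant failure described above.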

The root cause is a network problem that prevents the node from finding itself in the config. We've observed mysterious DNS issues in EC2 that sometimes prevent mongod from resolving its own address in repl::isSelf(); the build failure I'm debugging may be an example of that. Regardless, we must prevent any scenario that uses -1 as a member index.
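One way to rule out -1 as an index entirely, sketched here under the assumption that self-lookup can be isolated into a single function, is to represent "not found in the config" with `std::optional` rather than an integer sentinel. `findSelfIndex` and `selfHost` are hypothetical names, and the `isSelf` callback only stands in for `repl::isSelf()`; a DNS failure corresponds to that callback returning false for every host.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: an empty optional means "this node is not in the config",
// so there is no integer sentinel that could accidentally be used as an index.
std::optional<std::size_t> findSelfIndex(
    const std::vector<std::string>& memberHosts,
    const std::function<bool(const std::string&)>& isSelf) {
    for (std::size_t i = 0; i < memberHosts.size(); ++i) {
        if (isSelf(memberHosts[i])) {
            return i;
        }
    }
    return std::nullopt;  // e.g. DNS failed, so no host resolved to this node
}

// Returns this node's host only when self is known; callers must handle the
// empty case explicitly instead of indexing with a sentinel.
std::optional<std::string> selfHost(const std::vector<std::string>& memberHosts,
                                    std::optional<std::size_t> selfIndex) {
    if (!selfIndex || *selfIndex >= memberHosts.size()) {
        return std::nullopt;
    }
    return memberHosts[*selfIndex];
}
```

Because the type forces every caller to check for absence, code paths like the vote handler above cannot silently proceed with an invalid self index.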

Assignee:: A. Jesse Jiryu Davis
Reporter:: A. Jesse Jiryu Davis
Participants:: A. Jesse Jiryu Davis, Githook User
Votes:: 0
Watchers:: 4

Created:: Apr 17 2020 12:21:13 AM UTC
Updated:: Oct 29 2023 10:09:22 PM UTC
Resolved:: Apr 22 2020 02:40:35 PM UTC
Confidence Status Last Update:: 17/Apr/20 2:54 AM