Type: New Feature
Resolution: Fixed
Priority: Minor - P4
Affects Version/s: 3.4.13
Component/s: Networking, Replication
Minor Change
v4.4, v4.2, v4.0
Repl 2020-08-24, Repl 2020-09-07, Repl 2020-09-21, Repl 2020-10-05
We have some dev/staging environments hosted locally in our office building. They are entirely for internal use, so their uptime isn't critical. We have recently been experiencing power outages that take all three members of the replica set down; when power is restored, they all come back up at the same time. This has happened about five times now, and each time the replica set comes back up, both the primary and the secondary end up in the REMOVED state and never recover unless we manually restart one of the mongod processes.
mongo-dev1 rs.status()
{ "state" : 10, "stateStr" : "REMOVED", "uptime" : 199841, "optime" : { "ts" : Timestamp(1529137449, 1), "t" : NumberLong(590) }, "optimeDate" : ISODate("2018-06-16T08:24:09Z"), "ok" : 0, "errmsg" : "Our replica set config is invalid or we are not a member of it", "code" : 93, "codeName" : "InvalidReplicaSetConfig" }
mongo-dev2 rs.status()
{ "state" : 10, "stateStr" : "REMOVED", "uptime" : 199879, "optime" : { "ts" : Timestamp(1529137449, 1), "t" : NumberLong(590) }, "optimeDate" : ISODate("2018-06-16T08:24:09Z"), "ok" : 0, "errmsg" : "Our replica set config is invalid or we are not a member of it", "code" : 93, "codeName" : "InvalidReplicaSetConfig" }
mongo-dev1 show log rs
2018-06-16T09:10:25.236+0000 I REPL [replExecDBWorker-0] New replica set config in use: { _id: "dev_cluster1", version: 140719, protocolVersion: 1, members: [ { _id: 0, host: "mongo-dev1.220office.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "utility-dev1.220office.local:27017", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 4, host: "mongo-dev2.2
2018-06-16T09:10:25.236+0000 I REPL [replExecDBWorker-0] transition to REMOVED
2018-06-17T21:28:09.139+0000 I REPL [ReplicationExecutor] Member utility-dev1.220office.local:27017 is now in state ARBITER
mongo-dev1 rs.conf()
{ "_id" : "dev_cluster1", "version" : 140719, "protocolVersion" : NumberLong(1), "members" : [ { "_id" : 0, "host" : "mongo-dev1.220office.local:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 2, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 }, { "_id" : 3, "host" : "utility-dev1.220office.local:27017", "arbiterOnly" : true, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 }, { "_id" : 4, "host" : "mongo-dev2.220office.local:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 } ], "settings" : { "chainingAllowed" : true, "heartbeatIntervalMillis" : 2000, "heartbeatTimeoutSecs" : 10, "electionTimeoutMillis" : 10000, "catchUpTimeoutMillis" : 60000, "getLastErrorModes" : { }, "getLastErrorDefaults" : { "w" : 1, "wtimeout" : 0 } } }
As I read the documentation, I can't find much information about the REMOVED state. In our setup, since mongo-dev1 has a priority of 2 and mongo-dev2 has a priority of 1, I would expect mongo-dev1 to be elected primary after the reboot.
Is this a bug, or are we doing something wrong? If it's a bug, what is the proper procedure for recovering when all members of the replica set come online at the same time?
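As a manual workaround until a restart-free fix lands, one common approach is to force a reconfig from a stuck data-bearing member. This is a sketch, not an official procedure: the hostnames assume the rs.conf() output above, and the config is read from the node's own local database. Verify against the documentation for your version before using it on anything important.

```javascript
// Connect directly to the REMOVED member first, e.g.:
//   mongo --host mongo-dev1.220office.local:27017

// 1. Read the node's cached replica set config from the local database.
cfg = db.getSiblingDB("local").system.replset.findOne();

// 2. Bump the version and force the node to re-adopt the config.
//    { force: true } is required because a REMOVED node is not PRIMARY.
cfg.version += 1;
rs.reconfig(cfg, { force: true });

// 3. Confirm the node leaves REMOVED (state 10); after an election it
//    should report 1 (PRIMARY) or 2 (SECONDARY).
rs.status().myState;
```

A forced reconfig can cause rollbacks on a healthy set, so it should only be run when the members genuinely cannot rejoin on their own, as in the power-outage scenario described above.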
is related to:
- SERVER-62699 Replica set fails to restart after shutdown of all Nodes in a Dynamic DNS/network environment (Backlog)
- SERVER-48178 Finding self in reconfig may be interrupted by closing connections due to rollback (Closed)
- SERVER-51163 Mark nodes returning InvalidReplicaSetConfig in heartbeats as down (Closed)
- SERVER-40159 Add retry logic for name resolution failure in isSelf (Closed)
related to:
- SERVER-41031 After an unreachable node is added and removed from the replica set, the other replica set members continue to send heartbeat to this removed node (Open)
- SERVER-54121 Single replica set node removed due to isSelf cannot re-attempt to find itself (Open)
- SERVER-48480 Abort initial sync upon transition to REMOVED state (Closed)