Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
Assignee: Server Triage
Environment: ALL
In a 3-node replica set (no sharding), at present:
p1 - secondary
p2 - primary
dr - secondary
When I connect using mongosh on each of those nodes:
On p1: mongosh <--all-envs> --port 7253
it takes close to 8-10 seconds to get the mongosh prompt
rs.status()
members: [ { _id: 0, name: 'p1:7253', health: 1, state: 2, stateStr: 'SECONDARY', uptime: 367399, optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") }, optimeDate: ISODate("2023-08-28T07:15:07.000Z"), lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"), syncSourceHost: '', syncSourceId: -1, infoMessage: 'Could not find member to sync from', configVersion: 19, configTerm: 47, self: true, lastHeartbeatMessage: '' }, { _id: 1, name: 'p2:7253', health: 0, state: 8, stateStr: '(not reachable/healthy)', uptime: 0, optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") }, optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") }, optimeDate: ISODate("1970-01-01T00:00:00.000Z"), optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"), lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastHeartbeat: ISODate("2023-08-28T07:15:29.974Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:19:31.951Z"), pingMs: Long("0"), lastHeartbeatMessage: "Couldn't get a connection within the time limit", syncSourceHost: '', syncSourceId: -1, infoMessage: '', configVersion: 19, configTerm: 47 }, { _id: 2, name: 'dr:7253', health: 0, state: 8, stateStr: '(not reachable/healthy)', uptime: 0, optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") }, optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") }, optimeDate: ISODate("1970-01-01T00:00:00.000Z"), optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"), lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastHeartbeat: ISODate("2023-08-31T03:50:33.417Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:19:34.226Z"), pingMs: Long("80"), lastHeartbeatMessage: "Couldn't get a connection within the time limit", syncSourceHost: '', syncSourceId: -1, infoMessage: '', configVersion: 19, configTerm: 47 } ]
mongo7253 [direct: secondary] test> rs.printSecondaryReplicationInfo()
source: p1:7253
—
source: p2:7253
—
source: dr:7253
mongo7253 [direct: secondary] test>
mongo7253 [direct: secondary] test> rs.printReplicationInfo()
actual oplog size
'2048 MB'
—
configured oplog size
'2048 MB'
—
log length start to end
'45966145 secs (12768.37 hrs)'
—
oplog first event time
'Mon Mar 14 2022 06:52:42 GMT+0000 (Coordinated Universal Time)'
—
oplog last event time
'Mon Aug 28 2023 07:15:07 GMT+0000 (Coordinated Universal Time)'
—
now
'Thu Aug 31 2023 08:22:47 GMT+0000 (Coordinated Universal Time)'
mongo7253 [direct: secondary] test>
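The "73 hours" gap questioned further below can be reproduced directly from the numbers in p1's rs.printReplicationInfo() output above: p1's oplog last event time is 2023-08-28T07:15:07Z while "now" is 2023-08-31T08:22:47Z, i.e. p1 has applied nothing for roughly three days. A minimal sketch in plain JavaScript (runnable with node; the timestamps are copied verbatim from the output above):

```javascript
// Timestamps copied from rs.printReplicationInfo() as run on p1 (see output above).
const lastOplogEvent = new Date("2023-08-28T07:15:07Z"); // oplog last event time on p1
const now = new Date("2023-08-31T08:22:47Z");            // "now" as printed on p1

// How far behind the wall clock p1's last applied operation is.
const staleHours = (now - lastOplogEvent) / 3600e3;
console.log(staleHours.toFixed(2) + " hours"); // prints: 73.13 hours
```

The same arithmetic explains why lag figures computed against p1 look nonsensical: p1 simply stopped applying oplog entries on Aug 28, the same day p2's electionDate shows a new election.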
On p2: mongosh <--all-envs> --port 7253
immediately gives the mongosh prompt
mongo7253 [direct: primary] test> show dbs
admin 336.00 KiB
config 1.29 MiB
XYZ 9.05 MiB
local 132.86 MiB
mongo7253 [direct: primary] test>
rs.status()
members: [ { _id: 0, name: 'p1:7253', health: 1, state: 2, stateStr: 'SECONDARY', uptime: 263345, optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") }, optimeDurable: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") }, optimeDate: ISODate("2023-08-28T07:15:07.000Z"), optimeDurableDate: ISODate("2023-08-28T07:15:07.000Z"), lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastHeartbeat: ISODate("2023-08-31T08:24:36.250Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:24:36.228Z"), pingMs: Long("1042"), lastHeartbeatMessage: '', syncSourceHost: '', syncSourceId: -1, infoMessage: '', configVersion: 19, configTerm: 47 }, { _id: 1, name: 'p2:7253', health: 1, state: 1, stateStr: 'PRIMARY', uptime: 263355, optime: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") }, optimeDate: ISODate("2023-08-31T08:24:30.000Z"), lastAppliedWallTime: ISODate("2023-08-31T08:24:30.694Z"), lastDurableWallTime: ISODate("2023-08-31T08:24:30.694Z"), syncSourceHost: '', syncSourceId: -1, infoMessage: '', electionTime: Timestamp({ t: 1693206944, i: 1 }), electionDate: ISODate("2023-08-28T07:15:44.000Z"), configVersion: 19, configTerm: 50, self: true, lastHeartbeatMessage: '' }, { _id: 2, name: 'dr:7253', health: 1, state: 2, stateStr: 'SECONDARY', uptime: 18194, optime: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") }, optimeDurable: { ts: Timestamp({ t: 1693470270, i: 1 }), t: Long("50") }, optimeDate: ISODate("2023-08-31T08:24:30.000Z"), optimeDurableDate: ISODate("2023-08-31T08:24:30.000Z"), lastAppliedWallTime: ISODate("2023-08-31T08:24:30.694Z"), lastDurableWallTime: ISODate("2023-08-31T08:24:30.694Z"), lastHeartbeat: ISODate("2023-08-31T08:24:34.685Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:24:35.853Z"), pingMs: Long("82"), lastHeartbeatMessage: '', syncSourceHost: 'p2:7253', syncSourceId: 1, infoMessage: '', configVersion: 19, configTerm: 50 } ]
mongo7253 [direct: primary] test> rs.printSecondaryReplicationInfo()
source: p1:7253
—
source: dr:7253
mongo7253 [direct: primary] test>
mongo7253 [direct: primary] test> rs.printReplicationInfo()
actual oplog size
'2048 MB'
—
configured oplog size
'2048 MB'
—
log length start to end
'46229587 secs (12841.55 hrs)'
—
oplog first event time
'Mon Mar 14 2022 06:52:53 GMT+0000 (Coordinated Universal Time)'
—
oplog last event time
'Thu Aug 31 2023 08:26:00 GMT+0000 (Coordinated Universal Time)'
—
now
'Thu Aug 31 2023 08:26:03 GMT+0000 (Coordinated Universal Time)'
mongo7253 [direct: primary] test>
On dr: mongosh <--all-envs> --port 7253
rs.status()
members: [ { _id: 0, name: 'p1:7253', health: 1, state: 2, stateStr: 'SECONDARY', uptime: 18319, optime: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") }, optimeDurable: { ts: Timestamp({ t: 1693206907, i: 1 }), t: Long("47") }, optimeDate: ISODate("2023-08-28T07:15:07.000Z"), optimeDurableDate: ISODate("2023-08-28T07:15:07.000Z"), lastAppliedWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastDurableWallTime: ISODate("2023-08-28T07:15:07.307Z"), lastHeartbeat: ISODate("2023-08-31T08:26:33.995Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:26:37.495Z"), pingMs: Long("959"), lastHeartbeatMessage: '', syncSourceHost: '', syncSourceId: -1, infoMessage: '', configVersion: 19, configTerm: 47 }, { _id: 1, name: 'p2:7253', health: 1, state: 1, stateStr: 'PRIMARY', uptime: 18320, optime: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") }, optimeDurable: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") }, optimeDate: ISODate("2023-08-31T08:26:28.000Z"), optimeDurableDate: ISODate("2023-08-31T08:26:28.000Z"), lastAppliedWallTime: ISODate("2023-08-31T08:26:28.836Z"), lastDurableWallTime: ISODate("2023-08-31T08:26:28.836Z"), lastHeartbeat: ISODate("2023-08-31T08:26:36.658Z"), lastHeartbeatRecv: ISODate("2023-08-31T08:26:37.492Z"), pingMs: Long("82"), lastHeartbeatMessage: '', syncSourceHost: '', syncSourceId: -1, infoMessage: '', electionTime: Timestamp({ t: 1693206944, i: 1 }), electionDate: ISODate("2023-08-28T07:15:44.000Z"), configVersion: 19, configTerm: 50 }, { _id: 2, name: 'dr:7253', health: 1, state: 2, stateStr: 'SECONDARY', uptime: 18334, optime: { ts: Timestamp({ t: 1693470388, i: 2 }), t: Long("50") }, optimeDate: ISODate("2023-08-31T08:26:28.000Z"), lastAppliedWallTime: ISODate("2023-08-31T08:26:28.836Z"), lastDurableWallTime: ISODate("2023-08-31T08:26:28.836Z"), syncSourceHost: 'p2:7253', syncSourceId: 1, infoMessage: '', configVersion: 19, configTerm: 50, self: true, lastHeartbeatMessage: '' } ]
mongo7253 [direct: secondary] test> rs.printSecondaryReplicationInfo()
source: p1:7253
—
source: dr:7253
mongo7253 [direct: secondary] test>
mongo7253 [direct: secondary] test> rs.printReplicationInfo()
actual oplog size
'2048 MB'
—
configured oplog size
'2048 MB'
—
log length start to end
'46229695 secs (12841.58 hrs)'
—
oplog first event time
'Mon Mar 14 2022 06:53:15 GMT+0000 (Coordinated Universal Time)'
—
oplog last event time
'Thu Aug 31 2023 08:28:10 GMT+0000 (Coordinated Universal Time)'
—
now
'Thu Aug 31 2023 08:28:12 GMT+0000 (Coordinated Universal Time)'
mongo7253 [direct: secondary] test>
=================
When I tail the log for REPL messages:
On p1: I keep getting these messages:
tail -f /instances/mongo_db7253/logs/mongodb.log | grep REPL
,"s":"I", "c":"REPL", "id":21401, "ctx":"conn26379","msg":"Scheduling heartbeat to fetch a newer config","attr":{"configTerm":50,"configVersion":19,"senderHost":"dr:7253"}}
{"t":{"$date":"2023-08-31T08:29:00.682+00:00"},"s":"I", "c":"REPL", "id":21401, "ctx":"conn2169","msg":"Scheduling heartbeat to fetch a newer config","attr":{"configTerm":50,"configVersion":19,"senderHost":"p2:7253"}}
On p2 & dr: I get nothing from
tail -f /instances/mongo_db7253/logs/mongodb.log | grep REPL
However, I do get the following:
tail -f /instances/mongo_db7253/logs/mongodb.log | grep FlowControlRefresher
,"s":"W", "c":"STORAGE", "id":22225, "ctx":"FlowControlRefresher","msg":"Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries."}
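Since mongod 4.4 the log is structured JSON, so the grep-based tailing above can be made more precise by parsing each line and filtering on the "c" (component) and "id" fields instead of matching raw text. A minimal sketch in plain JavaScript (runnable with node; the sample line is copied from the p1 log above, and a tool like jq would work equally well):

```javascript
// Each mongod (4.4+) log line is a JSON document. Filter on component ("c")
// and message id instead of grepping. Sample line copied from the p1 log above.
const line = '{"t":{"$date":"2023-08-31T08:29:00.682+00:00"},"s":"I","c":"REPL","id":21401,"ctx":"conn2169","msg":"Scheduling heartbeat to fetch a newer config","attr":{"configTerm":50,"configVersion":19,"senderHost":"p2:7253"}}';

const entry = JSON.parse(line);
// 21401 = "Scheduling heartbeat to fetch a newer config",
// 22225 = the flow-control warning seen on p2 and dr above.
if (entry.c === "REPL" || entry.id === 22225) {
  console.log(entry.t.$date, entry.s, entry.c, entry.msg, JSON.stringify(entry.attr));
}
```

Filtering by id makes it easy to count how often each node emits the flow-control warning versus the stale-config heartbeat message.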
===================
Questions:
1. Why is my p1 showing a negative lag number (73 hours)?
2. Which node is the primary, p1 or p2?
   - p1 looks like the primary because it is ahead of the other 2 nodes as per rs.printSecondaryReplicationInfo() on p2 and dr.
   - p2 looks like the primary because, when connected locally using mongosh, it shows as primary.
3. Why does p1 say "no primary available at the moment" when rs.printSecondaryReplicationInfo() is run from p1?
4. Why do p2 and dr show state '(not reachable/healthy)' when rs.status() is run on p1, but show p1 and dr as secondary and p2 as primary when rs.status() is run on p2 and dr?
5. Why do p2 and dr show lastHeartbeatMessage: "Couldn't get a connection within the time limit" when rs.status() is run on p1, yet show as healthy on p2 and dr?
6. Is p1 a primary that is trying to replicate to p2 and dr but failing, and p2 a false primary?
7. Is there a bug in flow control that makes it display the message
   "Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries."?
8. How do I fix this replication issue?
My MongoDB server version is 5.0.5 on all 3 nodes.
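One clue for questions 4 and 5 sits in p1's own rs.status() output: for each remote member, lastHeartbeat (the last heartbeat p1 successfully sent) is hours to days older than lastHeartbeatRecv (the last heartbeat p1 received). That asymmetry suggests p1 can accept inbound connections but cannot establish outbound ones, which is consistent with "Couldn't get a connection within the time limit". A minimal sketch in plain JavaScript (runnable with node; timestamps copied verbatim from rs.status() as run on p1):

```javascript
// Heartbeat timestamps copied from rs.status() run on p1 (see output above).
// lastHeartbeat     = last heartbeat p1 successfully SENT to that member.
// lastHeartbeatRecv = last heartbeat p1 RECEIVED from that member.
const members = [
  { name: "p2:7253", lastHeartbeat: "2023-08-28T07:15:29.974Z", lastHeartbeatRecv: "2023-08-31T08:19:31.951Z" },
  { name: "dr:7253", lastHeartbeat: "2023-08-31T03:50:33.417Z", lastHeartbeatRecv: "2023-08-31T08:19:34.226Z" },
];

for (const m of members) {
  const gapHours = (new Date(m.lastHeartbeatRecv) - new Date(m.lastHeartbeat)) / 3600e3;
  // Inbound heartbeats are fresh while outbound ones are stale: a one-way
  // connectivity problem from p1, not a down peer.
  console.log(`${m.name}: outbound heartbeat is ${gapHours.toFixed(1)} h staler than inbound`);
}
// prints: p2:7253: outbound heartbeat is 73.1 h staler than inbound
//         dr:7253: outbound heartbeat is 4.5 h staler than inbound
```

If the same check on p2 and dr shows fresh heartbeats in both directions (as their rs.status() output above suggests), the problem is specific to connections originating from p1.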