Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.2.16
Component/s: None
Environment:
MongoDB Version:
db version v3.2.16
git version: 056bf45128114e44c5358c7a8776fb582363e094
allocator: tcmalloc
modules: none
build environment:
distarch: x86_64
target_arch: x86_64
Operating System: Ubuntu 14.04.6 LTS
Linux Kernel Info: Linux N-Mongo-S20-1 4.4.0-93-generic #116~14.04.1-Ubuntu SMP Mon Aug 14 16:07:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
I have a MongoDB 3.2.16 sharded cluster with 20 shards. Each shard is a replica set with 1 primary, 1 secondary, and 1 arbiter. A network outage in one of the cloud provider's availability zones cut off the primaries of 5 shards, but none of these 5 shards held a new election during the 30-minute outage.
I checked the logs of all 3 nodes in each affected shard:
- The primary's logs mainly show connection failures to the other mongod nodes, with no other obvious issues.
- The secondary's logs contain no election-related messages, and even during the outage they occasionally show successful connections to the primary.
- The arbiter continuously logs heartbeat request timeouts to the primary, but it also has no election-related messages. For example:
2024-07-02T10:04:59.095+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host N-Mongo-S20-1:27017 due to bad connection status; 0 connections to that host remain open
2024-07-02T10:04:59.095+0800 I REPL [ReplicationExecutor] Error in heartbeat request to N-Mongo-S20-1:27017; ExceededTimeLimit: Operation timed out
2024-07-02T10:05:04.095+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to N-Mongo-S20-1:27017
2024-07-02T10:05:14.096+0800 I REPL [ReplicationExecutor] Error in heartbeat request to N-Mongo-S20-1:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2024-07-02T10:05:24.096+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to N-Mongo-S20-1:27017 - ExceededTimeLimit: Operation timed out
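To compare what each member logged during the outage, it may help to condense each node's log into a short summary. The following is a minimal Python sketch that assumes the plain-text 3.2 log layout shown in the excerpt (timestamp, severity, component, [context], message). The function name `summarize` and the keyword heuristic for election-related lines are illustrative assumptions, not a MongoDB tool. The script counts heartbeat failures per target host and error code, records when they started and stopped, and collects any line mentioning "election".

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed layout (as in the excerpt above):
#   <timestamp> <severity> <component> [<context>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<component>\S+)\s+"
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<msg>.*)$"
)
# "Error in heartbeat request to <host>; <ErrorCode>: <reason>"
HEARTBEAT_ERR_RE = re.compile(
    r"Error in heartbeat request to (?P<host>[^;\s]+);\s*(?P<code>\w+):"
)


def summarize(lines: Iterable[str]) -> dict:
    """Tally heartbeat errors per (host, error code) and collect election-keyword lines."""
    heartbeat_errors: Counter = Counter()
    election_lines = []
    first_ts: Optional[str] = None
    last_ts: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or unparseable lines
        msg = m.group("msg")
        hb = HEARTBEAT_ERR_RE.search(msg)
        if hb:
            heartbeat_errors[(hb.group("host"), hb.group("code"))] += 1
            if first_ts is None:
                first_ts = m.group("ts")
            last_ts = m.group("ts")  # assumes lines are in chronological order
        # Coarse keyword heuristic (hypothetical), not an exhaustive list of election messages.
        if "election" in msg.lower():
            election_lines.append(line)
    return {
        "heartbeat_errors": dict(heartbeat_errors),
        "first_heartbeat_error": first_ts,
        "last_heartbeat_error": last_ts,
        "election_lines": election_lines,
    }


# The arbiter excerpt from this report, one entry per line.
SAMPLE = """\
2024-07-02T10:04:59.095+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host N-Mongo-S20-1:27017 due to bad connection status; 0 connections to that host remain open
2024-07-02T10:04:59.095+0800 I REPL [ReplicationExecutor] Error in heartbeat request to N-Mongo-S20-1:27017; ExceededTimeLimit: Operation timed out
2024-07-02T10:05:04.095+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to N-Mongo-S20-1:27017
2024-07-02T10:05:14.096+0800 I REPL [ReplicationExecutor] Error in heartbeat request to N-Mongo-S20-1:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
2024-07-02T10:05:24.096+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to N-Mongo-S20-1:27017 - ExceededTimeLimit: Operation timed out
"""

if __name__ == "__main__":
    report = summarize(SAMPLE.splitlines())
    print(report["heartbeat_errors"])
    print(report["first_heartbeat_error"], "->", report["last_heartbeat_error"])
    print(len(report["election_lines"]), "election-related lines")
```

For the excerpt, the script reports two `ExceededTimeLimit` heartbeat errors against N-Mongo-S20-1:27017 and no election-related lines. Passing an open log file instead of `SAMPLE` makes it possible to compare the secondary's and arbiter's timelines for the same outage window.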
Is this a bug in this MongoDB version, or is something else preventing an election from being triggered? What additional information should I provide to help investigate this issue?
Any advice or guidance would be greatly appreciated. Thank you!