- Type: Bug
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: 3.2.12, 3.4.2, 3.5.2
- Component/s: Replication
- Environment: Ubuntu, mongo 3.2.10
- Replication
- ALL
A secondary fails to complete its initial sync from another secondary when joining a replica set.
The sync fails because of a socket receive timeout while talking to the other secondary.
I have attached the final lines of the log from the secondary that is trying to join the replica set.
NB: we never see any "network problem detected" lines in our logs, so it seems that no retries are ever attempted:
https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/db/repl/sync_tail.cpp#L968-L969
I think the SocketException caused by the timeout is caught earlier, here:
https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/util/net/message_port.cpp#L204-L210
which then triggers the assertion exception here:
https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/client/dbclient.cpp#L811-L814
I do not believe the fix in https://jira.mongodb.org/browse/SERVER-9528 was correct, because the exception is swallowed before the retry logic can see it.
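
To illustrate the suspected failure mode, here is a minimal Python model of the three layers described above. It is only a sketch of the pattern, not the server's C++ code: every name in it (`SocketTimeout`, `AssertionFailure`, `recv`, `call`, `fetch_with_retry`, the `swallow` flag) is hypothetical, and it assumes the report's reading of the linked lines is right. When the lowest layer swallows the timeout, the retry loop only ever sees a generic assertion and gives up after one attempt.

```python
class SocketTimeout(Exception):
    """Hypothetical stand-in for a low-level socket receive timeout."""


class AssertionFailure(Exception):
    """Hypothetical stand-in for a generic, non-network assertion error."""


def recv(swallow):
    """Lowest layer: in this model the receive always times out."""
    try:
        raise SocketTimeout("recv timed out")
    except SocketTimeout:
        if swallow:
            # Swallow the exception; failure is reported by return value only.
            return None
        raise


def call(swallow):
    """Middle layer: turns a failed receive into a generic assertion."""
    reply = recv(swallow)
    if reply is None:
        raise AssertionFailure("transport error")
    return reply


def fetch_with_retry(swallow, max_attempts=3):
    """Top layer: retries only on errors it recognizes as network errors."""
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        try:
            call(swallow)
            return ("ok", attempts)
        except SocketTimeout:
            continue  # recognized network problem: retry
        except AssertionFailure:
            return ("aborted", attempts)  # not recognized: no retry
    return ("retries exhausted", attempts)


print(fetch_with_retry(swallow=True))
print(fetch_with_retry(swallow=False))
```

In this model the swallowing path aborts on the first attempt, while the propagating path uses all of its retries. That matches the symptom in the logs: no "network problem detected" lines and no retries.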
- depends on: SERVER-42022 Attempt to remove initial sync missing document fetching (Closed)
- is related to: SERVER-27950 Add SocketException to the list of NetworkErrors (Closed)