Type: Improvement
Resolution: Gone away
Priority: Unknown
Fix Version/s: None
Affects Version/s: 4.4.2
Component/s: Cluster Management
Labels:
- external-user

Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Summary

The driver is unable to reconnect to the primary after the replica set is restarted from scratch. The driver keeps maxSetVersion and maxElectionId in memory, but when the replica set is shut down and restarted with new data directories, elections also start over, so the new set version and election id are no longer comparable to the values the driver has stored.

Driver version: 4.4.2

Mongo version: 5.0.3

Topology: replica set, 3 members, 1 primary, 2 secondaries

How to Reproduce

1. Set up a Java client connected to a mongo replica set.

2. Trigger a few elections to raise maxElectionId.

3. Shut down the replica set and wipe out all the data.

4. Start up the replica set again.

5. The Java app cannot reconnect to the primary, so writes fail (and so do reads if readPreference is primary).

Additional Background

In my setup, I have a Java application connected to a mongo replica set with 3 members (primary, secondary, secondary). I want to test a disaster-recovery scenario, so I shut down the replica set and wipe out all the data. Then I start up the replica set from scratch and restore the data from backups. After that, the still-running Java app is unable to reconnect to the primary to perform write operations.

The exceptions that are thrown look like this:

com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=mongo1.host:27017, type=UNKNOWN, state=CONNECTING}, {address=mongo2.host:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.9 ms, state=CONNECTED}, {address=mongo3.host:27017, type=REPLICA_SET_SECONDARY, roundTripTime=1.5 ms, state=CONNECTED}]

But the underlying issue is this:

org.mongodb.driver.cluster: Invalidating potential primary mongo1.host:27017 whose (set version, election id) tuple of (5, 7fffffff0000000000000002) is less than one already seen of (13, 7fffffff0000000000000013)

So it seems that the driver refuses to accept the 'new' primary because it has already seen a primary with a higher (set version, election id) tuple, even though in the meantime the whole replica set was restarted and elections were done from scratch.
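The comparison implied by that log line can be illustrated with a small, self-contained sketch. This is not the driver's actual code: the class `StalePrimaryCheckSketch`, the `ElectionTuple` holder, and the `isStale` helper are hypothetical names, and the ordering (set version first, then election id) is an assumption based only on how the log message prints the tuple. The values are the ones from the log above.

```java
import java.math.BigInteger;

// Hypothetical illustration only; names and ordering are assumptions, not driver internals.
final class StalePrimaryCheckSketch {

    // Holds the (set version, election id) tuple as printed in the log line.
    static final class ElectionTuple implements Comparable<ElectionTuple> {
        final int setVersion;
        final BigInteger electionId;

        ElectionTuple(int setVersion, String electionIdHex) {
            this.setVersion = setVersion;
            // The election id is logged as a 24-character hex string.
            this.electionId = new BigInteger(electionIdHex, 16);
        }

        @Override
        public int compareTo(ElectionTuple other) {
            // Assumed ordering: set version first, election id as tie-breaker.
            int bySetVersion = Integer.compare(setVersion, other.setVersion);
            return bySetVersion != 0 ? bySetVersion : electionId.compareTo(other.electionId);
        }
    }

    // A candidate primary is treated as stale when its tuple is lower than
    // the highest tuple kept in memory; with nothing in memory it is accepted.
    static boolean isStale(ElectionTuple maxSeen, ElectionTuple candidate) {
        return maxSeen != null && candidate.compareTo(maxSeen) < 0;
    }

    public static void main(String[] args) {
        ElectionTuple seenBeforeWipe = new ElectionTuple(13, "7fffffff0000000000000013");
        ElectionTuple afterRestart = new ElectionTuple(5, "7fffffff0000000000000002");

        // Long-running client that still remembers the old replica set.
        System.out.println("stale for running client: " + isStale(seenBeforeWipe, afterRestart)); // true
        // Freshly created client with nothing stored in memory.
        System.out.println("stale for new client: " + isStale(null, afterRestart)); // false
    }
}
```

Under these assumptions, a client that has no stored maximum accepts the restarted primary, while the still-running client keeps rejecting it until the rebuilt replica set's tuple exceeds the remembered one.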

related to

JAVA-4375 SDAM should give priority to electionId over setVersion when updating topology

Closed

Assignee: Jeffrey Yemin

Reporter: Tymoteusz Machowski

Reviewers: None

Votes: 0

Watchers: 6

Created: Mar 16 2022 10:59:05 AM UTC

Updated: Oct 27 2023 07:48:32 PM UTC

Resolved: Apr 13 2022 12:00:34 PM UTC
