Type: Bug
Resolution: Duplicate
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels: None
Operating System: ALL
Confidence Status: None
Work Order: 0
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

We had the following issue on our production environment today:

Due to a mistake, a mongod process had to be restarted. This triggered a failover, and the secondary member of the replica set became primary.
However, once the restarted mongod came back up, another election was held and it was re-elected primary.

From that point on, it was no longer possible to query a non-sharded database residing on the replica set that had been restarted.
Connecting to mongos and querying the database returned the following error in the mongo shell:
[code]
mongos> db.collection.find()
error: { "$err" : "socket exception", "code" : 9001 }
[code]

After we manually retried the query 20 to 40 times in the mongo shell, the situation eventually cleared up and queries worked normally again, both from the shell and from our application. Unfortunately, this had to be repeated for every mongos instance in the cluster, of which there are six.

It looks to me as if mongos does not check its connections to the cluster's other members before using them.
Is it possible to add that functionality?
It wouldn't need to check before every use of a connection (though that behaviour might be desirable in some cases, the same way it works when connecting to SQL databases from Java using JDBC connection pools), but an administrator shouldn't have to sort this out manually.

Or is it already there and we just haven't found the switch for it yet?
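The behaviour described above can be illustrated with a toy model. This sketch is hypothetical and does not reflect mongos's actual implementation: `FakeConnection`, `ConnectionPool`, `simulate`, the pool size of 30 and the three pool modes are all invented for illustration. It assumes that every pooled socket dies when the mongod restarts, and that a connection which fails a query is discarded while the remaining stale connections stay in the pool. Under that assumption each retry consumes one stale connection, which would be consistent with needing dozens of retries per mongos. The `"validate"` mode models the requested check-before-use (similar to test-on-borrow in JDBC pools), and `"clearOnError"` models the approach named in the title of SERVER-4706.

```javascript
// Toy model of a connection pool after its server restarts.
// All names here are hypothetical; this is not mongos's code.

class FakeConnection {
  constructor(id) {
    this.id = id;
    this.alive = true; // set to false to simulate a server restart
  }

  // Hypothetical cheap liveness probe (a real pool might send a ping).
  ping() {
    return this.alive;
  }

  query() {
    if (!this.alive) throw new Error("socket exception");
    return "ok";
  }
}

class ConnectionPool {
  // mode: "none" | "validate" (check on borrow) | "clearOnError"
  constructor(mode) {
    this.mode = mode;
    this.idle = [];
    this.nextId = 0;
  }

  borrow() {
    while (this.idle.length > 0) {
      const conn = this.idle.pop();
      if (this.mode !== "validate" || conn.ping()) return conn;
      // Validation failed: drop this connection and try the next one.
    }
    return new FakeConnection(this.nextId++);
  }

  release(conn) {
    this.idle.push(conn);
  }

  // Runs one query. A connection that throws is never returned to the pool.
  run() {
    const conn = this.borrow();
    try {
      const result = conn.query();
      this.release(conn);
      return result;
    } catch (err) {
      if (this.mode === "clearOnError") this.idle = []; // drop every pooled connection
      return err.message;
    }
  }
}

// Fills the pool with `poolSize` connections, kills them all, then counts
// how many queries fail before one succeeds.
function simulate(mode, poolSize) {
  const pool = new ConnectionPool(mode);
  const conns = [];
  for (let i = 0; i < poolSize; i++) conns.push(pool.borrow());
  conns.forEach((c) => pool.release(c));
  conns.forEach((c) => { c.alive = false; }); // the mongod restart

  let failures = 0;
  while (pool.run() !== "ok") failures++;
  return failures;
}

console.log(simulate("none", 30)); // 30
console.log(simulate("validate", 30)); // 0
console.log(simulate("clearOnError", 30)); // 1
```

In this model, the unvalidated pool fails once per stale connection before it finally opens a fresh one. Validating on borrow hides the failures entirely at the cost of a probe per borrow, while clearing the pool on the first error limits the damage to a single failed query. A middle ground closer to what is asked for here would be to validate only connections that have been idle longer than some threshold.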

duplicates

SERVER-4706 when a socket between mongos and mongod fails, close all connections immediately

Closed

related to

SERVER-9041 proactively detect broken connections detected by the network

Closed

Assignee: Unassigned
Reporter: Christian Tonhäuser
Participants: Christian Tonhäuser, Eliot Horowitz
Votes: 0
Watchers: 2

Created: Feb 17 2012 12:23:16 PM UTC
Updated: Apr 06 2023 06:55:20 PM UTC
Resolved: Feb 19 2012 04:25:27 AM UTC