- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.4.6
- Component/s: Performance, Sharding
- Environment: sharded cluster, 3 config servers, auth
ISSUE SUMMARY
In MongoDB sharded clusters with authentication enabled, an authentication request on a new connection can query the first config server if the authentication data is not already cached. If that config server is unresponsive, the request waits for a 30-second timeout before the next config server is contacted. These timeouts can delay new connections, which manifests as slow queries or other slow operations. An internal mongos server parameter, internalSCCAllowFastestAuthConfigReads, was added to enable reading authentication data from the first config server to respond.
USER IMPACT
In authenticated environments, when the first config server becomes unresponsive and authentication data is not cached, queries and other operations can be delayed by up to 30 seconds. This is different from the config server shutting down, in which case connections fail immediately.
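To make the delay concrete, here is a minimal sketch of the in-order failover described above. It is a simplified model, not mongos code: the function name, the latency list, and the use of None to mark an unresponsive server are all assumptions made for illustration. Only the 30-second timeout comes from this ticket.

```python
from typing import Optional, Sequence

CONFIG_TIMEOUT_SECS = 30.0  # per-server timeout stated in this ticket


def sequential_auth_delay(
    response_secs: Sequence[Optional[float]],
    timeout_secs: float = CONFIG_TIMEOUT_SECS,
) -> Optional[float]:
    """Return seconds until the first reply when servers are tried in order.

    response_secs[i] is the hypothetical reply latency of config server i,
    or None if that server is unresponsive and never answers.
    Returns None if no server answers within its timeout.
    """
    elapsed = 0.0
    for latency in response_secs:
        if latency is not None and latency <= timeout_secs:
            return elapsed + latency
        # The caller waits out the full timeout before trying the next server.
        elapsed += timeout_secs
    return None


# First config server hangs, the second answers in 0.25 s.
delay = sequential_auth_delay([None, 0.25, 0.25])
```

In this model, a hung first server adds the full timeout to the request, while a healthy first server adds none.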
WORKAROUNDS
The preferred workaround is to block the first config server using a firewall (e.g. with iptables) so that connections to it fail immediately. In this case, the second config server is contacted without the 30-second delay. If this is not possible, the internal mongos parameter internalSCCAllowFastestAuthConfigReads can be used to work around the issue.
AFFECTED VERSIONS
All previous versions are affected by this issue.
FIX VERSION
The fix is included in the 2.6.2 production release.
RESOLUTION DETAILS
For authentication requests (and only for those), a parameter internalSCCAllowFastestAuthConfigReads was added to allow all three config servers to be queried concurrently. To ensure consistent reads of all other metadata, all other requests use the normal mechanism of contacting the first config server, with a 30-second timeout.
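The concurrent read described above can be sketched as follows. This is an illustration of the general first-to-respond technique, not the actual mongos implementation: `read_from_fastest`, `make_reader`, and the simulated latencies are hypothetical names and values, and real config server reads are replaced by sleeps.

```python
import concurrent.futures
import time
from typing import Callable, Sequence


def make_reader(name: str, delay_secs: float) -> Callable[[], str]:
    """Build a stand-in for a config server read that takes delay_secs."""
    def reader() -> str:
        time.sleep(delay_secs)  # simulated network and server latency
        return f"auth data from {name}"
    return reader


def read_from_fastest(
    readers: Sequence[Callable[[], str]], timeout_secs: float = 30.0
) -> str:
    """Start every read at once and return the first successful result."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(readers))
    try:
        futures = [pool.submit(reader) for reader in readers]
        # as_completed yields futures in completion order, fastest first.
        for future in concurrent.futures.as_completed(futures, timeout=timeout_secs):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("no config server returned authentication data")
    finally:
        # Do not wait for slower servers once an answer is in hand.
        pool.shutdown(wait=False)


# config1 is slow, config2 answers almost immediately.
readers = [
    make_reader("config1", 1.0),
    make_reader("config2", 0.01),
    make_reader("config3", 0.2),
]
result = read_from_fastest(readers)
```

The trade-off is the one the ticket states: the fastest reply is not guaranteed to come from the first config server, which is why this path is limited to authentication requests and all other metadata reads keep the in-order mechanism.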
Original description
Normal collection operations do not touch the config server.
But other things do.
Some examples:
- authentication
- splits/balancer
- listDatabases
- creating database
- creating collection
Possible Solutions:
- send reads to all (maybe with a tiny backoff), respond from first response (maybe with threshold) (preferred)
- blacklist (a bit ugly + racy)
is duplicated by
- SERVER-13323 listDBs block when first mongo config server is down (Closed)
- SERVER-9916 be smarter about config server retries in non-responsive situations (Closed)