Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.6.2, 2.7.1
Affects Version/s: 2.4.6
Component/s: Performance, Sharding
Labels: None
Environment: sharded cluster, 3 config servers, auth
Backport Completed: 2.6.2
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Issue Status as of May 14, 2014

ISSUE SUMMARY
For MongoDB sharded clusters with authentication enabled, authentication requests on new connections query the first config server if authentication data is not already cached. If this config server is unresponsive, there is a 30-second timeout, after which the next config server is contacted. These timeouts can delay new connections, manifesting as slow queries or other slow operations. An internal mongos server parameter, internalSCCAllowFastestAuthConfigReads, was added to enable reading authentication data from the first config server to respond.

USER IMPACT
In authenticated environments, when the first config server becomes unresponsive and authentication data is not cached, queries and other operations can be delayed by up to 30 seconds. Note that this differs from the config server shutting down, in which case connections fail immediately.

WORKAROUNDS
The preferred workaround is to block the first config server using a firewall (e.g. with iptables) so that connections to it fail immediately. In this case, the second config server is contacted without the 30-second delay. If this is not possible, the internal mongos parameter internalSCCAllowFastestAuthConfigReads can be used to work around the issue.
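
Both workarounds can be sketched as command lines. The Python helpers below only build the argument lists and run nothing. Several details are assumptions that the ticket does not state: that the rule is added to the OUTPUT chain on the mongos host, that the config server listens on port 27019, that a REJECT target with a TCP reset is used so connection attempts fail immediately (a DROP rule would leave them waiting), and that the parameter can be set at mongos startup through --setParameter. The function names and the example addresses are hypothetical.

```python
def build_reject_rule(config_server_ip, port=27019):
    """Build (but do not run) an iptables command that rejects outbound TCP
    connections to one config server.

    Assumption: REJECT with a TCP reset is chosen so that connection
    attempts fail at once instead of hanging until a timeout.
    """
    return [
        "iptables", "-A", "OUTPUT",
        "-p", "tcp",
        "-d", config_server_ip,
        "--dport", str(port),
        "-j", "REJECT", "--reject-with", "tcp-reset",
    ]


def build_mongos_args(configdb, fastest_auth_reads=True):
    """Build (but do not run) a mongos command line that sets the parameter.

    Assumption: internalSCCAllowFastestAuthConfigReads is settable at
    startup via --setParameter; the ticket only names the parameter.
    """
    value = "true" if fastest_auth_reads else "false"
    return [
        "mongos",
        "--configdb", configdb,
        "--setParameter",
        f"internalSCCAllowFastestAuthConfigReads={value}",
    ]


if __name__ == "__main__":
    # Hypothetical address and host names, for illustration only.
    print(" ".join(build_reject_rule("10.0.0.11")))
    print(" ".join(build_mongos_args("cfg1:27019,cfg2:27019,cfg3:27019")))
```

The firewall rule has to be removed again once the first config server is healthy, otherwise mongos keeps skipping it.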

AFFECTED VERSIONS
All previous versions are affected by this issue.

FIX VERSION
The fix is included in the 2.6.2 production release.

RESOLUTION DETAILS
For authentication requests (and only for those), a parameter internalSCCAllowFastestAuthConfigReads was added to allow all three config servers to be queried concurrently. To ensure consistent reads of all other metadata, all other requests use the normal mechanism of contacting the first config server, with a 30-second timeout.
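
The concurrent read described above can be illustrated with a small Python sketch. It is not the mongos implementation: the config servers are simulated by plain callables, and the names read_from_fastest and simulated_server are hypothetical. It only shows the idea of querying all servers at once and returning the first successful answer.

```python
import concurrent.futures as cf
import time


def simulated_server(name, delay_secs):
    """Return a callable that stands in for one config server read."""
    def fetch():
        time.sleep(delay_secs)  # simulated response latency
        return name
    return fetch


def read_from_fastest(fetchers, timeout_secs=30.0):
    """Start every fetcher concurrently and return the first successful result.

    Fetchers that raise are skipped; if none succeeds, RuntimeError is raised.
    """
    if not fetchers:
        raise ValueError("at least one fetcher is required")
    pool = cf.ThreadPoolExecutor(max_workers=len(fetchers))
    try:
        futures = [pool.submit(fetch) for fetch in fetchers]
        # as_completed yields futures in completion order, fastest first.
        for future in cf.as_completed(futures, timeout=timeout_secs):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("no config server returned authentication data")
    finally:
        # Do not block on the slower servers once an answer is available.
        pool.shutdown(wait=False)
```

With simulated latencies of 0.5 s, 0.01 s and 0.2 s, the call returns the second server's answer without waiting for the first. As the resolution notes, this trade-off is applied to authentication reads only, because other metadata reads need the consistency of the normal mechanism.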

Original description

Normal collection operations do not touch the config servers, but other operations do.
Some examples:

authentication
splits/balancer
listDatabases
creating database
creating collection

Possible Solutions:

send reads to all (maybe with a tiny backoff), respond from first response (maybe with threshold) (preferred)
blacklist (a bit ugly + racy)
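
The preferred option (send reads to all servers with a small backoff, answer from the first response) can be sketched as follows. This is an illustration of the proposal only, not the change that was shipped: servers are simulated by callables, and the names staggered_read and backoff_secs are hypothetical.

```python
import concurrent.futures as cf


def staggered_read(fetchers, backoff_secs=0.05):
    """Query servers in order, starting the next one only if no answer
    arrived within backoff_secs, and return the first successful result.

    A fetcher that fails quickly causes the next one to start at once.
    Assumption: every fetcher eventually returns or raises; the final
    wait below has no overall timeout.
    """
    if not fetchers:
        raise ValueError("at least one fetcher is required")
    pool = cf.ThreadPoolExecutor(max_workers=len(fetchers))
    pending = set()
    try:
        for fetch in fetchers:
            pending.add(pool.submit(fetch))
            # Give the servers contacted so far a short head start.
            done, pending = cf.wait(
                pending, timeout=backoff_secs, return_when=cf.FIRST_COMPLETED
            )
            for future in done:
                if future.exception() is None:
                    return future.result()
        # Every server has been contacted; wait for whichever answers first.
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
        raise RuntimeError("no config server responded")
    finally:
        pool.shutdown(wait=False)
```

Compared with contacting all servers at once, the backoff keeps load on the second and third config servers low when the first one is healthy, at the cost of one backoff interval of extra latency when it is not.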

Attachments:
SERVER-11332 mongos verbose log.txt (9 kB, Dec 26 2013 07:31:51 AM UTC)
SERVER-11332 reproduce notes.txt (17 kB, Dec 26 2013 07:31:51 AM UTC)
sync_hung_cmd.js (2 kB, Jan 03 2014 12:14:28 AM UTC)

is duplicated by:
SERVER-13323 listDBs block when first mongo config server is down (Closed)
SERVER-9916 be smarter about config server retries in non-responsive situations (Closed)

Assignee: Greg Studer (Inactive)
Reporter: Alexander Komyagin (Inactive)
Participants: Alexander Komyagin, Asya Kamsky, Githook User, Greg Studer, Henrik Ingo
Votes: 6
Watchers: 21

Created: Oct 23 2013 03:48:27 PM UTC
Updated: Jul 11 2016 05:40:32 PM UTC
Resolved: May 22 2014 10:04:17 PM UTC
