Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.0.9, 3.2.3, 3.3.1
Affects Version/s: None
Component/s: Sharding
Labels:
- code-only

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Completed:
3.0.9, 3.2.3
Sprint:
Sharding F (01/29/16)
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

When a shard restarts, it loses all of its sharding metadata. If a mongos that still has sharding information sends a write command to the restarted shard, the shard returns a stale config error, because it has no shard version (see note 1). The mongos sees a stale error carrying a zero shard version in the response and decides to perform a full reload (see note 2).

In setups with a very large number of chunks (in the millions), loading the entire chunk metadata takes time. This causes problems because the reload is done under a mutex, and other threads that received the same response from the shard also decide to perform a full reload. This is exacerbated by the fact that a reloading thread has to acquire and release the same contended mutex multiple times before it finishes, so it may hold on to its newly loaded copy of the data for even longer.

Also note that every full reload creates a new ChunkManager instance and tries to atomically replace the old one with it when the reload finishes. In a mongos with multiple threads executing write commands, several threads can queue up to perform a full reload while other threads have already loaded their own copy of the chunk metadata but are blocked waiting for the same mutex. In certain cases, with a large enough number of chunks and simultaneous write commands, this can spiral out of control: the mongos consumes too much memory and is ultimately killed by the operating system's OOM killer.
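The memory amplification described above can be reproduced in miniature. The following Python sketch is a toy model, not mongos code: the `Mongos` class, `handle_stale_zero_version`, and the chunk and writer counts are hypothetical. Each writer that receives a zero-version stale config error builds its own full copy of the chunk metadata, and a barrier forces the worst-case interleaving in which every copy is loaded before any of them is swapped in under the shared mutex. It does not model the repeated acquire/release of the mutex during the reload itself.

```python
import threading

NUM_CHUNKS = 50_000  # hypothetical; the report describes millions of chunks
NUM_WRITERS = 8      # hypothetical number of concurrent write commands


class Mongos:
    """Toy model (hypothetical names) of a router that answers every
    zero-version stale config error with a full chunk metadata reload."""

    def __init__(self, num_writers: int) -> None:
        self.mutex = threading.Lock()   # the contended mutex guarding the swap
        self.chunk_manager = None       # currently installed copy
        self.stats = threading.Lock()   # protects the counters below
        self.copies_alive = 0           # full metadata copies currently in memory
        self.peak_copies = 0
        self.full_reloads = 0
        # Forces the worst-case interleaving: every writer finishes loading
        # its copy before any writer gets to install it.
        self.all_loaded = threading.Barrier(num_writers)

    def _adjust_copies(self, delta: int) -> None:
        with self.stats:
            self.copies_alive += delta
            self.peak_copies = max(self.peak_copies, self.copies_alive)

    def handle_stale_zero_version(self) -> None:
        # Each writer independently decides on a full reload and builds its
        # own brand-new copy of the entire chunk metadata.
        new_manager = [(i, i + 1) for i in range(NUM_CHUNKS)]
        with self.stats:
            self.full_reloads += 1
        self._adjust_copies(+1)
        self.all_loaded.wait()
        # Atomically replace whatever copy is currently installed.
        with self.mutex:
            old, self.chunk_manager = self.chunk_manager, new_manager
        if old is not None:
            self._adjust_copies(-1)     # the replaced copy becomes garbage


mongos = Mongos(NUM_WRITERS)
writers = [threading.Thread(target=mongos.handle_stale_zero_version)
           for _ in range(NUM_WRITERS)]
for w in writers:
    w.start()
for w in writers:
    w.join()

print(mongos.full_reloads, mongos.peak_copies, mongos.copies_alive)  # -> 8 8 1
```

In this model, eight writers perform eight full reloads and eight complete copies are alive at the peak, although one reload would have brought the router up to date; memory use therefore scales with both the number of chunks and the number of concurrent writers.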

Note 1:
In the mongod write command execution path, the shard version is checked first here:
https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/db/commands/write_commands/batch_executor.cpp#L321

It then sets the error in the response and only afterwards performs a refresh:
https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/db/commands/write_commands/batch_executor.cpp#L361
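The ordering in Note 1 is what makes a restarted shard report a zero version. A minimal Python sketch of that ordering, assuming a simplified (major, minor) version with no epoch; `Shard`, `execute_write`, `refresh`, and the version values are hypothetical and are not the batch_executor.cpp code:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]  # hypothetical (major, minor); epochs are ignored here
ZERO: Version = (0, 0)


@dataclass
class WriteResponse:
    ok: bool
    stale_config: bool = False
    shard_version: Optional[Version] = None


@dataclass
class Shard:
    # Authoritative version held by the config servers (hypothetical value).
    config_version: Version = (5, 3)
    # A freshly restarted shard has no sharding metadata in memory.
    cached_version: Version = ZERO

    def refresh(self) -> None:
        self.cached_version = self.config_version

    def execute_write(self, received_version: Version) -> WriteResponse:
        # 1. The version check runs first, against the in-memory state.
        if received_version != self.cached_version:
            # 2. The error is built from whatever is cached right now,
            #    which after a restart is the zero version...
            response = WriteResponse(ok=False, stale_config=True,
                                     shard_version=self.cached_version)
            # 3. ...and only afterwards does the shard refresh its metadata.
            self.refresh()
            return response
        return WriteResponse(ok=True, shard_version=self.cached_version)


shard = Shard()
resp = shard.execute_write(received_version=(5, 3))
print(resp.shard_version, shard.cached_version)  # -> (0, 0) (5, 3)
```

Because the error is built before the refresh, the router receives (0, 0) even though the shard's metadata is correct immediately afterwards, and a retried write with the same version succeeds.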

Note 2:
A shard response with a zero version results in an unknown comparison result:
https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/s/chunk_manager_targeter.cpp#L152

which ultimately causes it to flush the entire chunk manager:
https://github.com/mongodb/mongo/blob/r3.2.0/src/mongo/s/chunk_manager_targeter.cpp#L674
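The decision in Note 2 can be sketched as follows, under the simplifying assumption that a chunk version is (major, minor, epoch) and that versions with different epochs cannot be ordered; a zero version is modelled with an empty (null) epoch. `ChunkVersion`, `compare`, `choose_refresh`, and the enum values are hypothetical stand-ins loosely modelled on the linked chunk_manager_targeter.cpp logic, not its actual API.

```python
from enum import Enum
from typing import NamedTuple


class ChunkVersion(NamedTuple):
    # Hypothetical simplification of a chunk version.
    major: int
    minor: int
    epoch: str  # "" stands in for the null epoch of a zero version


ZERO = ChunkVersion(0, 0, "")


class Compare(Enum):
    UNKNOWN = "unknown"      # the versions cannot be ordered
    OLDER = "older"          # the router's version is older than the shard's
    NOT_OLDER = "not_older"


class Refresh(Enum):
    NONE = "none"
    INCREMENTAL = "incremental"  # reload only the changed chunk metadata
    FULL = "full"                # drop the chunk manager and reload everything


def compare(router: ChunkVersion, shard: ChunkVersion) -> Compare:
    # Versions from different epochs, including the null epoch of a zero
    # version, have no ordering, so the result is UNKNOWN.
    if router.epoch != shard.epoch:
        return Compare.UNKNOWN
    if (router.major, router.minor) < (shard.major, shard.minor):
        return Compare.OLDER
    return Compare.NOT_OLDER


def choose_refresh(router: ChunkVersion, shard: ChunkVersion) -> Refresh:
    result = compare(router, shard)
    if result is Compare.UNKNOWN:
        # Nothing can be assumed about the shard's metadata: flush everything.
        return Refresh.FULL
    if result is Compare.OLDER:
        return Refresh.INCREMENTAL
    return Refresh.NONE


router = ChunkVersion(5, 3, "epochA")
print(choose_refresh(router, ZERO).value)                          # -> full
print(choose_refresh(router, ChunkVersion(5, 4, "epochA")).value)  # -> incremental
```

In this model, an ordinary stale error carrying a newer version from the same epoch leads only to an incremental refresh, while the zero version from a restarted shard falls into the UNKNOWN branch and triggers a full reload.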

Related to:
SERVER-23958 DBConfig::_loadIfNeeded will not do a reload even in cases where a force reload is needed (Closed)
SERVER-23965 DBConfig::getChunkManager should not need to reload the entire DBConfig if it needs to reload one ns (Closed)

Assignee: Randolph Tan
Reporter: Randolph Tan
Participants: Githook User, Randolph Tan
Votes: 0
Watchers: 12

Created: Jan 09 2016 02:01:20 AM UTC
Updated: Jan 25 2017 09:59:27 PM UTC
Resolved: Jan 15 2016 10:09:55 PM UTC
Confidence Status Last Update: 11/Jan/16 8:38 PM
