Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.2.2, 2.3.1
Affects Version/s: 2.2.0
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

If a migration aborts, it calls the done() method on MigrateFromStatus, which is what takes the server out of the critical section. That method, however, tries to acquire the read lock on the database in which the migration is taking place. While the server is in the critical section, all requests on that collection hang in setShardVersion, which waits for the server to leave the critical section, and setShardVersion takes the database's write lock. So if many queries are coming in to that namespace on many different threads, the setShardVersion commands can starve readers of the database lock, which keeps done() from acquiring it and prevents the migration from ever finishing.

The proposed fix is to change MigrateFromStatus::done to use a write lock rather than a read lock, so that the lock acquisition will be greedy.
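To illustrate the starvation and why a write lock avoids it, here is a minimal toy model in Python. It is not MongoDB code. It assumes a writer-preferring ("greedy") lock in which any waiting writer is served before any waiting reader, and it assumes each blocked setShardVersion repeatedly re-queues for the write lock while the critical section is active. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(done_mode, ssv_threads=4, max_ticks=50):
    """Toy model of a writer-preferring ("greedy") database lock.

    done_mode is "r" or "w": the lock mode requested by a stand-in for
    MigrateFromStatus::done. Returns the tick at which done() is granted
    the lock, or None if it is never granted within max_ticks.
    """
    # Waiting writers, served FIFO. Each setShardVersion stand-in is a writer.
    writers = deque(f"ssv-{i}" for i in range(ssv_threads))
    reader_waiting = False

    if done_mode == "w":
        writers.append("done")   # done() queues up behind the other writers
    else:
        reader_waiting = True    # done() waits as a reader

    for tick in range(1, max_ticks + 1):
        if writers:
            # Greedy policy: a waiting writer always goes before a reader.
            holder = writers.popleft()
            if holder == "done":
                return tick
            # Assumption: setShardVersion sees the critical section is
            # still active, releases the lock, and queues again.
            writers.append(holder)
        elif reader_waiting:
            # A reader is only granted the lock when no writer is waiting.
            return tick
    return None


if __name__ == "__main__":
    print("done() with read lock: ", simulate("r"))
    print("done() with write lock:", simulate("w"))
```

In this model the read-lock variant never gets the lock as long as at least one setShardVersion keeps re-queuing, while the write-lock variant is served after one pass through the writers already waiting.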

is related to
SERVER-7361 segfault in mongod after failed moveChunk (Closed)
SERVER-8099 use condition instead of hard loop for SSV waiting for critical section to finish (Closed)

related to
SERVER-7298 thousands of "waiting till out of critical section" (Closed)
SERVER-7472 Replication lag can cause cluster to hang in migration critical section (Closed)

Assignee: Spencer Brody (Inactive)
Reporter: Spencer Brody (Inactive)
Participants: auto, Spencer Brody
Votes: 0
Watchers: 2

Created: Oct 27 2012 05:57:06 PM UTC
Updated: Jul 11 2016 05:58:21 PM UTC
Resolved: Nov 16 2012 10:31:32 PM UTC
