
Type: Bug
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels:
- PM229
- balancer

Assigned Teams:
Sharding
Operating System:
ALL
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

sh.startBalancer() updates the balancer document in config.settings and then waits for the timestamp of the balancer lock to change. The sequence is:

1. setBalancerState to true
2. Get the balancer lock document
3. Extract the timestamp
4. assert.soon (up to 30s by default), watching for the timestamp to change

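If the balancer manages to start and take the lock between #1 and #2, and then starts a non-trivial chunk migration, the timeout will occur: the timestamp extracted in #3 already belongs to that balancing round, so it will not change again until the migration finishes and the lock is taken again. By contrast, if the balancer is slower and takes the lock after #2, this will not happen, even if there is a non-trivial chunk migration.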
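To make the timing concrete, here is a toy, deterministic model of the sequence above in plain JavaScript (Node.js, simulated time, no MongoDB). It is a sketch under stated assumptions, not the shell's actual code: the lock document is reduced to a single hypothetical `ts` field, and `makeBalancer` and `startBalancerCurrent` are illustrative names. The modeled balancer takes the lock at `takeAt` and takes it again only after a migration lasting `migrationMs`.

```javascript
// Toy model of the sh.startBalancer() race; simulated milliseconds, no MongoDB.
// ASSUMPTION: the lock document is reduced to { ts }; ts 0 stands for a lock
// left over from an earlier balancing round.

// Hypothetical balancer: takes the lock at `takeAt` (ts 1), runs a migration
// lasting `migrationMs`, then re-takes the lock for its next round (ts 2).
function makeBalancer(takeAt, migrationMs) {
  return {
    lockAt(t) {
      if (t < takeAt) return { ts: 0 };
      if (t < takeAt + migrationMs) return { ts: 1 };
      return { ts: 2 };
    },
  };
}

// Current order: enabling (#1) has already happened; read the timestamp at
// `readAt` (#2, #3), then poll until it changes or the timeout expires (#4).
function startBalancerCurrent(balancer, readAt, timeoutMs = 30000, intervalMs = 100) {
  const beginTs = balancer.lockAt(readAt).ts;
  for (let t = readAt; t <= readAt + timeoutMs; t += intervalMs) {
    if (balancer.lockAt(t).ts !== beginTs) {
      return { ok: true, waitedMs: t - readAt };
    }
  }
  return { ok: false, waitedMs: timeoutMs };
}

// Fast balancer: lock taken at 10 ms, before the read at 50 ms, long migration.
console.log(startBalancerCurrent(makeBalancer(10, 60000), 50)); // { ok: false, waitedMs: 30000 }
// Slow balancer: lock taken at 200 ms, after the read, long migration.
console.log(startBalancerCurrent(makeBalancer(200, 60000), 50)); // { ok: true, waitedMs: 200 }
// Fast balancer with a trivial migration: the next round's lock is seen quickly.
console.log(startBalancerCurrent(makeBalancer(10, 500), 50)); // { ok: true, waitedMs: 500 }
```

Only the combination of an early lock acquisition and a long migration produces the timeout, matching the description.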

The impact of this is low, although the apparent failure of sh.startBalancer() (even though the balancer is clearly working) is often confusing, especially since there are other reasons it can happen.

A better approach would be to read the lock document before enabling the balancer, and then pass its timestamp through to the (eventual) assert.soon.
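The suggested order can be illustrated with a toy, self-contained JavaScript model (Node.js, simulated time, no MongoDB). This is a hedged sketch, not a patch to the shell: `makeBalancer` and `startBalancerFixed` are hypothetical names, the lock document is reduced to an assumed `ts` field, and the model assumes the balancer cannot take the lock before it is enabled at `enableAt`. Because the timestamp is captured before enabling, even a balancer that grabs the lock immediately registers as a change.

```javascript
// Toy model of the proposed order for sh.startBalancer(); simulated ms, no MongoDB.
// ASSUMPTION: the lock document is reduced to { ts }; ts 0 is a lock left over
// from an earlier round, and the balancer cannot take the lock before enabling.

// Hypothetical balancer: takes the lock at `takeAt` (ts 1), migrates for
// `migrationMs`, then re-takes the lock for its next round (ts 2).
function makeBalancer(takeAt, migrationMs) {
  return {
    lockAt(t) {
      if (t < takeAt) return { ts: 0 };
      if (t < takeAt + migrationMs) return { ts: 1 };
      return { ts: 2 };
    },
  };
}

// Proposed order: snapshot the timestamp just before enabling at `enableAt`,
// then pass it into the polling loop that runs once the balancer is enabled.
function startBalancerFixed(balancer, enableAt, timeoutMs = 30000, intervalMs = 100) {
  const beginTs = balancer.lockAt(enableAt - 1).ts; // read strictly before enabling
  for (let t = enableAt; t <= enableAt + timeoutMs; t += intervalMs) {
    if (balancer.lockAt(t).ts !== beginTs) {
      return { ok: true, waitedMs: t - enableAt };
    }
  }
  return { ok: false, waitedMs: timeoutMs };
}

// Fast (lock at 10 ms) and slow (lock at 200 ms) balancers, both with a long
// migration: in either case the first lock acquisition is observed.
console.log(startBalancerFixed(makeBalancer(10, 60000), 0)); // { ok: true, waitedMs: 100 }
console.log(startBalancerFixed(makeBalancer(200, 60000), 0)); // { ok: true, waitedMs: 200 }
```

The design choice is simply to move the snapshot ahead of the state change it is meant to detect, so that no acquisition can slip in between.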

sh.stopBalancer() doesn't have this problem, because it waits for the balancer lock state to go to false.
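The difference is that sh.stopBalancer() waits for a state rather than for a change relative to an earlier snapshot. A toy, self-contained JavaScript model (Node.js, simulated time, no MongoDB) shows why that is insensitive to ordering; `makeBalancer` and `waitForLockReleased` are hypothetical names, and the `{ state }` document with 0 meaning "released" is an assumption made for the sketch.

```javascript
// Toy model of a state-based wait, as described for sh.stopBalancer();
// simulated ms, no MongoDB.
// ASSUMPTION: the lock document is reduced to { state }, where 0 means released.

// Hypothetical balancer that holds the lock until its migration ends at `releaseAt`.
function makeBalancer(releaseAt) {
  return { lockAt: (t) => ({ state: t < releaseAt ? 2 : 0 }) };
}

// Poll until the lock is observed released. Nothing is compared against an
// earlier snapshot, so the moment polling starts cannot cause a missed change.
function waitForLockReleased(balancer, startAt, timeoutMs = 30000, intervalMs = 100) {
  for (let t = startAt; t <= startAt + timeoutMs; t += intervalMs) {
    if (balancer.lockAt(t).state === 0) {
      return { ok: true, waitedMs: t - startAt };
    }
  }
  return { ok: false, waitedMs: timeoutMs };
}

console.log(waitForLockReleased(makeBalancer(500), 0)); // { ok: true, waitedMs: 500 }
console.log(waitForLockReleased(makeBalancer(500), 1000)); // { ok: true, waitedMs: 0 }
```

In this model a state-based wait can still time out if a migration outlasts `timeoutMs`, but only because the lock genuinely is still held.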

Is related to:
SERVER-21766 Remove waiting for balancer lock behavior from sh.startBalancer (Closed)

Assignee:: [DO NOT USE] Backlog - Sharding Team

Reporter:: Kevin Pulo

Participants:: [DO NOT USE] Backlog - Sharding Team, Kevin Pulo

Votes:: 0

Watchers:: 4

Created:: Jul 17 2015 04:09:23 AM UTC

Updated:: Dec 06 2022 04:47:56 AM UTC

Resolved:: Nov 30 2016 08:46:45 PM UTC

GA Target Date:: None

Public Preview Target Date:: None

Private Preview Target Date:: None

Experiment Target Date:: None
