ISSUE SUMMARY
If a runner performing chunk migration cleanup yields and the node steps down from primary in the meantime, the runner resumes on the assumption that the node is still primary and attempts to write to the oplog, which triggers a fatal assertion.
The only configurations affected by this issue are sharded clusters where shards are replica sets, the balancer is enabled, and chunk migrations have occurred.
USER IMPACT
Under the conditions described above, the cleanup operation fails with an assert, and the primary node shuts down.
WORKAROUNDS
N/A
AFFECTED VERSIONS
MongoDB 2.6 production releases up to 2.6.3 are affected by this issue.
FIX VERSION
The fix is included in the 2.6.4 production release.
RESOLUTION DETAILS
During cleanup, always check the replica set status after yielding and abort the cleanup operation if the node is no longer primary.
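The check-after-yield pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual server code: `Node`, `CleanupResult`, `removeRangeSketch`, and the `yield` callback are all hypothetical names, and removing an element from a vector stands in for the real document delete and oplog write.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a replica set member; not MongoDB's real API.
struct Node {
    bool primary = true;
};

// Outcome of one cleanup pass.
struct CleanupResult {
    std::size_t deleted = 0;  // documents removed before the loop stopped
    bool aborted = false;     // true if the node lost primary status mid-cleanup
};

// Deletes documents one at a time. `yield` models the point where the runner
// gives up its lock, so other work (including a stepdown) may happen.
CleanupResult removeRangeSketch(Node& node, std::vector<int>& docs,
                                const std::function<void()>& yield) {
    assert(yield && "yield callback must be callable");
    CleanupResult result;
    while (!docs.empty()) {
        yield();
        // Re-check replica set state after every yield, before any write.
        if (!node.primary) {
            result.aborted = true;
            break;
        }
        docs.pop_back();  // stands in for the delete plus the oplog write
        ++result.deleted;
    }
    return result;
}
```

In this sketch, a stepdown that happens during a yield causes the loop to stop before the next delete, so no oplog write is attempted on a non-primary node. The documents that were not removed are simply left in place.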
Original description
The removeRange helper used by migration cleanup does not re-check replica set state after a YIELD_AUTO cursor yields. If a stepdown occurs during the yield, logOp() will fail (correctly) with an fassert().
We need to either not yield or re-check replica set state before deleting the document.
Affects v2.4; does not affect v2.7 due to changes in yield behavior.
related to
- SERVER-15798 Helpers::removeRange does not check if node is primary (Closed)
- SERVER-14261 stepdown during migration range delete can abort mongod (Closed)
- SERVER-16115 Helpers::removeRange should check if master (Closed)