  1. Core Server
  2. SERVER-14551

Runner yield during migration cleanup (removeRange) results in fassert

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.4
    • Affects Version/s: 2.4.10, 2.6.4
    • Component/s: Sharding
    • None
    • ALL

      Trigger a yield and stepdown during a migration cleanup.

      Issue Status as of Aug 6, 2014

      ISSUE SUMMARY
      If a runner performing a chunk migration cleanup yields and the node becomes non-primary during the yield, the runner assumes on resuming that the node is still primary and incorrectly attempts to write to the oplog, causing a fatal assertion.

      The only configurations affected by this issue are sharded clusters where shards are replica sets, the balancer is enabled, and chunk migrations have occurred.

      USER IMPACT
      Under the conditions described above, the cleanup operation fails with an assert, and the primary node shuts down.

      WORKAROUNDS
      N/A

      AFFECTED VERSIONS
      MongoDB 2.6 production releases up to 2.6.3 are affected by this issue.

      FIX VERSION
      The fix is included in the 2.6.4 production release.

      RESOLUTION DETAILS
      During cleanup, always check the replica set status after yielding and abort the cleanup operation if the node is no longer primary.
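
      The re-check described above can be illustrated with a small simulation. This is only a sketch, not the server's C++ code: Node, log_op, remove_range, on_yield, recheck, NotPrimaryError and FatalAssertion are all hypothetical names invented for this example. The on_yield callback stands in for the point where the runner yields and a stepdown may happen.

```python
class NotPrimaryError(Exception):
    """Cleanup aborted cleanly because the node is no longer primary."""


class FatalAssertion(Exception):
    """Stand-in for the fassert that shuts the node down."""


class Node:
    """Hypothetical, heavily simplified replica set member."""

    def __init__(self, docs):
        self.is_primary = True
        self.docs = dict(docs)
        self.oplog = []

    def log_op(self, op):
        # As described in the ticket, an oplog write on a non-primary is fatal.
        if not self.is_primary:
            raise FatalAssertion("oplog write attempted on a non-primary node")
        self.oplog.append(op)


def remove_range(node, lo, hi, on_yield=lambda: None, recheck=True):
    """Delete every key in [lo, hi), yielding before each delete."""
    removed = 0
    # Snapshot the keys first so deleting while looping is safe.
    for key in sorted(k for k in node.docs if lo <= k < hi):
        on_yield()  # the node may step down while the cleanup is yielded
        if recheck and not node.is_primary:
            # The fix: re-check state after the yield and abort the cleanup.
            raise NotPrimaryError("stepped down during cleanup; aborting")
        node.log_op(("delete", key))
        del node.docs[key]
        removed += 1
    return removed


if __name__ == "__main__":
    node = Node({k: "doc" for k in range(5)})

    def step_down():
        node.is_primary = False

    try:
        remove_range(node, 0, 5, on_yield=step_down)
    except NotPrimaryError as exc:
        print("cleanup aborted:", exc)
```

      With recheck=False the same stepdown reaches log_op and raises FatalAssertion, which corresponds to the failure mode in the issue summary.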

      Original description

      The removeRange helper used by migration cleanup does not re-check replica set state after yielding with a YIELD_AUTO cursor. If a yield and a stepdown both occur, logOp() will fail (correctly) with an fassert().

      We need to either not yield or re-check replica set state before deleting the document.

      Affects v2.4; does not affect v2.7 because of changes in yield behavior.

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Greg Studer (greg_10gen)
            Votes: 0
            Watchers: 4
