Core Server / SERVER-44250

startIndexBuild oplog write and thread pool scheduling are not serialized between concurrent threads on primaries

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.3.2
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • Execution Team 2019-11-18
    • 13

      Secondaries serialize all oplog commands, which means that the code in startIndexBuild that 1) writes the "startIndexBuild" oplog entry and 2) schedules the task on the thread pool cannot race with other threads doing the same thing.

      On primaries, however, these two operations are not protected against concurrent execution, so two threads can interleave between the oplog write and the scheduling step. With a thread pool size of only 1, this can lead to the following situation:

      • Start and replicate a "startIndexBuild" oplog entry for index A
        • The secondary starts building index A
      • Start and replicate a "startIndexBuild" oplog entry for index B
      • Schedule index build B on the thread pool on the primary
        • The primary starts building index B
      • Queue up index build A on the primary because all threads are in use, and block.
      • Commit and replicate "commitIndexBuild" for index B
      • The secondary attempts to apply this oplog entry and blocks because index B has not started
        • Index B cannot start until index A commits
        • Index A cannot commit until its commitIndexBuild oplog entry is replicated and applied, which cannot happen while the secondary is blocked on index B's entry. The result is a deadlock.
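
      The race above can be illustrated with a small model. This is only a sketch, not server code: it assumes each thread performs exactly two steps (write the oplog entry, then schedule the build), and the names interleavings, diverges, racy and safe are made up for the illustration. It enumerates every way two threads can interleave and flags the orderings where the oplog order differs from the thread pool scheduling order.

```python
from itertools import combinations

def interleavings(serialized):
    """Yield each way two threads (A and B) can order their two steps.

    Each thread first writes its oplog entry, then schedules its build.
    With serialized=True the two steps of a thread are treated as one
    critical section, so they must be adjacent.
    """
    for a_slots in combinations(range(4), 2):
        if serialized and a_slots not in ((0, 1), (2, 3)):
            continue  # another thread would run between write and schedule
        a_steps = iter([("write", "A"), ("schedule", "A")])
        b_steps = iter([("write", "B"), ("schedule", "B")])
        yield [next(a_steps) if pos in a_slots else next(b_steps)
               for pos in range(4)]

def diverges(steps):
    """True when the oplog order differs from the scheduling order."""
    oplog_order = [name for op, name in steps if op == "write"]
    schedule_order = [name for op, name in steps if op == "schedule"]
    return oplog_order != schedule_order

# Orderings in which secondaries (oplog order) and the primary
# (scheduling order) would start the two builds in different orders.
racy = [s for s in interleavings(serialized=False) if diverges(s)]
safe = [s for s in interleavings(serialized=True) if diverges(s)]
```

      In this model, racy includes the ordering from the list above (write A, write B, schedule B, schedule A), while safe is empty: once the two steps cannot be separated, the oplog order and the scheduling order always agree.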

      The following original description does not accurately describe the full problem:

      We limit the maximum number of index build worker threads to 10, but there is no high-level restriction on the number of active index builds.

      This is problematic for secondaries in the following scenario:

      • Start, but do not commit, 10 index builds on the primary, replicating 10 "startIndexBuild" oplog entries and starting 10 worker threads.
      • Start and commit an 11th index build on the primary, replicating both a "startIndexBuild" and a "commitIndexBuild" oplog entry.
        • Because there are already 10 index builds active on the secondary, this index build will queue up in "_pendingTasks", but it will not start.
      • Replication of the "commitIndexBuild" oplog entry will wait for the 11th index build's thread to join, blocking until it does.
        • This in turn blocks other "commitIndexBuild" oplog entries from joining other index build threads, causing this hang.
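
      The scenario above can be replayed in a simplified model of a secondary. This is a sketch under stated assumptions, not the server's implementation: replay_on_secondary is a hypothetical helper, builds that have a worker thread are assumed to be able to commit immediately, and queued builds are assumed to start in FIFO order when a thread frees up.

```python
from collections import deque

def replay_on_secondary(oplog, max_workers):
    """Apply oplog entries in order on a simulated secondary.

    Returns the build whose "commitIndexBuild" entry can never be applied
    because the build is still queued, or None if every entry applies.
    """
    running = set()    # builds that currently hold a worker thread
    pending = deque()  # builds queued because all threads are in use
    for op, build in oplog:
        if op == "startIndexBuild":
            if len(running) < max_workers:
                running.add(build)
            else:
                pending.append(build)
        elif op == "commitIndexBuild":
            if build not in running:
                # The applier waits to join a build that never started.
                # No later entry is applied, so no thread is ever freed.
                return build
            running.remove(build)
            if pending:
                running.add(pending.popleft())
    return None

# Ten builds started but not committed, then an 11th started and committed.
oplog = [("startIndexBuild", n) for n in range(1, 11)]
oplog += [("startIndexBuild", 11), ("commitIndexBuild", 11)]

blocked = replay_on_secondary(oplog, max_workers=10)
unblocked = replay_on_secondary(oplog, max_workers=11)
```

      With 10 workers the replay stops at build 11, matching the hang described above. With 11 workers the same oplog applies cleanly.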

      We should do one of the following:

      • Limit the maximum number of active index builds allowed on the primary
        • This should be the same as the maximum number of worker threads. We would enforce this either by returning an error to the user or by blocking until resources are available. This would prevent the problem on secondaries only as long as the limits on the primary and the secondaries are identical.
      • Do not limit the maximum number of index build worker threads
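
      The first option could be sketched with a counting semaphore. This is a hypothetical illustration only: IndexBuildLimiter and its methods are invented names, not part of the server, and the limit of 10 is taken from the worker thread maximum mentioned above.

```python
import threading

class IndexBuildLimiter:
    """Caps the number of active index builds (hypothetical helper)."""

    def __init__(self, max_active_builds):
        self._slots = threading.BoundedSemaphore(max_active_builds)

    def try_start(self):
        """Variant 1: return False (an error to the user) when at the limit."""
        return self._slots.acquire(blocking=False)

    def start_blocking(self, timeout=None):
        """Variant 2: wait until a slot is available (or the timeout expires)."""
        return self._slots.acquire(timeout=timeout)

    def finish(self):
        """Release the slot when a build commits or aborts."""
        self._slots.release()

limiter = IndexBuildLimiter(max_active_builds=10)
# Ten builds are admitted; the 11th is rejected instead of queueing.
admitted = [limiter.try_start() for _ in range(11)]
```

      The sketch only helps if the same limit applies wherever the builds run, which is the caveat noted in the first option.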

            Assignee:
            louis.williams@mongodb.com Louis Williams
            Reporter:
            louis.williams@mongodb.com Louis Williams
            Votes:
            0
            Watchers:
            4
