Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Fully Compatible
ALL
Execution Team 2019-11-18
13
Secondaries serialize all oplog commands. As a result, the code in startIndexBuild that 1) writes the "startIndexBuild" oplog entry and 2) schedules the task on the thread pool cannot race with other threads doing the same thing.
On primaries, however, these two operations are not performed atomically, so two concurrent threads can interleave them. With a thread pool of size 1, this leads to the following deadlock:
- Start and replicate a "startIndexBuild" oplog entry for index A
- The secondary starts building index A
- Start and replicate a "startIndexBuild" oplog entry for index B
- Schedule index build B on the thread pool on the primary
- The primary starts building index B
- Queue up index build A on the primary because all threads are in use, and block
- Commit and replicate "commitIndexBuild" for index B
- The secondary attempts to apply this oplog entry and blocks because index B has not started there
- Index B cannot start on the secondary until index A commits and frees the only thread
- Index A cannot commit on the secondary until the secondary applies A's "commitIndexBuild" oplog entry, which is ordered after B's blocked entry, leading to a deadlock.
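One way to remove the interleaving is to make "write the startIndexBuild entry" and "get a worker thread" a single step, for example by writing the entry from the pool thread itself, as the title of the related SERVER-44609 suggests. The sketch below is hypothetical Python, not server code: `run_builds`, `index_build`, and the `oplog` list are assumptions standing in for the real coordinator and oplog, and `ThreadPoolExecutor(max_workers=1)` stands in for the one-thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_builds(names, pool_size=1):
    """Run one hypothetical index build per name on a pool of pool_size
    threads and return the stand-in oplog they produce."""
    oplog = []                     # stand-in for the replicated oplog
    oplog_lock = threading.Lock()  # each append models one atomic oplog write

    def log(op, index):
        with oplog_lock:
            oplog.append((op, index))

    def index_build(index):
        # Runs on the pool thread, so "startIndexBuild" is written only once
        # this build owns a worker: the order of start entries in the oplog
        # is the order in which builds actually obtained threads.
        log("startIndexBuild", index)
        # ... scan the collection and build the index here ...
        log("commitIndexBuild", index)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Several client threads submit at once, mimicking concurrent
        # createIndexes commands racing on the primary.
        clients = [threading.Thread(target=pool.submit, args=(index_build, name))
                   for name in names]
        for client in clients:
            client.start()
        for client in clients:
            client.join()
    # Leaving the with-block waits for every submitted build to finish.
    return oplog
```

With `pool_size=1`, every "startIndexBuild" is immediately followed by its own "commitIndexBuild", so a secondary with the same pool size never sees a commit for a build that is still queued.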
The following original description does not accurately describe the full problem:
We limit the maximum number of index build worker threads to 10, but there is no higher-level restriction on the number of active index builds.
- When a task is scheduled, it is first added to the queue of _pendingTasks.
- If an index build is scheduled and the maximum number of workers is already active, a new thread is not scheduled and the task is left in the queue.
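The queueing behavior described above can be modeled in a few lines. This is a hypothetical Python sketch, not the server's thread pool: `BoundedTaskPool`, `max_threads`, and `demo` are assumed names, and `_pending_tasks` mirrors `_pendingTasks`. Each task is queued first, and a worker is started only while fewer than `max_threads` are active; a worker exits only when the queue is empty.

```python
import collections
import threading
import time


class BoundedTaskPool:
    """Hypothetical model: tasks always go into _pending_tasks first, and a
    new worker starts only while fewer than max_threads are active."""

    def __init__(self, max_threads):
        self._max_threads = max_threads
        self._pending_tasks = collections.deque()
        self._active_threads = 0
        self._mutex = threading.Lock()
        self._workers = []

    def schedule(self, task):
        with self._mutex:
            self._pending_tasks.append(task)        # always queued first
            if self._active_threads >= self._max_threads:
                return                              # cap reached: task just waits
            self._active_threads += 1
            worker = threading.Thread(target=self._run)
            self._workers.append(worker)
        worker.start()

    def _run(self):
        # Drain the queue; a worker exits only once the queue is empty.
        while True:
            with self._mutex:
                if not self._pending_tasks:
                    self._active_threads -= 1
                    return
                task = self._pending_tasks.popleft()
            task()

    def pending_count(self):
        with self._mutex:
            return len(self._pending_tasks)

    def join(self):
        for worker in self._workers:
            worker.join()


def demo(max_threads=10, n_builds=11):
    """Schedule n_builds builds that block until released ("not committed")
    and report (tasks still queued, builds started after release)."""
    release = threading.Event()
    started = []
    started_lock = threading.Lock()

    def make_build(i):
        def build():
            with started_lock:
                started.append(i)
            release.wait()                          # stays active until "commit"
        return build

    pool = BoundedTaskPool(max_threads)
    for i in range(n_builds):
        pool.schedule(make_build(i))
    deadline = time.monotonic() + 5
    while time.monotonic() < deadline:              # wait for the workers to fill up
        with started_lock:
            if len(started) == min(max_threads, n_builds):
                break
        time.sleep(0.01)
    queued = pool.pending_count()
    release.set()                                   # let every build finish
    pool.join()
    return queued, len(started)
```

With 10 workers and 11 blocking builds, one build stays in the queue until some active build finishes; nothing above the pool prevents that situation from arising.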
This is problematic for secondaries in the following scenario:
- Start, but do not commit, 10 index builds on the primary, replicating 10 "startIndexBuild" oplog entries and starting 10 worker threads.
- Start and commit an 11th index build on the primary, replicating its "startIndexBuild" and "commitIndexBuild" oplog entries.
- Because 10 index builds are already active on the secondary, the 11th index build is queued in "_pendingTasks" but does not start.
- Applying the 11th build's "commitIndexBuild" oplog entry on the secondary waits for that build's thread to join, blocking until it does.
- Because oplog commands are applied serially, this in turn blocks the "commitIndexBuild" entries for the other 10 builds, so none of them can finish and free a thread, causing this hang.
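The hang can be reproduced with a small model of the secondary's serial oplog applier. This is a hypothetical Python simulation, not server code: `apply_oplog` and its `("start" | "commit", index)` entries are assumptions standing in for applying "startIndexBuild" and "commitIndexBuild". A build holds one of `pool_size` threads from its start until its commit, and the function returns the position of the first entry that would block forever.

```python
from collections import deque


def apply_oplog(entries, pool_size):
    """Apply ("start" | "commit", index) entries one at a time, as a
    secondary applies oplog commands serially. Return the position of the
    first entry that can never be applied, or None if all entries apply."""
    running = set()   # builds currently holding one of the pool's threads
    pending = deque() # builds queued because every thread is busy (FIFO)
    for pos, (op, index) in enumerate(entries):
        if op == "start":
            if len(running) < pool_size:
                running.add(index)
            else:
                pending.append(index)
        elif op == "commit":
            if index not in running:
                # The applier waits for this build's thread, but the build can
                # only get a thread when a running build commits, and those
                # commit entries sit later in this same serial stream.
                return pos
            running.remove(index)
            if pending:
                running.add(pending.popleft())  # freed thread goes to next build
        else:
            raise ValueError(f"unknown op {op!r}")
    return None
```

Ten open builds plus an 11th that starts and commits blocks at the 11th build's commit entry. The same model also blocks on the two-build interleaving earlier in this ticket with `pool_size=1`, and applies cleanly when the oplog lists B's start before A's.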
We should do one of the following:
- Limit the maximum number of active index builds allowed on the primary
  - This limit should equal the maximum number of worker threads. We would enforce it either by returning an error to the user or by blocking until resources are available. This prevents the problem on secondaries only as long as the two limits are identical.
- Do not limit the maximum number of index build worker threads
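The first option amounts to a counting semaphore on the primary sized to the worker-thread limit. The sketch below is hypothetical Python: `IndexBuildAdmission`, `try_begin`, and `end` are assumed names, not server APIs. `block=False` corresponds to returning an error to the user, and `block=True` to waiting until resources are available.

```python
import threading


class IndexBuildAdmission:
    """Hypothetical primary-side cap on active index builds. Sized to match
    the worker-thread limit, so every admitted build can also obtain a thread
    on a secondary whose limit is the same."""

    def __init__(self, max_active_builds):
        self._slots = threading.BoundedSemaphore(max_active_builds)

    def try_begin(self, block, timeout=None):
        """Reserve a slot for a new index build.

        block=False: return False at once so the caller can report an error.
        block=True: wait (up to timeout seconds, or indefinitely if None) for
        an active build to finish; returns False only if the timeout expires.
        """
        if block:
            return self._slots.acquire(timeout=timeout)
        return self._slots.acquire(blocking=False)

    def end(self):
        """Release the slot once the index build commits or aborts."""
        self._slots.release()
```

A caller would reserve a slot before writing "startIndexBuild" and release it after "commitIndexBuild" or an abort. `BoundedSemaphore` raises `ValueError` on an unmatched release, which catches accounting bugs early.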
- is depended on by
  - SERVER-43692 enable two phase index builds by default (Closed)
- is related to
  - SERVER-44609 Replicate startIndexBuild oplog entry in the same thread as the index build. (Closed)
- related to
  - SERVER-45262 make IndexBuildsCoordinator thread pool configurable via startup parameter (Closed)
  - SERVER-74953 Explore avoiding stepdowns during the early phases of index build setup (Closed)