Currently, running a two-phase index build with maxTimeMS can lead to the undesirable behaviors below.
Server crash due to fassert:
When the primary receives a createIndexes command with maxTimeMS set, we start building the index with that maxTimeMS/deadline set on the opCtx of the IndexBuildsCoordinator thread.
Now assume the index builder thread tries to acquire a lock after its opCtx deadline has expired. The lock acquisition throws ErrorCodes::MaxTimeMSExpired, and on that error the index builder tries to clean up the index build. Assume further that the primary steps down before the cleanup runs. The cleanup then takes the secondary code path and hits this fassert, because on secondaries it is illegal for the index builder thread to throw an error unless it was asked to abort by an abortIndexBuild oplog entry (the exception is shutdown).
Note: repro patch attached for this scenario.
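The race above can be illustrated with a small standalone model. This is only a sketch of the decision described in this ticket, not the server's code: ReplState, BuildError, CleanupResult, and cleanUpFailedBuild are hypothetical names, and using IndexBuildAborted to stand for "abort requested by an abortIndexBuild oplog entry" is an assumption made for illustration.

```cpp
// Hypothetical, simplified model; none of these names are the server's real API.
enum class ReplState { Primary, Secondary };
enum class BuildError { MaxTimeMSExpired, IndexBuildAborted, InterruptedAtShutdown };
enum class CleanupResult { AbortedOnPrimary, TornDownOnSecondary, Fassert };

// Decides what cleanup does. The key point is that the replication state is
// read at cleanup time, not at the time the error was thrown.
CleanupResult cleanUpFailedBuild(ReplState stateAtCleanup, BuildError error) {
    if (stateAtCleanup == ReplState::Primary) {
        // A primary is allowed to abort its own index build on any error.
        return CleanupResult::AbortedOnPrimary;
    }
    // On a secondary, only an externally requested abort or shutdown is legal.
    if (error == BuildError::IndexBuildAborted ||
        error == BuildError::InterruptedAtShutdown) {
        return CleanupResult::TornDownOnSecondary;
    }
    // Any other error on a secondary (for example MaxTimeMSExpired) is fatal.
    return CleanupResult::Fassert;
}
```

In this model, cleanUpFailedBuild(ReplState::Secondary, BuildError::MaxTimeMSExpired) yields CleanupResult::Fassert, which corresponds to a deadline that expired while the node was primary but was cleaned up after the stepdown.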
Server crash due to invariant:
There are also ways this can lead to an issue like SERVER-45921. Consider a case where the opCtx of the createIndexes parent thread is interrupted with ErrorCodes::MaxTimeMSExpired. This makes the parent thread signal the index builder thread to abort. Assume a stepdown happens at this point. We then enter the secondary cleanup code path for ErrorCodes::IndexBuildAborted, which tears down the index build. As a result, the newly elected primary (a previous secondary) takes responsibility for committing an index build that was already aborted on the previous primary. So, when the old primary receives the commitIndexBuild oplog entry, it tries to restart the index build, which leads to an invariant failure.
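The sequence on the old primary can be sketched with a toy registry of active index builds. This is an illustrative model only: ToyIndexBuildRegistry, CommitOutcome, and the build identifier are hypothetical, and the model stops at the point where a restart would be attempted rather than reproducing the invariant itself.

```cpp
#include <set>
#include <string>

// Hypothetical outcome of processing a commitIndexBuild oplog entry.
enum class CommitOutcome { CommittedActiveBuild, RestartAttempted };

// Toy stand-in for the set of index builds a node currently has registered.
class ToyIndexBuildRegistry {
public:
    void start(const std::string& buildId) { _active.insert(buildId); }

    // Models the secondary cleanup path for IndexBuildAborted.
    void tearDown(const std::string& buildId) { _active.erase(buildId); }

    bool isActive(const std::string& buildId) const {
        return _active.count(buildId) > 0;
    }

    // If the build is still registered it can be committed. Otherwise the
    // node has to try restarting it, which is where this ticket reports
    // the invariant failure.
    CommitOutcome onCommitIndexBuildEntry(const std::string& buildId) {
        if (isActive(buildId)) {
            _active.erase(buildId);
            return CommitOutcome::CommittedActiveBuild;
        }
        return CommitOutcome::RestartAttempted;
    }

private:
    std::set<std::string> _active;
};
```

Calling start("build-1"), then tearDown("build-1"), then onCommitIndexBuildEntry("build-1") returns CommitOutcome::RestartAttempted, mirroring the old primary that tore down its build after the stepdown and later received the commit from the new primary.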
is related to:
- SERVER-37643 add createIndexes command logic to the index build interface (Closed)
related to:
- SERVER-44953 Secondaries should restart index builds when a commitIndexBuild oplog entry is processed but no index build is active (Closed)
- SERVER-45378 IndexBuildsCoordinator::_setUpIndexBuild() can throw exceptions which should be caught and clean (unregister) the index build (Closed)
- SERVER-45921 Index builder invariants on this check (indexSpecs.size() > 1) while trying to start building index. (Closed)
- SERVER-73164 increase maxTimeMS for index build in index_max_time_ms.js (Closed)