-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Fully Compatible
-
ALL
-
Repl 2020-12-28
The header file claims that it is legal to call waitForIdle before shutdown is called. But it is not. On shutdown, the thread pool drains all pending tasks and shuts down all of its threads, after which _numIdleThreads becomes 0. So if we call shutdown and then waitForIdle, waitForIdle hangs because _numIdleThreads (0) stays less than the size of _thread until join is called.
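To illustrate the hang, here is a minimal sketch of the idle predicate using a toy pool. It is not the real ThreadPool: ToyPool, waitForIdleFor, numIdleThreads, DemoResult and runDemo are hypothetical names, there are no pending tasks, and the wait is bounded by a timeout so the demo reports the hang instead of blocking forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only; NOT the real thread pool. All names here are hypothetical.
class ToyPool {
public:
    explicit ToyPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            _threads.emplace_back([this] { workerLoop(); });
        }
    }

    ~ToyPool() {
        shutdown();
        join();
    }

    // Asks the workers to exit but does not wait for them (shutdown without join).
    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
        _cv.notify_all();
    }

    // Joins the workers, then empties the thread list so the idle predicate holds again.
    void join() {
        for (auto& t : _threads) {
            if (t.joinable()) t.join();
        }
        std::lock_guard<std::mutex> lk(_mutex);
        _threads.clear();
        _cv.notify_all();
    }

    // Bounded idle wait: returns true if the pool became idle before the timeout.
    bool waitForIdleFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] {
            return _numIdleThreads >= _threads.size();
        });
    }

    std::size_t numIdleThreads() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _numIdleThreads;
    }

private:
    void workerLoop() {
        std::unique_lock<std::mutex> lk(_mutex);
        ++_numIdleThreads;
        _cv.notify_all();
        _cv.wait(lk, [this] { return _shutdown; });
        --_numIdleThreads;  // an exiting worker no longer counts as idle
        _cv.notify_all();
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _shutdown = false;
    std::size_t _numIdleThreads = 0;
    std::vector<std::thread> _threads;
};

struct DemoResult {
    bool idleBeforeShutdown;
    bool idleAfterShutdown;
    bool idleAfterJoin;
};

DemoResult runDemo() {
    ToyPool pool(2);
    DemoResult r{};
    r.idleBeforeShutdown = pool.waitForIdleFor(std::chrono::seconds(2));
    pool.shutdown();
    // Wait until every worker has exited, i.e. the idle count has dropped to 0.
    while (pool.numIdleThreads() != 0) std::this_thread::yield();
    // 0 idle threads < 2 threads in the list: the predicate cannot become true.
    r.idleAfterShutdown = pool.waitForIdleFor(std::chrono::milliseconds(100));
    pool.join();
    r.idleAfterJoin = pool.waitForIdleFor(std::chrono::milliseconds(100));
    return r;
}
```

In this sketch, runDemo() reports the pool as idle before shutdown, not idle after shutdown without join (the bounded wait times out), and idle again once join has emptied the thread list.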
However, in tenant migration, we call _writerPool->shutdown() without join on interrupt and rely on the _tenantOplogApplier to interrupt itself based on interrupt errors. We only join in the final cleanup stage, after all components have been interrupted. That means that if the _tenantOplogApplier is blocked in waitForIdle, it will hang and fail to shut down even though we have already called shutdown on the _writerPool.
In fact, I don't think we need to shut down the _writerPool when interrupting a recipient instance. We can let the oplog applier finish applying the current batch. If it succeeds, it will stop when it hits _shouldStopApplying. If applying the current batch fails due to shutdown/stepdown, the oplog applier will also exit. So shutting down the _writerPool during interrupt is unnecessary, and not shutting it down also works around the bug described above.
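The proposed behaviour can be sketched as a loop that checks a stop flag between batches and exits on a batch error, without touching the writer pool. This is an illustration under assumptions, not the actual applier code: runApplier, ExitReason, and the representation of a batch as a callable returning success or failure are all hypothetical.

```cpp
#include <atomic>
#include <functional>
#include <vector>

// Hypothetical exit reasons for the sketch.
enum class ExitReason { kStopRequested, kBatchError, kNoMoreBatches };

// Hypothetical applier loop. Each batch is modelled as a callable that returns
// true on success and false on an error (e.g. caused by shutdown/stepdown).
ExitReason runApplier(const std::vector<std::function<bool()>>& batches,
                      const std::atomic<bool>& shouldStopApplying,
                      int& batchesApplied) {
    for (const auto& applyBatch : batches) {
        // The stop flag is only checked between batches, so a batch that has
        // already started is always allowed to finish.
        if (shouldStopApplying.load()) return ExitReason::kStopRequested;
        // An error while applying the current batch also ends the loop.
        if (!applyBatch()) return ExitReason::kBatchError;
        ++batchesApplied;
    }
    return ExitReason::kNoMoreBatches;
}
```

If an interrupt sets the flag while a batch is being applied, that batch still completes and the loop exits before starting the next one, so nothing is left blocked waiting on a pool that has been shut down but not joined.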
- is depended on by
-
SERVER-53312 Enable recipient testing for tenant_migration_jscore_passthrough
- Closed
- is related to
-
SERVER-53477 ThreadPool::waitForIdle should be interruptible on thread pool shutdown()
- Closed