SERVER-53410 (Core Server)

No need to shut down writerPool when interrupting recipient service instances

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.9.0
    • Affects Version/s: None
    • Component/s: None
    • Fully Compatible
    • ALL
    • Repl 2020-12-28

      The header file claims that it is legal to call waitForIdle after shutdown has been called, but it is not. On shutdown, the thread pool drains all pending tasks and shuts down all threads, after which _numIdleThreads becomes 0. So if we call shutdown and then waitForIdle, waitForIdle hangs, because _numIdleThreads (0) stays less than the size of _thread until join is called.
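
      The window between shutdown and join can be illustrated with a toy model. This is only a sketch of the bookkeeping described above, not the real thread pool: PoolModel, idlePredicate and reproducesHangWindow are hypothetical names, and the exact condition waitForIdle waits on is assumed from the description.

```cpp
#include <cassert>
#include <cstddef>

// Toy, single-threaded model of the counters described in this ticket.
// NOT the real thread pool: every name here is hypothetical except
// numIdleThreads, which stands in for _numIdleThreads.
struct PoolModel {
    std::size_t numThreads = 0;      // stands in for the size of the thread list
    std::size_t numIdleThreads = 0;  // stands in for _numIdleThreads
    std::size_t pendingTasks = 0;

    // Start n workers; with no work queued, all of them are idle.
    void startup(std::size_t n) {
        assert(n > 0);
        numThreads = n;
        numIdleThreads = n;
    }

    // shutdown(): pending tasks drain and workers exit, so the idle count
    // drops to 0, but the thread list is not emptied yet.
    void shutdown() {
        pendingTasks = 0;
        numIdleThreads = 0;
    }

    // join(): assumed to reap the exited workers and empty the thread list.
    void join() { numThreads = 0; }

    // Assumed condition that waitForIdle blocks on until it becomes true.
    bool idlePredicate() const {
        return pendingTasks == 0 && numIdleThreads >= numThreads;
    }
};

// Walks the three states and reports whether the model shows the hang window:
// idle before shutdown, never idle after shutdown, idle again after join.
bool reproducesHangWindow() {
    PoolModel pool;
    pool.startup(4);
    const bool idleBeforeShutdown = pool.idlePredicate();
    pool.shutdown();
    const bool idleAfterShutdown = pool.idlePredicate();
    pool.join();
    const bool idleAfterJoin = pool.idlePredicate();
    return idleBeforeShutdown && !idleAfterShutdown && idleAfterJoin;
}
```

      In this model, a caller that enters waitForIdle after shutdown cannot make progress until some other thread calls join.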

      However, in tenant migration, we call _writerPool->shutdown() without join on interrupt and rely on the _tenantOplogApplier being able to interrupt itself based on interrupt errors. We only join in the final cleanup stage, after all components have been interrupted. This means that if _tenantOplogApplier is in waitForIdle, it will hang and fail to shut down even though we have already called shutdown on the _writerPool.

      In fact, I don't think we need to shut down the _writerPool when interrupting a recipient instance. We can let the oplog applier finish applying the current batch. If it finishes the batch, it will stop on hitting _shouldStopApplying. If instead applying the batch fails due to shutdown/stepdown, the oplog applier will also exit. So shutting down _writerPool during interrupt is unnecessary, and not shutting it down also works around the bug mentioned above.
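
      The two exit paths argued above can be sketched with a small single-threaded model. This is an illustration only, under the assumption that the applier checks a stop flag between batches; ApplierModel, ExitReason and run are hypothetical names, and only shouldStopApplying mirrors a name from this ticket.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical outcomes of the modelled applier loop.
enum class ExitReason { StoppedAfterBatch, BatchError, Finished };

// Toy model of an applier that is interrupted WITHOUT shutting down the
// writer pool. Not the real oplog applier.
struct ApplierModel {
    bool shouldStopApplying = false;  // stands in for _shouldStopApplying
    std::size_t batchesApplied = 0;

    // batchSucceeds[i] says whether batch i applies cleanly.
    // interruptDuringBatch is the index of the batch during which the
    // interrupt arrives; pass an out-of-range index for "never".
    ExitReason run(const std::vector<bool>& batchSucceeds,
                   std::size_t interruptDuringBatch) {
        for (std::size_t i = 0; i < batchSucceeds.size(); ++i) {
            // Checked between batches: stop before starting new work.
            if (shouldStopApplying) {
                return ExitReason::StoppedAfterBatch;
            }
            // The interrupt only sets the flag; the pool keeps running,
            // so the in-flight batch is allowed to complete.
            if (i == interruptDuringBatch) {
                shouldStopApplying = true;
            }
            // A batch that fails (e.g. due to shutdown/stepdown) also
            // ends the loop.
            if (!batchSucceeds[i]) {
                return ExitReason::BatchError;
            }
            ++batchesApplied;
        }
        return shouldStopApplying ? ExitReason::StoppedAfterBatch
                                  : ExitReason::Finished;
    }
};
```

      Either way the loop terminates on its own, which is why the model never needs to touch the pool during interrupt.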

            Assignee:
            lingzhi.deng@mongodb.com Lingzhi Deng
            Reporter:
            lingzhi.deng@mongodb.com Lingzhi Deng
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: