  1. Core Server
  2. SERVER-60782

ThreadPool::waitForIdle should not be called concurrently with shutdown in TenantOplogApplier shutdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.2.0, 5.1.0-rc3
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • v5.1
    • Repl 2021-11-01
    • 21

      waitForIdle was ported from the old ThreadPool, when the current one was written, to serve the needs of a few highly constrained callers. The current ThreadPool was not designed to have waitForIdle called concurrently with shutdown or on a pool that is shutting down; to use waitForIdle safely, the ThreadPool must remain in an un-shut-down state from the time waitForIdle is called until the pool idles (and waitForIdle returns).

      We considered updating the ThreadPool to give waitForIdle a safe contract with regard to concurrent shutdown, but it would require non-trivial changes to the ThreadPool bookkeeping internals that are somewhat high-risk because of the widespread use of the ThreadPool. Additionally, waitForIdle has only three non-test users (the TenantOplogApplier, the old OplogApplierImpl, and the DeferredWriter), and we plan to deprecate the current waitForIdle API in favor of a barrier-based approach. So we've decided not to change the ThreadPool internals and instead to ensure that all callers follow the safety requirement above for waitForIdle.

      Currently, the TenantOplogApplier is the only piece of code that may call waitForIdle on a shutting-down or shut-down thread pool. This is unsafe and may lead to hangs. It would be better to follow the pattern in the old OplogApplierImpl, which joins any threads that may call waitForIdle before shutting down the thread pool, guaranteeing that waitForIdle cannot be called concurrently with shutdown or on a shut-down thread pool. This ticket tracks modifying TenantOplogApplier shutdown to use the safe pattern.
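
The ordering described above (join every thread that may call waitForIdle, then shut down the pool) can be sketched roughly as follows. This is a minimal illustration using only the C++ standard library, not the server code: ToyPool, ToyApplier, runSafeShutdownDemo, and the batch constants are hypothetical names invented for this sketch, and ToyPool is a toy stand-in that does not reproduce the real ThreadPool's internals or API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread pool; NOT mongo::ThreadPool.
class ToyPool {
public:
    explicit ToyPool(int numWorkers) {
        for (int i = 0; i < numWorkers; ++i)
            _workers.emplace_back([this] { _workerLoop(); });
    }

    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);
        _queue.push_back(std::move(task));
        _workAvailable.notify_one();
    }

    // Contract assumed from the ticket: the caller keeps the pool
    // un-shut-down from this call until it returns.
    void waitForIdle() {
        std::unique_lock<std::mutex> lk(_mutex);
        _idle.wait(lk, [this] { return _queue.empty() && _active == 0; });
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
        _workAvailable.notify_all();
    }

    void join() {
        for (auto& w : _workers) w.join();
    }

private:
    void _workerLoop() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (true) {
            _workAvailable.wait(lk, [this] { return _shutdown || !_queue.empty(); });
            if (_queue.empty()) return;  // shut down and fully drained
            auto task = std::move(_queue.front());
            _queue.pop_front();
            ++_active;
            lk.unlock();
            task();
            lk.lock();
            --_active;
            if (_queue.empty() && _active == 0) _idle.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _workAvailable;
    std::condition_variable _idle;
    std::deque<std::function<void()>> _queue;
    int _active = 0;
    bool _shutdown = false;
    std::vector<std::thread> _workers;
};

constexpr int kBatches = 3;      // illustrative values only
constexpr int kOpsPerBatch = 4;

// Hypothetical applier: one driver thread is the only waitForIdle caller.
class ToyApplier {
public:
    ToyApplier() : _pool(2) {
        _driver = std::thread([this] { _applyBatches(); });
    }

    // Safe ordering: stop and join the waitForIdle caller first, and only
    // then shut down the pool. Returns the number of applied operations.
    int shutdown() {
        _stop.store(true);
        _driver.join();    // after this, nothing can call waitForIdle
        _pool.shutdown();  // so shutdown cannot race with waitForIdle
        _pool.join();
        return _applied.load();
    }

private:
    void _applyBatches() {
        for (int batch = 0; batch < kBatches && !_stop.load(); ++batch) {
            for (int i = 0; i < kOpsPerBatch; ++i)
                _pool.schedule([this] { _applied.fetch_add(1); });
            _pool.waitForIdle();  // pool is guaranteed not to be shut down here
        }
    }

    ToyPool _pool;
    std::atomic<int> _applied{0};
    std::atomic<bool> _stop{false};
    std::thread _driver;
};

int runSafeShutdownDemo() {
    ToyApplier applier;
    return applier.shutdown();
}
```

Calling runSafeShutdownDemo() from a main function exercises the sequence once. Because the driver thread is joined before the pool is shut down, every batch that was started has fully drained, so the returned count is always a whole number of batches. How many batches run before the stop flag is observed depends on scheduling.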

      (In SERVER-60444 SA will update the comments in the code around waitForIdle to document this safety guarantee; apologies for not doing so earlier and thanks for your help with this!)

       

            Assignee:
            lingzhi.deng@mongodb.com Lingzhi Deng
            Reporter:
            george.wangensteen@mongodb.com George Wangensteen