Core Server / SERVER-46664

runCmdOnPrimaryAndAwaitResponse() should not run a DBDirectClient command with the RSTL lock held.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc0, 4.7.0
    • Affects Version/s: None
    • Component/s: Storage
    • None
    • Storage Execution
    • Fully Compatible
    • ALL
    • v4.4
    • Execution Team 2020-03-09
    • 50

      Currently, runCmdOnPrimaryAndAwaitResponse() takes the RSTL lock in IX mode and then runs a DBDirectClient command inside an AlternativeClientRegion. If the DBDirectClient command takes the RSTL lock on the AlternativeClientRegion's opCtx, this can lead to a deadlock between the stepdown thread and the thread that called runCmdOnPrimaryAndAwaitResponse().
      The deadlock arises because AlternativeClientRegion stashes the caller's original opCtx (including its locks) and replaces it with a new opCtx, so stepdown has no way to interrupt the original opCtx and make it release its locks. As a result, the stepdown thread is blocked behind the RSTL lock held in IX mode by runCmdOnPrimaryAndAwaitResponse(), because of the lock conflict. The DBDirectClient command is blocked behind stepdown, because stepdown has already enqueued its request for the RSTL lock in X mode. And runCmdOnPrimaryAndAwaitResponse() keeps waiting for the DBDirectClient command's response, so none of the three can make progress.
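
The three-way wait described above can be checked with a toy wait-for graph. This is only a sketch for illustration, not MongoDB code: the node labels and the find_cycle() helper are hypothetical, and the "fixed" graph simply assumes the caller releases the RSTL lock before running the DBDirectClient command, as the issue title suggests.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the nodes of a wait cycle, or None if no cycle exists.

    Each key is a (hypothetical) waiter; its value is what it waits on.
    """
    for start in waits_for:
        path: List[str] = []
        seen = set()
        node = start
        # Follow the chain of waits until it ends or revisits a node.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain came back to a node on the path: that is a cycle.
            return path[path.index(node):]
    return None


# Behaviour described in this ticket: the caller keeps the RSTL lock in IX mode
# on its stashed opCtx while waiting for the DBDirectClient command.
buggy = {
    "stepdown (wants RSTL X)": "caller (holds RSTL IX, stashed opCtx)",
    "caller (holds RSTL IX, stashed opCtx)": "DBDirectClient command",
    "DBDirectClient command": "stepdown (wants RSTL X)",
}

# Assumed fix: the caller drops the RSTL lock first, so stepdown waits on nobody.
fixed = {
    "caller (no RSTL held)": "DBDirectClient command",
    "DBDirectClient command": "stepdown (holds RSTL X)",
}

buggy_cycle = find_cycle(buggy)
fixed_cycle = find_cycle(fixed)
```

In this model the first graph contains a cycle through all three participants, while the second does not, because the stepdown thread no longer waits on the caller.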

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: