Core Server / SERVER-76854

Revisit _configsvrSetAllowMigrations command use of sessions

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Sharding EMEA
    • ALL

      The command should do one of the following:
      (a) Not have any session attached to it.
      (b) Not keep the session checked out while running a blocking network call.
      (c) Have a timeout, so that it is guaranteed to check the current session back in.
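
A minimal sketch of options (b) and (c), written as a toy Python model rather than server code. Everything here is hypothetical and for illustration only: `ToySessionCatalog`, `run_with_yield`, and `run_with_timeout` are invented names, and a per-session lock stands in for checking a session out and back in.

```python
import threading
from contextlib import contextmanager


class ToySessionCatalog:
    """Hypothetical stand-in for a session catalog: one lock per session id."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, session_id):
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())

    @contextmanager
    def checked_out(self, session_id):
        # Check the session out; always check it back in on exit.
        lock = self._lock_for(session_id)
        lock.acquire()
        try:
            yield
        finally:
            lock.release()

    def is_checked_out(self, session_id):
        return self._lock_for(session_id).locked()


def run_with_yield(catalog, session_id, blocking_call):
    """Option (b): release the session before making the blocking call."""
    with catalog.checked_out(session_id):
        pass  # local, non-blocking work would go here
    return blocking_call()  # the session is already checked back in


def run_with_timeout(catalog, session_id, blocking_call, timeout_s):
    """Option (c): keep the session, but bound how long the call may block."""
    with catalog.checked_out(session_id):
        done = threading.Event()
        result = {}

        def worker():
            result["value"] = blocking_call()
            done.set()

        threading.Thread(target=worker, daemon=True).start()
        if not done.wait(timeout_s):
            # Leaving the with-block checks the session back in.
            raise TimeoutError("blocking call exceeded its deadline")
        return result["value"]
```

In either variant, another operation that needs the same session id can eventually check it out: immediately in (b), and after at most `timeout_s` seconds in (c).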

      In a config shard setup, where the config server also acts as a shard, the following deadlock can occur:

      shardA (also config server)
      shardB

      1. A moveChunk from shardB to shardA is running.
      2. shardA: A DDL operation calls sharding_ddl_util::stopMigrations. For example, renameCollection attaches a session X to the _configsvrSetAllowMigrations command it sends to the config server.
      3. shardA (also config server): Session X is checked out while _configsvrSetAllowMigrations runs.
      4. shardA: During session migration, the destination encounters a session with id X and tries to check it out, but is blocked because _configsvrSetAllowMigrations holds it.
      5. shardA: _configsvrSetAllowMigrations sends _flushRoutingTableCacheUpdatesWithWriteConcern to all shards.
      6. shardB: _flushRoutingTableCacheUpdatesWithWriteConcern waits for the migration source to finish (via recoverRefresh -> wait for the migration abort future).
      7. shardB: As part of the abort, the migration source waits for _recvChunkReleaseCritSec to succeed. Because session migration is still ongoing on the destination, that command always returns an error. Session migration on shardA, in turn, is blocked waiting for _configsvrSetAllowMigrations to release session X, which closes the cycle.
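
The steps above can be read as a wait-for graph. The sketch below encodes them that way and looks for a cycle with a depth-first search. It is only an illustration: the node labels are shorthand for the steps in this ticket, and `find_cycle` is a hypothetical helper, not part of the server.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: slice out the cycle and close it.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in waits_for:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# Edges mean "left side waits for right side"; labels are shorthand only.
deadlock = {
    "shardA: _configsvrSetAllowMigrations (holds session X)":
        ["shardB: _flushRoutingTableCacheUpdatesWithWriteConcern"],
    "shardB: _flushRoutingTableCacheUpdatesWithWriteConcern":
        ["shardB: migration abort"],
    "shardB: migration abort":
        ["shardA: _recvChunkReleaseCritSec succeeds"],
    "shardA: _recvChunkReleaseCritSec succeeds":
        ["shardA: session migration finishes"],
    "shardA: session migration finishes":
        ["shardA: _configsvrSetAllowMigrations (holds session X)"],
}

# Option (a): with no session attached, session migration no longer waits
# on the command, so the last edge disappears.
without_session = dict(deadlock)
without_session["shardA: session migration finishes"] = []

cycle = find_cycle(deadlock)
no_cycle = find_cycle(without_session)
```

With all five edges present, `find_cycle` reports a cycle that passes through every node. Dropping the edge from session migration to the command, as option (a) would, leaves the graph acyclic.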

            Assignee:
            backlog-server-sharding-emea [DO NOT USE] Backlog - Sharding EMEA
            Reporter:
            randolph@mongodb.com Randolph Tan
            Votes:
            0
            Watchers:
            4
