Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.1.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding EMEA 2023-05-15
Linked BF Score: 110
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

SERVER-73539 added replay protection to the setAllowMigrations command. However, this means a session stays checked out while a refresh is happening on all shards. In a config shard setup, where one shard is also the config server, the following deadlock can occur:

shardA (also config server)
shardB

1. moveChunk from shardB to shardA.
2. shardA: some DDL operation calls sharding_ddl_util::stopMigrations. For example, in renameCollection, a session X is attached to the _configsvrSetAllowMigrations command it sends to the config server.
3. shardA (also config server): session X is checked out while running _configsvrSetAllowMigrations.
4. shardA: during session migration, the destination encounters a session with id X and tries to check it out, but is blocked because _configsvrSetAllowMigrations has it checked out.
5. shardA: _configsvrSetAllowMigrations sends _flushRoutingTableCacheUpdatesWithWriteConcern to all shards.
6. shardB: _flushRoutingTableCacheUpdatesWithWriteConcern waits for the migration source to finish (via recoverRefresh -> wait for migration abort future).
7. shardB: as part of the abort, it waits for _recvChunkReleaseCritSec to succeed. Since session migration is still ongoing on the destination, that command always returns an error. Meanwhile, shardA is stuck because session migration is blocked waiting for _configsvrSetAllowMigrations to release the session.
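
The steps above form a circular wait. As a rough illustration only, the following Python sketch models them as a wait-for graph and checks for a cycle. It is a toy model, not server code: the node labels are taken from the steps, but the `find_cycle` helper, the two dictionaries, and the simplification that each actor waits on exactly one other actor are assumptions made for this example. Step 7's repeated error from _recvChunkReleaseCritSec is modelled as a wait on session migration.

```python
def find_cycle(waits_for):
    """Return the nodes forming a cycle in a one-edge-per-node wait-for graph, or None."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node in seen:
                # We came back to a node already on the path: that suffix is the cycle.
                return path[path.index(node):]
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# Hypothetical encoding of steps 3-7: "A": "B" means A cannot progress until B does.
DEADLOCK_WAITS_FOR = {
    "session migration (shardA)": "_configsvrSetAllowMigrations (shardA)",
    "_configsvrSetAllowMigrations (shardA)": "_flushRoutingTableCacheUpdatesWithWriteConcern (shardB)",
    "_flushRoutingTableCacheUpdatesWithWriteConcern (shardB)": "migration abort (shardB)",
    "migration abort (shardB)": "_recvChunkReleaseCritSec (shardA)",
    "_recvChunkReleaseCritSec (shardA)": "session migration (shardA)",
}

# Same graph if the command did not hold session X during the remote refresh:
# session migration no longer waits on _configsvrSetAllowMigrations.
WITHOUT_SESSION_HELD = {
    actor: blocker
    for actor, blocker in DEADLOCK_WAITS_FOR.items()
    if actor != "session migration (shardA)"
}

if __name__ == "__main__":
    print(find_cycle(DEADLOCK_WAITS_FOR))
    print(find_cycle(WITHOUT_SESSION_HELD))
```

In this model, removing the single edge from session migration to _configsvrSetAllowMigrations is enough to break the cycle, which is the edge that holding session X across the remote call creates.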

We should do something similar to the transaction yielder, that is, yield the session while doing remote (or possibly blocking) work.
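
A minimal sketch of the yielding idea, in Python rather than the server's C++, and not the actual fix: the session is checked in before the remote call and checked out again afterwards. `ToySession`, `yielded`, and `set_allow_migrations` are hypothetical names invented for this illustration, and a plain lock stands in for session checkout.

```python
import threading
from contextlib import contextmanager


class ToySession:
    """Toy stand-in for a checked-out session: only one holder at a time."""

    def __init__(self, session_id):
        self.session_id = session_id
        self._lock = threading.Lock()

    def check_out(self):
        self._lock.acquire()

    def check_in(self):
        self._lock.release()

    def try_check_out(self):
        # Non-blocking attempt; returns True if the session was free.
        return self._lock.acquire(blocking=False)


@contextmanager
def yielded(session):
    """Release the session for the duration of the block, then take it back."""
    session.check_in()
    try:
        yield
    finally:
        session.check_out()


def set_allow_migrations(session, remote_refresh):
    """Hypothetical command body: hold the session, but not across remote work."""
    session.check_out()
    try:
        # ... local work that needs the session would go here ...
        with yielded(session):
            # Remote or possibly blocking work runs without holding the session,
            # so another party (e.g. session migration) can check it out meanwhile.
            remote_refresh()
    finally:
        session.check_in()
```

With this shape, a caller that needs session X while `remote_refresh` is running can obtain it instead of blocking. Whether state tied to the session must be revalidated after it is re-acquired is outside the scope of this sketch.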

is caused by: SERVER-73539 stopMigrations/resumeMigrations don't have replay protection (Closed)
is duplicated by: SERVER-76854 Revisit _configsvrSetAllowMigrations command use of sessions (Closed)
related to: SERVER-76720 Chunk Migration migrates the session history for the migrating session leading to a deadlock (Closed)

Assignee: Marcos José Grillo Ramirez
Reporter: Marcos José Grillo Ramirez
Participants: Githook User, Marcos José Grillo Ramirez
Votes: 0
Watchers: 4

Created: May 04 2023 12:56:48 PM UTC
Updated: Oct 29 2023 09:21:59 PM UTC
Resolved: May 11 2023 10:01:28 AM UTC
Confidence Status Last Update: 04/May/23 4:56 PM
