-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 7.1.0-rc0, 6.0.6, 5.0.17, 4.4.21, 7.0.0-rc1
-
Component/s: None
-
Sharding EMEA
-
Fully Compatible
-
v7.0, v6.3, v6.0, v5.0, v4.4
-
Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, QI 2023-05-15
-
2
The implementation of $out introduced a special internal rename command (InternalRenameIfOptionsAndIndexesMatchCmd). This command takes its own locks to avoid concurrent modifications, but the implementation has an error. On this line there is a call to assertIsPrimaryShardForDb, but nothing guarantees that this node will remain the database's primary shard for the entire execution of $out. The usual pattern to ensure the node remains the primary shard is:
- Wait for ShardingDDLCoordinator service recovery.
- Take database DDL lock to serialize with concurrent movePrimary operations that would change the db primary shard.
- Check if this shard is primary for the database.
- Acquire additional DDL locks if needed.
- Execute operation while holding the locks.
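The steps above can be sketched as a toy model in Python. This is not server code: `ToyCluster`, `run_ddl_as_db_primary`, `move_primary`, and the use of plain `threading` locks are all hypothetical stand-ins chosen for illustration. The point it tries to show is that the primary-shard check is only stable once the database DDL lock is held, because movePrimary has to take the same lock.

```python
import threading
from contextlib import ExitStack


class ToyCluster:
    """Toy stand-in for cluster state; every name here is hypothetical."""

    def __init__(self, db_primaries):
        self.db_primaries = dict(db_primaries)  # database name -> primary shard id
        self.recovered = threading.Event()      # stands in for coordinator service recovery
        self._locks = {}                        # resource name -> DDL lock
        self._guard = threading.Lock()          # protects the _locks dict itself

    def ddl_lock(self, resource):
        """Return the (non-reentrant) DDL lock for a database or namespace."""
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())


def run_ddl_as_db_primary(cluster, shard_id, db, extra_resources, operation):
    # 1. Wait for DDL coordinator service recovery.
    cluster.recovered.wait()
    # 2. Take the database DDL lock to serialize with movePrimary.
    with cluster.ddl_lock(db):
        # 3. The check is stable here: movePrimary cannot run until we release the lock.
        if cluster.db_primaries.get(db) != shard_id:
            raise RuntimeError(f"{shard_id} is not the primary shard for {db}")
        # 4. Acquire additional DDL locks in sorted order to avoid lock-order deadlocks.
        with ExitStack() as stack:
            for resource in sorted(set(extra_resources) - {db}):
                stack.enter_context(cluster.ddl_lock(resource))
            # 5. Execute the operation while all locks are held.
            return operation()


def move_primary(cluster, db, to_shard):
    # Takes the same database DDL lock, so it cannot interleave with steps 3-5.
    with cluster.ddl_lock(db):
        cluster.db_primaries[db] = to_shard
```

In this sketch, a call such as `run_ddl_as_db_primary(cluster, "shardA", "test", ["test.out"], rename_fn)` either runs `rename_fn` with `shardA` guaranteed to be the primary for `test` throughout, or raises before doing any work. Note that `cluster.recovered.set()` must be called first, otherwise step 1 blocks.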
The existing _shardsvrRenameCollection command already has the correct locking mechanism and ensures that this shard is the primary shard for the database. We should either use _shardsvrRenameCollection in $out, if that is possible, or fix $out to work with concurrent movePrimary commands. We will also need to expand our testing: the current tests don't allow $out to run in suites that kill the primary node, and we should add movePrimary commands to the current concurrency test.
This came up in SERVER-76626, during the investigation of a bug in which concurrent rename and shardCollection commands were failing when $out wrote to time-series collections.
-------------
[UPDATE - 8th of September 2023]: This is not a bug. movePrimary and the internal rename of $out are correctly serialized (here and here) through a check of the isMovePrimaryInProgress flag.
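A flag-based serialization of this kind can be illustrated with a small Python toy. It is a sketch under stated assumptions, not the server's implementation: `ToyDatabaseState`, its methods, and the choice to reject the rename with an exception are hypothetical. The ticket only says that a check of the isMovePrimaryInProgress flag exists, not how the rename reacts when the flag is set.

```python
import threading


class MovePrimaryInProgress(Exception):
    """Hypothetical error raised when a rename races with movePrimary."""


class ToyDatabaseState:
    """Toy per-database state; all names are assumptions for illustration."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._move_primary_in_progress = False

    def begin_move_primary(self):
        # Blocks while a rename holds the mutex, then marks the move as started.
        with self._mutex:
            self._move_primary_in_progress = True

    def end_move_primary(self):
        with self._mutex:
            self._move_primary_in_progress = False

    def internal_rename(self, rename):
        # The flag check and the rename happen under the same mutex, so
        # movePrimary cannot start between the check and the rename.
        with self._mutex:
            if self._move_primary_in_progress:
                raise MovePrimaryInProgress("movePrimary is in progress")
            return rename()
```

The design point is that the flag is read and the rename performed inside one critical section. Checking the flag and then renaming outside the mutex would reintroduce the race the ticket originally worried about.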
- depends on
-
SERVER-77545 Wrap DB DDL lock acquisition under a Collection DDL lock acquisition
- Closed
- duplicates
-
SERVER-77545 Wrap DB DDL lock acquisition under a Collection DDL lock acquisition
- Closed
- is related to
-
SERVER-76626 Investigate test failures for concurrent $out and shardCollection commands
- Closed
- tested by
-
SERVER-78852 Test movePrimary and $out running concurrently
- Closed