This issue is included in MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata. The information below describes only the behavior and impact related to SERVER-84723. Please start with the consolidated issue page for guidance on identifying if these issues impact you.
SUMMARY
Operations within a multi-document transaction may return incomplete data and may not apply update or delete operations to documents if they occur on ranges of data affected by a concurrent sharding metadata change.
ISSUE DESCRIPTION AND IMPACT
A multi-document transaction may see partial effects of a Data Definition Language (DDL) operation running concurrently on the involved collections. This can manifest as the transaction seeing only part of the involved collection, or observing a mix of the collection's state before and after the sharding metadata was modified. As a result, reads within the transaction on sharded collections may miss data or return a mix of data from different metadata versions. In turn, writes within the transaction on sharded collections may fail to update or delete documents that should have been targeted.
This issue affects MongoDB versions 7.0.0 through 7.0.5.
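As a rough illustration, the affected range can be checked against a version string such as the one returned by `db.version()` in mongosh. The helper name `isAffectedVersion` is hypothetical, and the sketch assumes a plain `major.minor.patch` string; anything else is rejected rather than guessed at.

```javascript
// Hypothetical helper: returns true when a "major.minor.patch" version string
// falls in the affected range stated above (7.0.0 through 7.0.5).
function isAffectedVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    // Assumption: suffixed or malformed strings are treated as unknown.
    throw new Error(`Unrecognized version string: ${version}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  return major === 7 && minor === 0 && patch <= 5;
}

// In mongosh this could be called as isAffectedVersion(db.version()).
console.log(isAffectedVersion("7.0.5")); // true
console.log(isAffectedVersion("7.0.6")); // false
```

A version inside the range is only one of the conditions; the remaining conditions below must also hold for the issue to manifest.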
The minimum conditions for the issue to manifest (all must be met) are:
- Sharded cluster with more than one shard
- An application which uses or used:
  - A multi-statement transaction that:
    - Runs at local, majority, or snapshot read concern
    - Involves more than one collection, at least one of which is sharded
    - Involves more than one shard
  - Queryable Encryption
    - Where one or more sharded collections have encrypted fields
- Concurrent DDL operations:
  - renameCollection()
  - drop()
  - reshardCollection()
The table below describes which types of operations may be impacted and how:
What is affected | Effect | Downstream Effect |
---|---|---|
Reads or Writes outside of a transaction | None | None |
Within a transaction - Reads or Writes to a collection not undergoing drop or rename | None | None |
Within a transaction - Reads or Writes to an unsharded collection. | None | None |
Within a transaction using snapshot read concern - Writes to a sharded collection undergoing a drop or rename | None | None |
Within a transaction using local or majority read concern - Writes to sharded collections undergoing drop or rename | Updates or deletes may miss documents which should be targeted. Inserts will raise a WriteConflict, or will be applied correctly, depending on the exact interleaving and targeted shard. Writes on newly inserted documents will be correctly applied. | Application level inconsistencies between documents. No replica set inconsistencies or index inconsistencies. |
Within a transaction using any read concern - Reads from sharded collections undergoing drop or rename | Possible incomplete results. | Application-introduced inconsistencies if reads would prompt additional action. |
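
The rows of the table can be read as a small decision procedure. The sketch below is only a restatement of the table for triage purposes; the function name `describeImpact` and the shape of its argument are hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical lookup that mirrors the rows of the table above.
// operation is "read" or "write"; readConcern is "local", "majority" or "snapshot".
function describeImpact({ inTransaction, readConcern, sharded, underDropOrRename, operation }) {
  // Rows 1-3: outside a transaction, unsharded, or no concurrent drop/rename.
  if (!inTransaction || !sharded || !underDropOrRename) {
    return "None";
  }
  // Last row: reads are affected at any read concern.
  if (operation === "read") {
    return "Possible incomplete results";
  }
  // Writes: only local and majority read concern are listed as affected.
  if (readConcern === "local" || readConcern === "majority") {
    return "Updates or deletes may miss documents";
  }
  return "None";
}

console.log(describeImpact({
  inTransaction: true,
  readConcern: "snapshot",
  sharded: true,
  underDropOrRename: true,
  operation: "write",
})); // None
```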
WORKAROUND
If your workload utilizes multi-document transactions on a Sharded cluster meeting the criteria above, we recommend that you:
- Upgrade to MongoDB 7.0.6 or later
- See the Diagnosis & Remediation section below
DIAGNOSIS & REMEDIATION
See MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata for guidance on assessing if you are impacted and the recommended remediation steps.
Original description
Consider the following interleaving (repro1.js):
1. Initial state:
- collA: sharded collection with chunks both on shard0 and shard1.
- collB: unsharded collection on shard0.
- collC: does not exist.
2. Start txn with local or majority read concern, hit shard0 to read collB (shard0's txn snapshot now contains collA and collB).
3. Rename collA -> collC.
4. Read collC. On shard0, collC does not exist in the txn snapshot. On shard1, which the txn had not contacted before the rename, it does. Therefore the txn will see only half the collection.
Moreover, if collC existed initially, the transaction would observe a mix of the original collection and the post-rename collection.
The example above involves rename, but a similar situation might be possible with reshardCollection.
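
The rename interleaving can be mimicked with a toy model in plain JavaScript. This is not MongoDB code and ignores routing and shard versions entirely; it only assumes that each shard's snapshot is taken the first time the transaction contacts that shard. The names `ToyTransaction` and `renameEverywhere` are made up for the illustration.

```javascript
// Toy model only: each shard has a catalog (collection name -> documents).
const cluster = {
  shard0: { collA: [1, 2], collB: ["b"] },
  shard1: { collA: [3, 4] },
};

// Applies a rename on every shard that holds the source collection.
function renameEverywhere(from, to) {
  for (const catalog of Object.values(cluster)) {
    if (from in catalog) {
      catalog[to] = catalog[from];
      delete catalog[from];
    }
  }
}

class ToyTransaction {
  constructor() {
    this.snapshots = {}; // shard name -> copy of that shard's catalog
  }
  read(shardName, collName) {
    if (!(shardName in this.snapshots)) {
      // Assumption: the snapshot is established lazily, on first contact.
      this.snapshots[shardName] = { ...cluster[shardName] };
    }
    return this.snapshots[shardName][collName] ?? [];
  }
}

const txn = new ToyTransaction();
txn.read("shard0", "collB");        // step 2: snapshot on shard0 only
renameEverywhere("collA", "collC"); // step 3: concurrent rename
const seen = [
  ...txn.read("shard0", "collC"),   // step 4: collC is absent from the old snapshot
  ...txn.read("shard1", "collC"),   //         shard1 snapshots after the rename
];
console.log(seen); // [ 3, 4 ]
```

Only the documents held by shard1 are returned, which corresponds to the transaction seeing half of the collection.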
Another anomaly is (repro2.js):
1. Initial state
- shard0 (dbPrimary): collA(sharded) and collB(unsharded)
- shard1: collA(sharded)
2. Start txn (local, majority or snapshot read concern), hit shard0 for collB
3. Drop collA
4. Read collA. The read targets only shard0, whose txn snapshot still contains the sharded collection, so it returns just the half of the collection that lived on shard0.
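
A similar toy model can illustrate the drop case. Again, this is not MongoDB internals: the `routing` table and the rule that untracked namespaces are sent to the database primary only are simplifying assumptions made for the sketch.

```javascript
// Toy model only: per-shard catalogs plus a routing table for sharded collections.
const shards = {
  shard0: { collA: [1, 2], collB: ["b"] }, // shard0 is the database primary
  shard1: { collA: [3, 4] },
};
const routing = { collA: ["shard0", "shard1"] };
const DB_PRIMARY = "shard0";

// Assumption: namespaces without a routing entry go to the database primary only.
function targetShards(collName) {
  return routing[collName] ?? [DB_PRIMARY];
}

function dropCollection(collName) {
  delete routing[collName];
  for (const catalog of Object.values(shards)) delete catalog[collName];
}

const snapshots = {}; // per-shard snapshots held by the toy transaction
function txnRead(collName) {
  return targetShards(collName).flatMap((shardName) => {
    if (!(shardName in snapshots)) snapshots[shardName] = { ...shards[shardName] };
    return snapshots[shardName][collName] ?? [];
  });
}

const step2 = txnRead("collB"); // snapshot established on shard0
dropCollection("collA");        // step 3: concurrent drop
const step4 = txnRead("collA"); // step 4: routed to shard0 only
console.log(step4); // [ 1, 2 ]
```

The read is served from shard0's pre-drop snapshot but never reaches shard1, so only the shard0 portion of collA is returned.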
- causes
  - SERVER-95330 Validation checks to local catalog for unsharded collections are too strict (Closed)
- is related to
  - SERVER-87061 Sharded multi-document transactions can observe partial effects of concurrent reshard operation (Closed)
  - SERVER-84760 Violation of transaction snapshot isolation when collection concurrently dropped (Closed)
  - SERVER-86583 Non-transactional snapshot read on unsharded collection may execute with mismatched sharding metadata (Closed)
- related to
  - SERVER-77506 Sharded multi-document transactions can mismatch data and ShardVersion (Closed)
  - SERVER-88746 [v7.0] Writes in transactions in replica sets may not conflict with collection drop and rename, violating snapshot isolation (Closed)