- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Cluster Scalability
- Fully Compatible
- ALL
- v8.0
- Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19, Cluster Scalability 2024-3-4, Cluster Scalability 2024-3-18, Cluster Scalability 2024-4-1, Cluster Scalability 2024-4-15, Cluster Scalability 2024-4-29
An upsert that targets a different shard than the one that will own the inserted document triggers the WouldChangeOwningShard (WCOS) protocol, which spawns a transaction. In this case the transaction performs a noop update on the initial shard and an insert on the owning shard. Because a noop update is considered a read, this triggers the single write shard optimization, so the read shard will not have a durable transaction record and may "forget" the transaction. If the client retries the update after an intervening writer has inserted a matching document on the read shard, the retry will update that matching document instead of triggering the WCOS protocol, breaking the semantics of a retryable write.
Updates that change a shard key value have never been fully retryable by design. However, before the single write shard optimization was enabled in SERVER-48340, they required two-phase commit, whose implementation happens to write a transaction record on the "read" shard. That record guarantees that a retry throws an error even if there is an intervening write. When full retryability is added for such updates, this problem should go away. In the meantime, to maximize safety, we should disable the single write shard optimization for writes that trigger the WCOS protocol.
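The retry hazard described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not server code: `Shard`, `upsert`, `scenario`, and the `single_write_shard_opt` flag are hypothetical names invented for illustration. Each shard is modeled as a list of documents plus a set of durable transaction records, and a retry is rejected only if the targeted shard still holds a record for the statement.

```python
class Shard:
    """Toy shard: a list of documents plus durable transaction records."""

    def __init__(self, name):
        self.name = name
        self.docs = []
        self.txn_records = set()


def upsert(target, owner, stmt_id, name, fields, single_write_shard_opt):
    """Hypothetical model of an upsert routed to `target` whose new document
    would be owned by `owner` (the WCOS case)."""
    # A durable record on the targeted shard lets it reject the retry.
    if stmt_id in target.txn_records:
        raise RuntimeError("statement already executed in a transaction")

    matches = [d for d in target.docs if d["name"] == name]
    if matches:
        # A matching document exists on the targeted shard: plain update.
        matches[0].update(fields)
        return "updated-on-target"

    # No match: WCOS path. Noop update on `target` (treated as a read),
    # insert on `owner`.
    owner.docs.append({"name": name, **fields})
    owner.txn_records.add(stmt_id)
    if not single_write_shard_opt:
        # Modeled two-phase commit: the read shard also gets a record.
        target.txn_records.add(stmt_id)
    return "inserted-on-owner"


def scenario(single_write_shard_opt):
    """Run the upsert, let another writer insert a match, then retry."""
    a, b = Shard("A"), Shard("B")
    first = upsert(a, b, 1, "alice", {"region": "eu"}, single_write_shard_opt)
    # Intervening writer inserts a matching document on the read shard.
    a.docs.append({"name": "alice", "region": "us"})
    try:
        second = upsert(a, b, 1, "alice", {"region": "eu"}, single_write_shard_opt)
    except RuntimeError:
        second = "rejected"
    return first, second
```

In this model, `scenario(True)` lets the retry silently update the intervening document, while `scenario(False)` rejects the retry, which corresponds to the behavior this ticket proposes to restore for writes that trigger the WCOS protocol.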
- related to
  - SERVER-89032 TODO: Investigate WCOS error handling path for time series updates (Backlog)
  - SERVER-48340 Re-enable single-write-shard transaction commit optimization (Closed)