Type: Task
Resolution: Won't Do
Priority: Major - P3
Affects Version/s: None
Component/s: None
Sharding NYC
For prepared internal transactions for retryable findAndModify, the pre/post image is written to the image collection (config.image_collection) at prepare time. On the primary, the write is done in a side storage engine transaction, whereas on secondaries the write is done in the prepared transaction's own storage engine transaction. As a result, the primary and secondaries behave inconsistently:
- On nodes that are secondaries when the transaction enters prepare, the config.image_collection IX lock is held along with the other locks acquired for the transaction until the transaction commits or aborts. So if there is a failover, step-up can hang (to be solved in SERVER-63071).
- If the transaction aborts after prepare, the image collection on the primary is expected to be inconsistent with the image collection on secondaries. The reason is that when the transaction aborts, the write to the image collection only gets rolled back on secondaries.
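The abort case can be illustrated with a small toy model. This is only a sketch, not MongoDB server code: the `Node` class, its `writes_image_in_side_txn` flag, and the session id are hypothetical names introduced for illustration, and the model ignores locking, timestamps, and the oplog entirely. It only shows why a write committed in a side transaction survives an abort, while a write made inside the prepared transaction does not.

```python
class Node:
    """Toy replica-set member that tracks only its image collection (hypothetical model)."""

    def __init__(self, name, writes_image_in_side_txn):
        self.name = name
        # Assumption for this sketch: True models the primary, False models a secondary.
        self.writes_image_in_side_txn = writes_image_in_side_txn
        self.image_collection = {}   # committed, visible state
        self._prepared_writes = {}   # writes pending inside the prepared transaction

    def prepare(self, session_id, image):
        if self.writes_image_in_side_txn:
            # Side transaction: commits immediately, independent of the prepared transaction.
            self.image_collection[session_id] = image
        else:
            # Same storage transaction as the prepared transaction: pending until commit/abort.
            self._prepared_writes[session_id] = image

    def commit(self):
        self.image_collection.update(self._prepared_writes)
        self._prepared_writes.clear()

    def abort(self):
        # Only writes made inside the prepared transaction are rolled back.
        self._prepared_writes.clear()


primary = Node("primary", writes_image_in_side_txn=True)
secondary = Node("secondary", writes_image_in_side_txn=False)

for node in (primary, secondary):
    node.prepare("session-1", {"x": 1})
    node.abort()

print(primary.image_collection)    # {'session-1': {'x': 1}}
print(secondary.image_collection)  # {}
```

In this model, the image entry remains on the primary after the abort but is absent on the secondary, which matches the inconsistency described in the second bullet above.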
To solve this, there are two options:
- Make secondaries also write to the image collection in a side storage engine transaction. One challenge here is to determine what timestamp the storage engine transaction should use.
- Make the primary write to the image collection in the transaction’s storage engine transaction. This would require flipping the order in TransactionParticipant so that the applyOps oplog entries are written before the transaction’s storage engine transaction is put into prepare. It is unclear whether this would be safe.
is related to
- SERVER-63071 [Retryability] Prepared internal transactions for retryable findAndModify can cause stepup to hang (Closed)
- SERVER-62785 Write change stream pre-images in the main storage engine transaction before the transaction reaches Prepared state (Closed)
related to
- SERVER-63633 Remove TODO listed in SERVER-63258 (Closed)