Core Server / SERVER-88847

Vectored inserts can violate WT stable-commit timestamp rule during FCV upgrade

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication
    • ALL
    • 147

      In the vectored insert code path,
      1) We first attempt to insert as a batch.
      2) If that fails, for example with a WriteConflictException (WCE), we fall back to inserting the documents one at a time.
      3) If an individual insert also fails with a WCE, we retry that insert in a writeConflictRetry loop.
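A simplified Python model of this three-level fallback may help show where the retry boundaries sit. It is a sketch, not the server's C++ code: `WriteConflictException`, `write_conflict_retry`, `vectored_insert`, `insert_batch`, and `insert_one` are hypothetical stand-ins, and the fixed attempt cap in the retry loop is an assumption made only to keep the sketch bounded.

```python
class WriteConflictException(Exception):
    """Stand-in for the server's WriteConflictException (hypothetical model)."""


def write_conflict_retry(op, max_attempts=5):
    """Run `op`, retrying on WriteConflictException (step 3).

    The attempt cap is a simplification for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except WriteConflictException:
            if attempt == max_attempts:
                raise


def vectored_insert(docs, insert_batch, insert_one):
    """Step 1: try the whole batch. Step 2: on a WCE, fall back to
    one-at-a-time inserts, each wrapped in the retry loop (step 3)."""
    try:
        insert_batch(docs)
        return "batch"
    except WriteConflictException:
        pass
    for doc in docs:
        write_conflict_retry(lambda d=doc: insert_one(d))
    return "one-at-a-time"
```

Each hand-off between attempts (batch to single insert, single insert to retry) is a point where other operations, including an FCV upgrade, can run before the next attempt starts.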

      If an FCV upgrade happens between these retry attempts (between steps 1 and 2, or between steps 2 and 3), it can cause a fatal error: the retried write is committed with a timestamp older than the stable timestamp.

      [j1:s0:prim] | 2024-03-27T12:21:55.232+01:00 E  WT       22435   [S] [conn440] "WiredTiger error message","attr":{"error":22,"message":{"ts_sec":1711538515,"ts_usec":231061,"thread":"5756:140729299129264","session_name":"WT_SESSION.timestamp_transaction_uint","category":"WT_VERB_DEFAULT","category_id":12,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"int __cdecl __wt_txn_validate_commit_timestamp(struct __wt_session_impl *,unsigned __int64 *):566:commit timestamp (1711538514, 10) must be after the stable timestamp (1711538514, 70)","error_str":"Invalid argument","error_code":22}}
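The log prints timestamps as (seconds, increment) pairs. Assuming they order lexicographically (seconds first, increment as tie-breaker) and that "must be after" means a strict comparison, the rejected pair can be reproduced with the small Python model below. `Timestamp` and `validate_commit_timestamp` are illustrative names, not WiredTiger's API.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """(seconds, increment) pair. NamedTuple comparison is lexicographic,
    so seconds are compared first and the increment breaks ties."""
    secs: int
    inc: int


def validate_commit_timestamp(commit_ts: Timestamp, stable_ts: Timestamp) -> None:
    """Hypothetical model of the rule in the log: the commit timestamp
    must be strictly after the stable timestamp."""
    if commit_ts <= stable_ts:
        raise ValueError(
            f"commit timestamp ({commit_ts.secs}, {commit_ts.inc}) must be after "
            f"the stable timestamp ({stable_ts.secs}, {stable_ts.inc})"
        )


# Values taken from the log line above.
commit = Timestamp(1711538514, 10)
stable = Timestamp(1711538514, 70)
```

Because the seconds are equal, the increment decides: 10 is less than 70, so `validate_commit_timestamp(commit, stable)` raises.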
      

      The replicateVectoredInsertsTransactionally feature flag is enabled in 8.0 (SERVER-77881). Consider the following scenario:

      1) The node runs the 8.0 binary with FCV 7.0, so the replicateVectoredInsertsTransactionally feature flag is disabled.
      2) A user issues a bulk insert of 3 documents: [{_id:1}, {_id:2}, {_id:3}].
      3) The server first tries to write them as a batch. Since replicateVectoredInsertsTransactionally is disabled, it allocates oplog slots and updates each statement in the InsertStatement vector with its oplog slot's timestamp. Here, 3 oplog slots are allocated and the InsertStatements look like [{doc:{_id:1}, oplogSlot: TS(10)}, {doc:{_id:2}, oplogSlot: TS(20)}, {doc:{_id:3}, oplogSlot: TS(30)}].
      4) The batched insert fails with a WriteConflictException (WCE). The server then tries to insert the documents one at a time, with the statements still carrying the timestamps of the now-closed oplog slots. (Note: when step 3 fails, the associated WUOW is aborted, which closes the oplog slots TS(10), TS(20) and TS(30).)
      5) Meanwhile, the FCV is upgraded to 8.0 at TS(40), which enables the replicateVectoredInsertsTransactionally feature flag.
      6) The stable timestamp then advances to TS(40).
      7) With replicateVectoredInsertsTransactionally now enabled, oplog slots are not reallocated, so the server tries to insert the InsertStatement {doc:{_id:1}, oplogSlot: TS(10)}. This violates the WT stable-commit timestamp rule, because the commit timestamp TS(10) is older than the stable timestamp TS(40).
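The seven steps can be replayed with a toy Python model to see why the stale timestamp survives only when the upgrade lands mid-retry. This is a sketch under stated assumptions, not the server's logic: `Node`, `InsertStatement`, `one_at_a_time`, and `run_scenario` are hypothetical, timestamps are plain integers spaced by 10 to match the report, and "flag disabled means a fresh slot per statement, flag enabled means no reallocation" is taken from step 7 and simplified.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InsertStatement:
    doc: dict
    oplog_slot: Optional[int] = None  # timestamp stamped from an oplog slot


@dataclass
class Node:
    flag_enabled: bool  # models replicateVectoredInsertsTransactionally
    stable_ts: int = 0
    next_ts: int = 10
    committed: List[int] = field(default_factory=list)

    def allocate_slot(self) -> int:
        ts = self.next_ts
        self.next_ts += 10  # TS(10), TS(20), TS(30), ... as in the report
        return ts

    def commit(self, stmt: InsertStatement) -> None:
        # Stable-commit rule: the commit ts must be after the stable ts.
        if stmt.oplog_slot <= self.stable_ts:
            raise RuntimeError(
                f"commit ts TS({stmt.oplog_slot}) <= stable ts TS({self.stable_ts})")
        self.committed.append(stmt.doc["_id"])


def one_at_a_time(node: Node, stmts: List[InsertStatement]) -> None:
    for stmt in stmts:
        # Flag disabled: reallocate a fresh slot. Flag enabled: keep whatever
        # timestamp the statement already carries (the stale one from step 3).
        if not node.flag_enabled or stmt.oplog_slot is None:
            stmt.oplog_slot = node.allocate_slot()
        node.commit(stmt)


def run_scenario(upgrade_mid_retry: bool) -> List[int]:
    node = Node(flag_enabled=False)                              # step 1
    stmts = [InsertStatement({"_id": i}) for i in (1, 2, 3)]     # step 2
    for stmt in stmts:                                           # step 3
        stmt.oplog_slot = node.allocate_slot()
    # Step 4: the batch hits a WCE; the WUOW aborts and the slots close,
    # but the statements keep TS(10), TS(20), TS(30).
    if upgrade_mid_retry:
        upgrade_ts = node.allocate_slot()                        # step 5: TS(40)
        node.flag_enabled = True
        node.stable_ts = upgrade_ts                              # step 6
    one_at_a_time(node, stmts)                                   # step 7
    return node.committed
```

Without the upgrade, `run_scenario(False)` reallocates TS(40), TS(50), TS(60) and commits all three documents. With it, `run_scenario(True)` reuses TS(10) after the stable timestamp has reached TS(40), and the modeled commit check fails.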

            Assignee:
            Unassigned
            Reporter:
            Suganthi Mani (suganthi.mani@mongodb.com)
            Votes:
            0
            Watchers:
            4
