  1. Core Server
  2. SERVER-85275

Resharding oplog application should ignore DuplicateKey error

    • Cluster Scalability
    • ALL
    • Cluster Scalability 2024-07-08, Cluster Scalability 2024-07-22, Cluster Scalability 2024-08-19

      Resharding, like replication, applies oplog entries in batches using multiple parallel threads. Oplog entries that touch the same document are batched together and applied in the same thread. So oplog application in resharding (and replication) preserves the (timestamp, _id) order; however, it doesn't preserve the overall write order. Consider a collection with a unique index {a: 1}. We insert the document {_id: 1, a: "foo"}, then delete {_id: 1, a: "foo"}, and then insert {_id: 2, a: "foo"}. Resharding would apply the oplog entries in two threads:

      • Thread 1: insert {_id: 1, a: "foo"}, delete {_id: 1, a: "foo"}
      • Thread 2: insert {_id: 2, a: "foo"}

      So if Thread 2 runs completely before Thread 1, or if Thread 2 interleaves with Thread 1 so that its insert lands between Thread 1's insert and delete, oplog application would end up with a DuplicateKey error. It should just ignore this DuplicateKey error, just as replication oplog application does today.
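      The reordering hazard described above can be reproduced with a minimal sketch. This is a toy in-memory model, not MongoDB code: ToyCollection, DuplicateKey and try_apply are hypothetical names introduced only for illustration, and the threads are modeled as fixed application orders rather than real concurrency. It shows only when the error arises, not how resharding should handle it.

```python
class DuplicateKey(Exception):
    """Raised when an insert would violate the unique index on field 'a'."""


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection with a unique index {a: 1}."""

    def __init__(self):
        self.docs = {}  # maps _id -> document

    def insert(self, doc):
        # Enforce uniqueness of 'a' across all stored documents.
        for existing in self.docs.values():
            if existing["a"] == doc["a"]:
                raise DuplicateKey(f"a={doc['a']!r} is held by _id={existing['_id']}")
        self.docs[doc["_id"]] = dict(doc)

    def delete(self, doc):
        # Deleting a missing document is a no-op in this model.
        self.docs.pop(doc["_id"], None)


def try_apply(ops):
    """Apply ops in the given order to a fresh collection; report the outcome."""
    coll = ToyCollection()
    try:
        for kind, doc in ops:
            if kind == "insert":
                coll.insert(doc)
            else:
                coll.delete(doc)
    except DuplicateKey:
        return "DuplicateKey", coll.docs
    return "ok", coll.docs


# The original write order from the description.
oplog = [
    ("insert", {"_id": 1, "a": "foo"}),
    ("delete", {"_id": 1, "a": "foo"}),
    ("insert", {"_id": 2, "a": "foo"}),
]

# Entries touching the same document go to the same thread, in order.
thread1 = [op for op in oplog if op[1]["_id"] == 1]
thread2 = [op for op in oplog if op[1]["_id"] == 2]

in_write_order = try_apply(oplog)
thread2_first = try_apply(thread2 + thread1)
interleaved = try_apply([thread1[0], thread2[0], thread1[1]])

print("write order:   ", in_write_order[0])
print("thread 2 first:", thread2_first[0])
print("interleaved:   ", interleaved[0])
```

      In this model, applying the entries in the original write order succeeds, while both reordered schedules fail on the unique index even though every per-document order is preserved.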

            Assignee:
            Unassigned
            Reporter:
            Cheahuychou Mao (cheahuychou.mao@mongodb.com)
            Votes:
            0
            Watchers:
            16
