SERVER-34172 (Core Server)

Turn primary index build ghost writes into noop oplog writes.

    • Fully Compatible
    • ALL
    • Repl 2018-04-09
    • 50

      There are two metadata writes associated with an index build: the start and the completion (or failure). Primaries naturally timestamp a successful completion because that write happens in the same transaction that writes an oplog entry. On secondaries, the beginning of an index build is naturally timestamped because it is associated with processing an oplog entry.

      For primaries, beginning and failing an index build also need to be timestamped. This is currently accomplished by reading the logical clock and assigning that value as the timestamp. This is racy: the stable timestamp may advance past the value that was read from the logical clock after the read, but before the timestamp is set on the index metadata write.
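
The interleaving described above can be replayed deterministically in a toy model. This is only a sketch: `ToyNode` and its methods are hypothetical names invented for illustration, not server classes, and the rule "a write is safe only if its timestamp is ahead of the stable timestamp" is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a primary; not the server's real classes."""
    logical_clock: int = 10
    stable_timestamp: int = 5
    writes: list = field(default_factory=list)

    def read_logical_clock(self) -> int:
        return self.logical_clock

    def advance(self, new_time: int) -> None:
        # Assumption: other replicated work moves the clock forward and the
        # stable timestamp catches up to it.
        self.logical_clock = max(self.logical_clock, new_time)
        self.stable_timestamp = max(self.stable_timestamp, new_time)

    def timestamp_metadata_write(self, name: str, ts: int) -> bool:
        # Toy rule: the write is safe only if it is ahead of the stable timestamp.
        ok = ts > self.stable_timestamp
        self.writes.append((name, ts, ok))
        return ok


node = ToyNode()
ts = node.read_logical_clock()      # step 1: read the logical clock
node.advance(12)                    # step 2: the stable timestamp races ahead
landed_safely = node.timestamp_metadata_write("index build start", ts)  # step 3
print(landed_safely)  # False
```

The write is stamped with the value read in step 1, which is already behind the stable timestamp by the time step 3 runs.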

      Instead, if the write goes through the oplog as a no-op entry, it will be timestamped without the possibility of a race: the logical clock is read and the timestamp is set under a mutex. Why this prevents the stable timestamp from racing ahead is beyond what I'd like to explain here, but I am happy to explain in person.
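
As a rough illustration of "read the clock and set the timestamp under a mutex", the sketch below uses a hypothetical `ToyOplog` class (an invented name, not the server's oplog code). It only shows that the read and the stamp happen in one critical section, so no other writer can slip in between them. It does not model the stable timestamp, and it does not attempt to explain why the real mechanism keeps the stable timestamp from racing ahead.

```python
import threading


class ToyOplog:
    """Hypothetical oplog stand-in; illustrative only."""

    def __init__(self, start: int = 10) -> None:
        self._mutex = threading.Lock()
        self._clock = start
        self.entries = []

    def write_noop(self, msg: str) -> int:
        # Ticking the clock, reading it, and stamping the entry all happen
        # while holding the same mutex.
        with self._mutex:
            self._clock += 1
            ts = self._clock
            self.entries.append({"op": "n", "ts": ts, "msg": msg})
            return ts


oplog = ToyOplog()
threads = [
    threading.Thread(target=oplog.write_noop, args=(f"index build event {i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

stamps = [entry["ts"] for entry in oplog.entries]
print(stamps == sorted(stamps), len(set(stamps)) == len(stamps))
```

Every entry gets a unique timestamp, and the entries appear in timestamp order regardless of how the threads are scheduled.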

      Note that it is legal for secondaries to read the logical clock and use that value to timestamp the metadata update on index completion. Either the index build is in the foreground and no other operations are being processed, or the index build is in the background and acquiring a lock to perform the write prevents replication from processing batches (via the Parallel Batch lock, aka the "peanut butter" lock).

            Assignee:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Reporter:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: