Core Server / SERVER-74793

dbCheck behaves differently on primaries and secondaries w.r.t extra _id index entries

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.0-rc2
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution
    • Fully Compatible
    • ALL
    • v7.0, v6.3, v6.0, v5.0, v4.4
    • Execution Team 2023-05-01
    • 8

      When a collection is in an inconsistent state such that:

      • There exists an _id -> RecordId index entry
      • But the associated record store entry is missing

      A primary processing a dbCheck command will write an "extra index key" error into the health log (complete with a backtrace):

      { "_id" : ObjectId("640f5a7a51d8ea8c8a8acdd8"), "namespace" : "test.bla", "timestamp" : ISODate("2023-03-13T17:16:41.046Z"), "severity" : "error", "msg" : "Erroneous index key found with reference to non-existent record id", "scope" : "index", "operation" : "Index scan", "data" : { "recordId" : "1", "indexKeyData" : [ { "key" : { "_id" : ObjectId("640f4932ad87ee6ac9de160c") }, "pattern" : { "_id" : 1 } } ], "backtrace" : [ { ... ] } } }
      

      The primary then aborts the rest of the dbCheck and replicates a "dbCheckStop" oplog entry.
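
The inconsistent state described above can be modeled in a few lines. This is a toy sketch only: `id_index`, `record_store`, and `find_extra_index_entries` are hypothetical names for illustration and do not correspond to the server's actual data structures or code.

```python
def find_extra_index_entries(id_index, record_store):
    """Return (key, record_id) pairs whose record id has no record store entry."""
    return [(key, rid) for key, rid in id_index.items() if rid not in record_store]


# Toy state mirroring the report: the _id index points at RecordId 1,
# but the record store has no entry for it.
id_index = {"640f4932ad87ee6ac9de160c": 1}  # _id -> RecordId (hypothetical model)
record_store = {}                            # RecordId -> document (hypothetical model)

extra = find_extra_index_entries(id_index, record_store)
```

In this model, `extra` holds the one dangling entry, which is the situation the primary reports as an "extra index key" error.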

      A secondary processing a dbCheck oplog entry will not notice the extra _id index entry. It will, however, log that its dbCheck failed in the event that the record store document should exist:

      { "_id" : ObjectId("640f5de1d688051409c37488"), "namespace" : "test.bla", "timestamp" : ISODate("2023-03-13T17:31:13.619Z"), "severity" : "error", "msg" : "dbCheck batch inconsistent", "scope" : "cluster", "operation" : "dbCheckBatch", "data" : { "success" : true, "count" : NumberLong(0), "bytes" : NumberLong(0), "md5" : { "expected" : "ca673557f7697edb1dee246a460173b3", "found" : "d41d8cd98f00b204e9800998ecf8427e" }, "minKey" : { "$minKey" : 1 }, "maxKey" : { "$maxKey" : 1 }, "readTimestamp" : Timestamp(1678728673, 1), "optime" : { "ts" : Timestamp(1678728673, 2), "t" : NumberLong(4) } } }
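
One detail in the entry above is worth checking: the "found" hash is the MD5 digest of empty input, which is consistent with the reported count and bytes of 0. The snippet below only verifies that digest; it assumes nothing about how the server computes the batch hash.

```python
import hashlib

# "found" value copied from the secondary's health log entry.
found = "d41d8cd98f00b204e9800998ecf8427e"

# MD5 of zero bytes of input.
empty_digest = hashlib.md5(b"").hexdigest()

matches_empty = empty_digest == found
```

So the secondary appears to have hashed nothing for the batch, while the primary's "expected" hash covered at least one document.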
      

      It would be better if secondaries also logged an extra _id index entry error so we could distinguish between index inconsistency (a storage problem) and data inconsistency (a replication problem).
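
As a rough illustration of the triage this would enable, the sketch below classifies health log entries by their "scope" field. It assumes secondaries log extra index entries the way primaries do, which is the behavior requested here rather than the behavior reported. The `classify` helper and its return labels are hypothetical; the field values are taken from the two sample entries in this ticket.

```python
def classify(entry):
    """Map a health log entry (as a dict) to a coarse problem category."""
    scope = entry.get("scope")
    if scope == "index":
        # e.g. "Erroneous index key found with reference to non-existent record id"
        return "index inconsistency"
    if scope == "cluster" and entry.get("msg") == "dbCheck batch inconsistent":
        return "data inconsistency"
    return "unclassified"


primary_entry = {"scope": "index", "operation": "Index scan"}
secondary_entry = {"scope": "cluster", "msg": "dbCheck batch inconsistent"}

labels = [classify(primary_entry), classify(secondary_entry)]
```

Without the requested change, a "cluster" entry on a secondary could stem from either cause, so this mapping would not be reliable.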

            Assignee:
            louis.williams@mongodb.com Louis Williams
            Reporter:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Votes:
            0
            Watchers:
            14
