Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-31687

Missing minvalid in sync source cause oplog full table scan all the time

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.17, 3.4.9
    • Component/s: Replication
    • None
    • ALL
    • v4.0, v3.6, v3.4
    • Hide
          OplogReader syncSourceReader;
          OpTime minValid;
          if (_replCoord->getMemberState().recovering()) {
              BatchBoundaries boundaries = getMinValid(txn);
              if (boundaries.end > lastOpTimeFetched) {
                  minValid = boundaries.end;
              }
          }
          syncSourceReader.connectToSyncSource(txn, lastOpTimeFetched, minValid, _replCoord);
      

      connectToSyncSource wil use db.oplog.rs.find( ts: {$gte: minValid, $lte: minvalid}) to check this oplog in sync source, which will trigger oplogStartHack.

      When the minvalid optime exists in sync source, the oplogStartHack work very well; but if it doesn't exists in sync source, it will cause oplog full table scan.

      I think when the minvaid doesn't exist in sync source, RecordStore::oplogStartHack should return RecordId::max() to avoid full oplog table scan, which is a very heavy operation.

      diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
      index f6fa6d6..94a1d48 100644
      --- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
      +++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
      @@ -1792,7 +1792,7 @@ boost::optional<RecordId> WiredTigerRecordStore::oplogStartHack(
           if (ret == 0 && cmp > 0)
               ret = c->prev(c);  // landed one higher than startingPosition
           if (ret == WT_NOTFOUND)
      -        return RecordId();  // nothing <= startingPosition
      +        return RecordId::max();  // will not cause oplog full scan
           invariantWTOK(ret);
      
           int64_t key;
      
      Show
      OplogReader syncSourceReader; OpTime minValid; if (_replCoord->getMemberState().recovering()) { BatchBoundaries boundaries = getMinValid(txn); if (boundaries.end > lastOpTimeFetched) { minValid = boundaries.end; } } syncSourceReader.connectToSyncSource(txn, lastOpTimeFetched, minValid, _replCoord); connectToSyncSource wil use db.oplog.rs.find( ts: {$gte: minValid, $lte: minvalid}) to check this oplog in sync source, which will trigger oplogStartHack. When the minvalid optime exists in sync source, the oplogStartHack work very well; but if it doesn't exists in sync source, it will cause oplog full table scan. I think when the minvaid doesn't exist in sync source, RecordStore::oplogStartHack should return RecordId::max() to avoid full oplog table scan, which is a very heavy operation. diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp index f6fa6d6..94a1d48 100644 --- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp +++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp @@ -1792,7 +1792,7 @@ boost::optional<RecordId> WiredTigerRecordStore::oplogStartHack( if (ret == 0 && cmp > 0) ret = c->prev(c); // landed one higher than startingPosition if (ret == WT_NOTFOUND) - return RecordId(); // nothing <= startingPosition + return RecordId::max(); // will not cause oplog full scan invariantWTOK(ret); int64_t key;
    • Repl 2018-10-08

      The mongod instance in my production environtment suffers very big IO pressure, and I see a lot of oplog scan in the log like belows.

      2017-10-23T17:48:01.845+0800 I COMMAND  [conn8766752] query local.oplog.rs query: { ts: { $gte: Timestamp 1505624058000|95, $lte: Timestamp 1505624058000|95 } } planSummary: COLLSCAN cursorid:20808023597 ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:44669401 keyUpdates:0 writeConflicts:0 numYields:353599 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 707200 } }, Database: { acquireount: { r: 353600 }, acquireWaitCount: { r: 15 }, timeAcquiringMicros: { r: 3667 } }, oplog: { acquireCount: { r: 353600 } } } 935646ms
      

      I found one of the secondary( a hidden node) is in RECORING state, I dive into the code and find the root cause.

      The hidden node has following warning log

      We cannot use xxx as a sync source because it does not contain the necessary operations for us to reach a consistent state ...
      

      When secondary choose a sync source ,it will make sure the oplog in minvalid exist in the sync source,it will send a

            Assignee:
            tess.avitabile@mongodb.com Tess Avitabile (Inactive)
            Reporter:
            zyd_com@126.com Zhang Youdong
            Votes:
            1 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: