Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17506

Race between inserts and checkpoints can lose records under WiredTiger

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Critical - P2 Critical - P2
    • 3.0.1, 3.1.0
    • Affects Version/s: 3.0.0
    • Component/s: WiredTiger
    • Fully Compatible
    • ALL

      Repro:

      collections=1000
      threads=1
      
      function checkpoint {
          echo checkpointing
          mongo --quiet --eval "printjson(db.adminCommand({fsync:1}))"
      }
      
      function setup {
          killall -w mongod
          rm -rf db; mkdir -p db
          mongod --dbpath db --logpath db.log --fork --storageEngine wiredTiger
          sleep 2
          echo initializing $collections collections
          mongo --quiet --eval "
              for (var i=0; i<$collections; i++)
                  db['c'+i].insert({})
          "
          checkpoint
      }
      
      function repro {
          (
              for t in $(seq 0 $[$threads-1]); do
                  echo inserting $t
                  mongo --quiet --eval "
                      for (var i=$t; i<$collections; i+=$threads)
                          db['c'+i].insert({})
                  " &
              done
              wait
              echo finished inserts
          ) &
          checkpoint &
          wait
          echo finished parallel checkpoint
          checkpoint
          echo finished sequential checkpoint        
      }
      
      function check {
          echo checking
          mongo --quiet --eval "
              for (var i=0; i<$collections; i++) {
                  try {
                      c = db['c'+i]
                      n_id = c.find().hint({_id:1}).itcount()
                      n_c = c.find().hint({\$natural:1}).itcount()
                      if (n_id != n_c)
                          print(i, n_c, n_id, 'OOPS')
                  } catch (e) {
                      print(i, e)
                  }
              }
          "
      }
      
      setup
      while true; do
          repro
          echo stopping
          killall -w mongod
          echo starting
          mongod --dbpath db --logpath db.log --logappend --fork --storageEngine wiredTiger
          sleep 2
          check
      done
      

      Produces following errors from app on second time through the loop:

      27 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      28 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      29 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      30 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      31 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      32 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      33 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      ...
      

      And the following in the mongod log:

      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671913][9925:0x7f3a93725700], eviction-server: cache server: exiting with 420 pages in memory and 416 pages evicted
      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671962][9925:0x7f3a93725700], eviction-server: cache server: exiting with 1640 bytes in memory
      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671976][9925:0x7f3a93725700], eviction-server: cache server: exiting with 1058 bytes dirty and 2 pages dirty
      
      2015-03-07T12:53:35.799-0500 I -        [conn1] Assertion: 28556:Didn't find RecordId in WiredTigerRecordStore
      

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            18 Start watching this issue

              Created:
              Updated:
              Resolved: