  Core Server / SERVER-20634

Mongo 3.0.5 might lose an update operation on a record if update operations occur during shard data migration

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • Affects Version/s: 3.0.5
    • Component/s: Sharding
    • Environment:
      A mongos manages two shards, version 3.0.5. Each shard is a replica set consisting of 3 mongod instances, one of which is an arbiter. Both data-bearing mongod instances are started with the --storageEngine wiredTiger parameter.
    • Linux
      Use the command below to update a record at about 50,000 requests per second (a minimal sketch of the setup and command follows this block):

      {findAndModify: "im_latest_id", query: {jid: "xxxxxxxxxxxx"}, update: {$inc: {latest: 2}, $set: {latest_modified: new Date()}}, new: true }

      If the record for the owner doesn't exist, insert it.
      The im_latest_id collection's shard key is the ‘jid’ field with a hashed index, and there is also a unique index on the ‘jid’ field.
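
      For clarity, here is a minimal mongo shell sketch of that setup and command. The database name msg_imid is taken from the oplog namespace quoted later in this report, and the "insert it if it doesn't exist" behaviour is assumed to be implemented with upsert: true; neither detail is shown in the original command text.

      // Assumed setup: shard the collection on a hashed 'jid' key (run on mongos;
      // sh.enableSharding("msg_imid") is assumed to have been run already).
      // The report also states a unique index exists on 'jid'; how it was created is not shown.
      sh.shardCollection("msg_imid.im_latest_id", { jid: "hashed" })

      // Counter update, issued as a command so it matches the shape quoted above.
      // upsert: true is an assumption that covers the "insert if missing" case.
      db.getSiblingDB("msg_imid").runCommand({
          findAndModify: "im_latest_id",
          query:  { jid: "xxxxxxxxxxxx" },
          update: { $inc: { latest: 2 }, $set: { latest_modified: new Date() } },
          new: true,
          upsert: true
      })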

      Hi guys,
      We found that Mongo 3.0.5 might lose an update operation on a record if update operations occur during shard data migration.
      Environment:
      A mongos manages two shards, version 3.0.5. Each shard is a replica set consisting of 3 mongod instances, one of which is an arbiter. Both data-bearing mongod instances are started with the --storageEngine wiredTiger parameter.
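
      For reference, a minimal mongo shell sketch of how one of the two shard replica sets could be initiated and registered in such a topology; only the set names rep9/rep19 and the ports 6311/6316 appear later in this report, the remaining hosts and ports are placeholders, and --storageEngine wiredTiger is a mongod startup option rather than something configured here.

      // Sketch: one shard replica set with two data-bearing members plus an arbiter.
      // Each data-bearing mongod is assumed to have been started with
      // --storageEngine wiredTiger. Run against one of that set's members.
      rs.initiate({
          _id: "rep9",
          members: [
              { _id: 0, host: "hostA:6311" },
              { _id: 1, host: "hostB:6312" },
              { _id: 2, host: "hostC:6313", arbiterOnly: true }   // arbiter, holds no data
          ]
      })

      // Run against the mongos: register both shards.
      sh.addShard("rep9/hostA:6311")
      sh.addShard("rep19/hostD:6316")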

      We’re using a mongos instance that manages 2 shards running Mongo 3.0.5 as our DB server. There is a collection ‘im_latest_id’ in the ‘msg_imid’ database on mongos; its shard key is the ‘jid’ field with a hashed index, and there is also a unique index on the ‘jid’ field.
      We use the command below to maintain an increasing counter:

      {findAndModify:"im_latest_id", query: {jid:"6019806037"}, update: {$inc:{latest:2}, $set:{latest_modified:new Date()}}, new: true }
      

      It works fine most of the time; however, we encountered an issue during a stress test: a successful update operation on the “latest” field was lost!
      From our debug log, we got the update operation log lines below:

      2015-09-21 14:35:40 [96] [INFO] 139687556163328: getLastId: updated record for owner [6019806037], fAndM command returned value: [{ value: { _id: ObjectId('55ffa53ce64830c63e9297ff'), jid: "6019806037", latest: 2, latest_modified: new Date(1442817340484) }, lastErrorObject: { updatedExisting: true, n: 1 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1442817340|400, electionId: ObjectId('55ff7e55621c89462342f3ea') } }]
      2015-09-21 14:36:20 [96] [INFO] 140236372915968: getLastId: updated record for owner [6019806037], fAndM command returned value: [{ value: { _id: ObjectId('55ffa53ce64830c63e9297ff'), jid: "6019806037", latest: 4, latest_modified: new Date(1442817380253) }, lastErrorObject: { updatedExisting: true, n: 1 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1442817380|631, electionId: ObjectId('55ff7e55621c89462342f3ea') } }]
      

      The two lines above show that both findAndModify operations succeeded and the new data was returned; however, from the mongo shell the data is as follows:

      mongos> db.im_latest_id.find({jid: "6019806037"})
      { "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 2, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") }
      

      Therefore, I also checked the data change history in the local db on each shard:
      /usr/local/mongodb30/bin/mongo localhost:6311/local

      rep9:PRIMARY> db.oplog.rs.find({$or:[{"o._id":ObjectId('55ffa53ce64830c63e9297ff')}, {"o2._id":ObjectId('55ffa53ce64830c63e9297ff')}]})
      { "ts" : Timestamp(1442817354, 1379), "h" : NumberLong("1160217985212942599"), "v" : 2, "op" : "i", "ns" : "msg_imid.im_latest_id", "fromMigrate" : true, "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 2, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") } }
      

      /usr/local/mongodb30/bin/mongo localhost:6316/local

      rep19:PRIMARY> db.oplog.rs.find({$or:[{"o._id":ObjectId('55ffa53ce64830c63e9297ff')}, {"o2._id":ObjectId('55ffa53ce64830c63e9297ff')}]})
      { "ts" : Timestamp(1442817340, 398), "h" : NumberLong("8374787035435596051"), "v" : 2, "op" : "i", "ns" : "msg_imid.im_latest_id", "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff"), "jid" : "6019806037", "latest" : 0, "latest_modified" : ISODate("2015-09-21T06:35:40.484Z") } }
      { "ts" : Timestamp(1442817340, 400), "h" : NumberLong("-7736351314285439896"), "v" : 2, "op" : "u", "ns" : "msg_imid.im_latest_id", "o2" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") }, "o" : { "$set" : { "latest" : 2 } } }
      { "ts" : Timestamp(1442817380, 631), "h" : NumberLong("6375165799258425064"), "v" : 2, "op" : "u", "ns" : "msg_imid.im_latest_id", "o2" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") }, "o" : { "$set" : { "latest" : 4, "latest_modified" : ISODate("2015-09-21T06:36:20.253Z") } } }
      { "ts" : Timestamp(1442817391, 1797), "h" : NumberLong("-9215893867258519166"), "v" : 2, "op" : "d", "ns" : "msg_imid.im_latest_id", "fromMigrate" : true, "o" : { "_id" : ObjectId("55ffa53ce64830c63e9297ff") } }
      

      From the change history above, we can see that the record was copied from replica set “rep19” to “rep9” at Timestamp(1442817354, 1379); then another update operation still occurred on replica set “rep19” at Timestamp(1442817380, 631); finally, replica set “rep19” performed a delete operation at Timestamp(1442817391, 1797) that removed the updated record, so the update applied on the donor after the migration copy was lost.
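
      To make that timeline easier to read, the oplog query above can be extended to sort by timestamp and flag which entries came from the migration itself (fromMigrate: true) rather than from client writes; a small sketch, run against the donor's local db, follows. All field names used here (ts, op, o, o2, fromMigrate) appear in the oplog output quoted above.

      // Sketch: list every oplog entry touching this _id in timestamp order,
      // marking entries generated by the chunk migration.
      var id = ObjectId("55ffa53ce64830c63e9297ff");
      db.getSiblingDB("local").oplog.rs
        .find({ $or: [ { "o._id": id }, { "o2._id": id } ] })
        .sort({ ts: 1 })
        .forEach(function (e) {
            print(tojson(e.ts) + "  op=" + e.op + (e.fromMigrate ? "  [fromMigrate]" : ""));
        });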

            Assignee:
            Unassigned
            Reporter:
            Lucifinil Long
            Votes:
            0
            Watchers:
            7
