Multiple documents with same _id after 3.2 upgrade


    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 3.2.3
    • Component/s: WiredTiger
    • ALL

      Hard to be precise, but I expect this is connected with the upgrade from 3.0 on MMAPv1 to WiredTiger on 3.2.


      We upgraded a replica set from 2.6 to 3.0 and then to 3.2, each time adding new servers, letting them replicate, and then removing the old servers.

      On the day that the first 3.2 server was added as a 4th (hidden) node in the 3.0 cluster to start the data sync, we appear to have encountered some data corruption.

      To be precise, we now have examples of multiple documents in a collection with the same _id. An unbounded find on the collection shows them (for brevity, only the records around the affected one are shown, and the collection name is anonymised):

      db['COLLECTION'].find()
      { "_id" : "2016-03-15", "percent" : 7.317073170731707 }
      { "_id" : "2016-03-16", "percent" : 7.4074074074074066 }
      { "_id" : "2016-03-17", "percent" : 6.666666666666667 }
      { "_id" : "2016-03-18", "percent" : 6.944444444444445 }
      { "_id" : "2016-03-18", "percent" : 7.792207792207792 }
      { "_id" : "2016-03-19", "percent" : 7.6923076923076925 }
      { "_id" : "2016-03-20", "percent" : 7.6923076923076925 }
      { "_id" : "2016-03-21", "percent" : 6.756756756756757 }
      { "_id" : "2016-03-22", "percent" : 6.944444444444445 }
      { "_id" : "2016-03-23", "percent" : 7.142857142857142 }
      

      Note that "_id" : "2016-03-18" is there twice.
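
Spotting a repeated _id by eye does not scale to a large collection. The following is a minimal sketch, in plain JavaScript as accepted by the mongo shell, of one way to list repeated _id values. It assumes the documents have already been read into an array (for example with `.toArray()` on an unbounded find) and that _id values are strings, as in the output above. The helper name `findDuplicateIds` is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper: returns every _id value that occurs more than once.
// Assumes string _id values, as in the sample output.
function findDuplicateIds(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    var key = String(doc._id);
    counts[key] = (counts[key] || 0) + 1;
  });
  return Object.keys(counts).filter(function (key) {
    return counts[key] > 1;
  });
}

// Sample copied from the find() output above.
var sample = [
  { _id: "2016-03-17", percent: 6.666666666666667 },
  { _id: "2016-03-18", percent: 6.944444444444445 },
  { _id: "2016-03-18", percent: 7.792207792207792 },
  { _id: "2016-03-19", percent: 7.6923076923076925 }
];

var duplicates = findDuplicateIds(sample); // ["2016-03-18"]
```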

      If I query directly for this record, only one document is returned:

      db['COLLECTION'].find({ "_id" : "2016-03-18" })
      { "_id" : "2016-03-18", "percent" : 7.792207792207792 }
      
      db['COLLECTION'].find({ "_id" : {$gt: "2016-03-17", $lt: "2016-03-19"} })
      { "_id" : "2016-03-18", "percent" : 7.792207792207792 }
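
The difference between the unbounded find and the two _id queries may come down to the access path: the former reads the collection directly, while the latter would normally be answered from the _id index. A rough sketch of how the two paths could be compared for a single _id follows. It assumes `coll` is a mongo shell collection object and that `hint({ $natural: 1 })` forces a collection scan on this server version. The helper name `countByAccessPath` is hypothetical.

```javascript
// Hypothetical helper: counts the documents matching one _id via two access paths.
// `coll` is assumed to be a mongo shell collection such as db['COLLECTION'].
function countByAccessPath(coll, id) {
  // Hinting $natural requests a collection scan, bypassing the _id index.
  var viaScan = coll.find({ _id: id }).hint({ $natural: 1 }).toArray();
  // Hinting the _id index requests an index lookup.
  var viaIndex = coll.find({ _id: id }).hint({ _id: 1 }).toArray();
  return { scan: viaScan.length, index: viaIndex.length };
}

// Intended use in the mongo shell (not executed here):
//   countByAccessPath(db['COLLECTION'], "2016-03-18")
```

If the two counts differ, that would suggest the _id index and the collection data disagree, but this is only a diagnostic aid and not a confirmed explanation of the issue.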
      

      Would a copy of the WiredTiger datafiles for this collection and its indexes help with analysing this issue?

            Assignee: Eric Milkie
            Reporter: Greg Murphy
            Votes: 1
            Watchers: 22
