Core Server / SERVER-16693

It is possible to read unowned data from the primary after fail-over

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical - P2
    • None
    • Affects Version/s: 2.6.6, 2.8.0-rc4
    • Component/s: Sharding
    • None
    • ALL

      I was able to reproduce this with the following environment:
      2 shards with 3 members each (shards named "shard01" and "shard02")
      2 MongoS

      I then issued the following commands on MongoS#1:

      sh.enableSharding("test")
      sh.shardCollection("test.t2", {x:"hashed"})
      // Run the following loop three times to insert some data
      for(i=0;i<1000;i++){db.t2.insert({x:i})}
      sh.stopBalancer()
      

      Following this, on MongoS#2 I issued the find and the readPref change shown below in the "Actions on MongoS#2" section.
      Then I issued the following on MongoS#1:

      sh.moveChunk("test.t2", {x:110}, "shard01")
      

      Finally, I created a 60-second outage of the primary of "shard01" by running the following command in a system shell. The PID must be that of the primary of the "to" shard:

      date; kill -STOP 79564; sleep 60; date; kill -CONT 79564
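
The one-liner above can be wrapped in a small reusable function. This is only a sketch: the name `pause_process` and its argument order are hypothetical and not part of the original report. It sends SIGSTOP, waits, then sends SIGCONT, printing timestamps before and after as the original command does.

```shell
# Hypothetical helper: freeze a process for a fixed time, then resume it.
# Usage: pause_process <pid> [seconds]   (seconds defaults to 60)
pause_process() {
  local pid seconds
  pid="$1"
  seconds="${2:-60}"

  # Refuse to continue if the PID does not refer to a live process.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no such process: $pid" >&2
    return 1
  fi

  date                          # timestamp at the start of the outage
  kill -STOP "$pid" || return 1
  sleep "$seconds"
  kill -CONT "$pid"             # resume the process
  date                          # timestamp at the end of the outage
}
```

For the scenario in this report the call would be `pause_process 79564 60`, where 79564 is the PID of the shard01 primary in the reporter's environment.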
      

      Once MongoS#2 detects that the primary of shard01 is down and a new primary is elected, we can issue further commands to MongoS#2 as below.

      Actions on MongoS#2:

      MongoDB shell version: 2.6.5
      connecting to: 127.0.0.1:27025/test
      //Before-Migration
      mongos> db.getMongo().setReadPref('primaryPreferred')
      mongos> db.t2.find({x:110})
      { "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
      { "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
      { "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
      //After Migration during outage
      mongos> db.t2.find({x:110})
      //One Second after the command above returned
      mongos> db.t2.find({x:110})
      { "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
      { "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
      { "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
      mongos>
      

      The above is repeatable on the MongoS, but only for the first operation after the outage has occurred. This is because the MongoS reads from the primary member before issuing the SetShardVersion (SSV).

      If you instead apply the SIGSTOP outage to the "from" shard, you can repeat the failed find many more times:

      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      mongos> db.t2.find({x:110})
      { "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
      { "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
      { "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
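
To put a number on how many finds come back empty before documents reappear, the repeated query can be scripted. The sketch below is an assumption, not something from the original report: `count_empty_reads` is a hypothetical helper that runs any command printing a document count and reports how many attempts printed 0 before the first non-zero result (or before the attempt limit was reached).

```shell
# Hypothetical helper: run a command that prints a document count until it
# prints a non-zero value; print how many attempts came back empty.
# Usage: count_empty_reads <max_attempts> <command> [args...]
count_empty_reads() {
  local max empty n
  max="$1"
  shift
  empty=0
  while [ "$empty" -lt "$max" ]; do
    n="$("$@")" || return 1
    # Bail out if the command printed anything other than a plain number.
    case "$n" in
      ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$n" -gt 0 ]; then
      break
    fi
    empty=$((empty + 1))
  done
  echo "$empty"
}

# Assumed invocation against MongoS#2 (address taken from the transcript above):
# count_empty_reads 100 mongo --quiet 127.0.0.1:27025/test --eval \
#   'db.getMongo().setReadPref("primaryPreferred"); print(db.t2.find({x:110}).itcount())'
```

Each attempt in the commented invocation opens a new shell connection, so the count may not match what a single long-lived session sees; treat it as a rough measure only.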
      

            Assignee: Andy Schwerin (schwerin@mongodb.com)
            Reporter: David Hows (david.hows)
            Votes: 1
            Watchers: 13
