Type: Bug
Resolution: Incomplete
Priority: Critical - P2
Affects Version/s: 2.6.6, 2.8.0-rc4
Component/s: Sharding
Operating System: ALL
I was able to reproduce this with the following environment:
2 shards, each a 3-member replica set (shards named "shard01" and "shard02")
2 MongoS
I then issued the following commands on MongoS#1:
sh.enableSharding("test")
sh.shardCollection("test.t2", {x:"hashed"})
// Run the following loop three times to load some data
for(i=0;i<1000;i++){db.t2.insert({x:i})}
sh.stopBalancer()
Following this, on MongoS#2 I changed the read preference and issued the find shown under "Before-Migration" in the "Actions on MongoS#2" section below.
Then I issued the following on MongoS#1:
sh.moveChunk("test.t2", {x:110}, "shard01")
Finally, I created a 60-second outage of the primary of "shard01" with the following OS shell command. The PID must be that of the primary of the "to" shard (shard01):
date; kill -STOP 79564; sleep 60; date; kill -CONT 79564
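The suspend/resume step can also be scripted when the reproduction has to be repeated many times. The following is a minimal Python sketch using only the standard library (POSIX only, since it relies on SIGSTOP/SIGCONT); `suspend_process` is a hypothetical helper name, and the PID passed to it is assumed to be that of the "to" shard's primary mongod, exactly as in the shell command above.

```python
import os
import signal
import time
from datetime import datetime
from typing import Tuple


def suspend_process(pid: int, seconds: float) -> Tuple[datetime, datetime]:
    """Freeze process `pid` with SIGSTOP for `seconds`, then resume it with SIGCONT.

    Rough equivalent of: date; kill -STOP <pid>; sleep <seconds>; date; kill -CONT <pid>
    Returns the wall-clock times at which the outage started and ended.
    """
    started = datetime.now()
    print(f"{started.isoformat()} SIGSTOP -> {pid}")
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Resume even if the sleep is interrupted (e.g. Ctrl-C), so the
        # mongod is never left frozen by accident.
        os.kill(pid, signal.SIGCONT)
    ended = datetime.now()
    print(f"{ended.isoformat()} SIGCONT -> {pid}")
    return started, ended
```

For example, `suspend_process(79564, 60)` corresponds to the 60-second outage produced by the command above. The `finally` clause is the main difference from the one-liner: interrupting the script still sends SIGCONT.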
Once MongoS#2 detects that the primary of shard01 is down and a new primary has been elected, issue the further commands to MongoS#2 shown below.
*Actions on MongoS#2*
MongoDB shell version: 2.6.5
connecting to: 127.0.0.1:27025/test
// Before-Migration
mongos> db.getMongo().setReadPref('primaryPreferred')
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
// After Migration during outage
mongos> db.t2.find({x:110})
// One second after the command above returned
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
mongos>
The above is repeatable on the MongoS, but only for the first operation after the outage has occurred, because that read goes to the primary member before the SetShardVersion (SSV) is issued.
If you instead SIGSTOP the primary of the "from" shard (shard02), the failed find can be repeated far more times:
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
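To count how many consecutive finds come back short during the outage, the repetition can be automated. This is a hypothetical helper, not part of any MongoDB tool: it takes the query as a callable, so it can be exercised without a cluster, and the expected count of 3 is assumed from loading the data three times, which is why three documents match `{x:110}`.

```python
import time
from typing import Callable, Iterable, List


def reads_until_expected(run_query: Callable[[], Iterable], expected: int,
                         max_attempts: int = 100, delay: float = 1.0) -> List[int]:
    """Re-run `run_query` until it returns `expected` documents.

    Returns the number of documents seen on each attempt, so a run that hits
    this bug looks like [0, 0, ..., 3]. Raises RuntimeError if the expected
    count never appears within `max_attempts`.
    """
    sizes: List[int] = []
    for attempt in range(max_attempts):
        size = sum(1 for _ in run_query())  # drain the cursor / iterable
        sizes.append(size)
        if size == expected:
            return sizes
        if attempt + 1 < max_attempts:
            time.sleep(delay)
    raise RuntimeError(
        f"expected {expected} documents, never seen in {max_attempts} attempts: {sizes}"
    )
```

Against the cluster above, assuming a PyMongo release that still supports the 2.6 server, the callable could be `lambda: coll.find({"x": 110})` with `coll = MongoClient("127.0.0.1", 27025).test.get_collection("t2", read_preference=ReadPreference.PRIMARY_PREFERRED)`, both names imported from `pymongo`.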
Related to: SERVER-16237 Don't check the shard version if the primary server is down (Closed)