Type: Bug
Resolution: Incomplete
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.6.6, 2.8.0-rc4
Component/s: Sharding
Labels: None
Operating System: ALL
Confidence Status: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

I was able to reproduce this with the following environment:
2 shards with 3 members each (shards named "shard01" and "shard02")
2 MongoS

I then issued the following commands on MongoS#1:

sh.enableSharding("test")
sh.shardCollection("test.t2", {x:"hashed"})
// Run the following loop three times to load some data
for(i=0;i<1000;i++){db.t2.insert({x:i})}
sh.stopBalancer()
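
The data-loading step above (the insert loop, run three times) could be wrapped in a small helper. This is only a sketch: `loadTestData` and its parameters are hypothetical names that do not appear in the original report, and it assumes `coll` exposes an `insert(doc)` method as `db.t2` does in the mongo shell.

```javascript
// Hypothetical helper (not from the original report).
// Inserts documents {x: 0}, {x: 1}, ..., {x: perPass - 1}, repeated `passes` times.
// `coll` is any object with an insert(doc) method, e.g. db.t2 in the mongo shell.
function loadTestData(coll, passes, perPass) {
    var inserted = 0;
    for (var p = 0; p < passes; p++) {
        for (var i = 0; i < perPass; i++) {
            coll.insert({x: i});
            inserted++;
        }
    }
    return inserted; // total number of insert calls made
}

// Possible use in the mongo shell on MongoS#1, with db set to "test":
// loadTestData(db.t2, 3, 1000);
```

With three passes, every value of x (including x: 110) ends up with three documents, which matches the three results shown for the find in this report.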

Following this, on MongoS#2 I set the read preference and issued the find shown below in the "Actions on MongoS#2" section.
I then issued the following on MongoS#1:

sh.moveChunk("test.t2", {x:110}, "shard01")

Finally, I created a 60-second outage of the primary of "shard01" by running the following command in a shell. The PID must be that of the primary of the "to" shard (shard01 here):

date; kill -STOP 79564; sleep 60; date; kill -CONT 79564

Once MongoS#2 detects that the primary of shard01 is down and a new primary is elected, we can issue further commands to MongoS#2 as below.

Actions on MongoS#2:

MongoDB shell version: 2.6.5
connecting to: 127.0.0.1:27025/test
//Before-Migration
mongos> db.getMongo().setReadPref('primaryPreferred')
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
//After Migration during outage
mongos> db.t2.find({x:110})
//One Second after the command above returned
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
mongos>

The above is repeatable on the MongoS, but only for the first operation after the outage has occurred. This is because we will read from the primary member before issuing the SetShardVersion (SSV).
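
To check a read against the pre-migration baseline, one option is to compare the sets of `_id` values while ignoring order, since the documents come back in a different order after the outage. The sketch below is an illustration only: `sameIdSet` is a hypothetical name, and it assumes that converting `_id` to a string with `String()` yields a stable, comparable value.

```javascript
// Hypothetical helper (not from the original report).
// Returns true when both result arrays contain the same _id values,
// regardless of the order in which the documents were returned.
function sameIdSet(before, after) {
    if (before.length !== after.length) {
        return false;
    }
    var toId = function (doc) { return String(doc._id); };
    var a = before.map(toId).sort();
    var b = after.map(toId).sort();
    for (var i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) {
            return false;
        }
    }
    return true;
}

// Possible use in the mongo shell on MongoS#2:
// var baseline = db.t2.find({x: 110}).toArray();   // before the migration
// ... migration and outage ...
// sameIdSet(baseline, db.t2.find({x: 110}).toArray());
```

An empty result during the outage would compare as different from the baseline, while the reordered result returned one second later would compare as equal.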

If you instead apply the SIGSTOP outage to the "from" shard, the failed find can be repeated a far larger number of times:

mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
mongos> db.t2.find({x:110})
{ "_id" : ObjectId("5487a25c8cb198ac46491d81"), "x" : 110 }
{ "_id" : ObjectId("5487a24f8cb198ac46491999"), "x" : 110 }
{ "_id" : ObjectId("5487a25f8cb198ac46492169"), "x" : 110 }
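
Rather than retyping the find by hand, the number of empty reads before the documents reappear could be counted with a small loop. This is a sketch under assumptions: `countEmptyReads` and `runFind` are hypothetical names, and `runFind` is expected to return the query result as an array.

```javascript
// Hypothetical helper (not from the original report).
// Calls runFind() until it returns a non-empty array or maxAttempts is reached.
// Returns how many empty results were seen and whether documents reappeared.
function countEmptyReads(runFind, maxAttempts) {
    var emptyReads = 0;
    for (var attempt = 0; attempt < maxAttempts; attempt++) {
        var docs = runFind();
        if (docs.length > 0) {
            return {emptyReads: emptyReads, recovered: true};
        }
        emptyReads++;
    }
    return {emptyReads: emptyReads, recovered: false};
}

// Possible use in the mongo shell on MongoS#2:
// countEmptyReads(function () { return db.t2.find({x: 110}).toArray(); }, 100);
```

For the transcript above, nine empty finds followed by one that returns documents would be reported as `emptyReads: 9` with `recovered: true`.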


Attachments: mongos.log (55 kB, Jan 05 2015 12:25:21 AM UTC, David Hows)

Related to: SERVER-16237 Don't check the shard version if the primary server is down (Closed)

Assignee: Andy Schwerin
Reporter: David Hows (Inactive)
Participants: Andy Schwerin, David Hows, Ramon Fernandez Marina, Scott Hernandez
Votes: 1
Watchers: 13
Created: Dec 30 2014 07:21:49 AM UTC
Updated: Mar 12 2016 05:37:17 PM UTC
Resolved: Mar 12 2016 05:37:17 PM UTC
