- Type: Bug
- Resolution: Incomplete
- Priority: Critical - P2
- Affects Version/s: 2.0.7
- Component/s: None
- Environment: Linux
I started draining the last of four shards in a live sharded MongoDB cluster (v2.0.7), with each shard being a 3-node replica set. It went fine until it got down to 16 chunks remaining; the drain has now been stuck there for more than four hours.
mongos> db.runCommand({removeShard: "mongo-live-d"})
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : { "chunks" : NumberLong(16) },
    "ok" : 1
}
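For context, this is roughly how I poll the drain status from the same mongos; rerunning removeShard against a shard that is already draining just reports progress, and the 30-second interval is an arbitrary choice:

// Re-issue removeShard to poll drain progress; safe to repeat while
// the shard is draining. 30s between polls is arbitrary.
while (true) {
    var res = db.runCommand({removeShard: "mongo-live-d"});
    print(new Date() + "  state: " + res.state);
    if (res.remaining) printjson(res.remaining);
    if (res.state != "ongoing") break;
    sleep(30 * 1000);
}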
The mongos log shows this:
Wed Nov 21 22:10:26 [Balancer] distributed lock 'balancer/mongo-live-a-1:27017:1350073653:1804289383' acquired, ts : 50adc1d2538fcedc6aa3cf93
Wed Nov 21 22:10:26 [Balancer] biggest shard mongo-live-b has unprocessed writebacks, waiting for completion of migrate
Wed Nov 21 22:10:26 [Balancer] biggest shard mongo-live-b has unprocessed writebacks, waiting for completion of migrate
Wed Nov 21 22:10:26 [Balancer] biggest shard mongo-live-b has unprocessed writebacks, waiting for completion of migrate
Wed Nov 21 22:10:26 [Balancer] distributed lock 'balancer/mongo-live-a-1:27017:1350073653:1804289383' unlocked.
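To see which chunks the drain is still waiting on, the cluster metadata can be read directly through mongos. A minimal sketch, assuming the standard config.chunks schema (ns, min, max, shard):

// List the chunks still assigned to the draining shard, with their
// namespaces and key ranges (run through mongos).
var conf = db.getSiblingDB("config");
print("chunks on mongo-live-d: " + conf.chunks.count({shard: "mongo-live-d"}));
conf.chunks.find({shard: "mongo-live-d"}).forEach(function (c) {
    print(c.ns + "  " + tojson(c.min) + " -> " + tojson(c.max));
});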
When I check writeBacksQueued, the total ops count never goes down; it keeps increasing over time:
PRIMARY> db.adminCommand("writeBacksQueued")
{
    "hasOpsQueued" : true,
    "totalOpsQueued" : 603910,
    "queues" : {
        "50787cba376f032868ac165e" : { ... },
        "50787cba4a4a812e093429a5" : { ... },
        "50787cba5df1e05fedab56ff" : { ... },
        "50787cbadc8a4a2ee5bab98f" : { ... },
        "50787cbafb83be34cb49a885" : { ... }
    },
    "ok" : 1
}
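To quantify how fast the queue is growing, the command can be sampled over time. A minimal sketch, run on the shard primary; the one-minute interval and five samples are arbitrary:

// Sample totalOpsQueued periodically to confirm the writeback queue is
// growing rather than draining.
for (var i = 0; i < 5; i++) {
    var wb = db.adminCommand("writeBacksQueued");
    print(new Date() + "  totalOpsQueued: " + wb.totalOpsQueued);
    sleep(60 * 1000);
}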
The "totalOpsQueued" and various "n" values keep going up. I don't see anything interesting in the troublesome shard's mongod log. I'd try restarting everything but I'm worried that this queued data would be lost.