Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.2.0-rc2
Component/s: Sharding
Environment: Ubuntu 10.04 64bit, Linux
I am running a shard with two members, 3 config servers and one mongos. I am currently in a state where sh.status() on the mongos always fails. The error message is:
Mon Aug 27 13:58:35 decode failed. probably invalid utf-8 string [???]
Mon Aug 27 13:58:35 why: TypeError: malformed UTF-8 character sequence at offset 0
TypeError: malformed UTF-8 character sequence at offset 0
The same error is also printed when I use the config database and issue db.databases.find({}); I can see 3 databases listed and then the error is displayed. db.databases.find({}).count() shows that there are 5 records there. When connected to the shard members, there are 4 databases on each (and they match by name, at least).
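A sketch of how one might isolate which config.databases document fails to decode, by fetching the records one at a time instead of dumping a single cursor (the skip/limit loop and the try/catch are my own guesses, not anything suggested by the error output):

var cfg = db.getSiblingDB("config");
var total = cfg.databases.count();   // count() works here (returns 5) even though find() breaks
for (var i = 0; i < total; i++) {
    try {
        // fetch exactly one document per iteration so a single bad record
        // does not abort the rest of the output
        printjson(cfg.databases.find().skip(i).limit(1).next());
    } catch (e) {
        print("document at position " + i + " failed to decode: " + e);
    }
}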
I got into this state during some heavy modifications to the sharding setup, which might be relevant:
I ran db.runCommand({removeshard: "shard0000"}) to remove a shard that was running very low on disk space. Unfortunately the free space ran out, the mongod process froze/crashed, and it had to be restarted (this is actually because of the post-cleanup.XXXX.bson files that get created; I now bravely delete them by hand). The draining process then hangs, and the log displays the following every 6 seconds:
Mon Aug 27 10:57:16 [conn4] about to log metadata event: { _id: "mongoimg-2012-08-27T07:57:16-19", server: "mongoimg", clientAddr: "192.168.100.40:36263", time: new Date(1346054236814), what: "moveChunk.from", ns: "project.fs.chunks", details: { min: { files_id: ObjectId('4f8323e4af8cd13414001317') }, max: { files_id: ObjectId('4f8414daae8cd11f7200004a') }, step1 of 6: 0, note: "aborted" } }
To solve this, I set the shard's draining status to false and restarted the config servers (more than once), then re-enabled draining by issuing db.runCommand({removeshard: "shard0000"}) again and restarted the config servers one by one again (this was repeated several times because one of the config servers was moved to a different IP and the repeating log message wouldn't go away).
So, currently the shard seems to be draining normally, but sh.status() always fails.
There is nothing interesting in mongos.log; the string "UTF8" doesn't appear. The only thing I see that might be related to this is:
Mon Aug 27 11:45:03 [Balancer] moveChunk result: { who: { _id: "project.fs.chunks", process: "mongoimg:27017:1345997938:164881649", state: 2, ts: ObjectId('503b20b8706f005dbbff04b8'), when: new Date(1346052280689), who: "mongoimg:27017:1345997938:164881649:conn28:1347786322", why: "migrate-{ files_id: ObjectId('4f8323e4af8cd13414001317') }" }, errmsg: "the collection metadata could not be locked with lock migrate-{ files_id: MinKey }", ok: 0.0 }