Type: Bug
Resolution: Cannot Reproduce
Priority: Blocker - P1
None
Affects Version/s: 2.0.0-rc1
Component/s: Replication
Environment: Windows 64-bit, 24 CPUs, 48 GB RAM, SAN drive
Windows
We have a three-member replica set. Running repairDatabase on member 1 while it is primary works fine. We then step down member 1, and member 2 becomes primary. When we attempt to run repairDatabase on member 2, it restarts:
PRIMARY> db.repairDatabase();
{
"errmsg" : "exception: nextSafe():
",
"code" : 13106,
"ok" : 0
}
SECONDARY>
So it looks like member 2 crashed and the set failed over to member 1 again (the shell prompt changes from PRIMARY> to SECONDARY>). In the logs we see:
Fri Sep 02 12:55:45 [conn14] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: pr_blue_spruce.sessions top: { opid: 63, active: true, lockType: "write", waitingForLock: false, secs_running: 68, op: "query", ns: "pr_blue_spruce", query: { repairDatabase: 1.0 }, client: "127.0.0.1:53667", desc: "conn", msg: "index: (3/3) btree-middle", numYields: 0 }
Fri Sep 02 12:55:45 [conn14] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: pr_blue_spruce.sessions top: { opid: 63, active: true, lockType: "write", waitingForLock: false, secs_running: 68, op: "query", ns: "pr_blue_spruce", query: , client: "127.0.0.1:53667", desc: "conn", msg: "index: (3/3) btree-middle", numYields: 0 }
Fri Sep 02 12:55:45 [conn14] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: pr_blue_spruce.sessions top: { opid: 63, active: true, lockType: "write", waitingForLock: false, secs_running: 68, op: "query", ns: "pr_blue_spruce", query: , client: "127.0.0.1:53667", desc: "conn", msg: "index: (3/3) btree-middle", numYields: 0 }
After that, it keeps failing on startup and we see this exception:
Fri Sep 02 12:55:57 [websvr] User Assertion: 13142:timeout getting readlock
Fri Sep 02 12:55:57 [websvr] Socket http response send() errno:0 The operation completed successfully. 192.168.16.35:36451
Fri Sep 02 12:55:57 unhandled windows exception
Fri Sep 02 12:55:57 ec=0xe06d7363
Fri Sep 02 12:55:57 [conn14] external sort used : 4 files in 11 secs
Fri Sep 02 12:55:57 [conn14] New namespace: pr_blue_spruce.sessions.$id
Fri Sep 02 12:55:57 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:57 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:57 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:57 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:57 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:58 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
Fri Sep 02 12:55:58 [conn16] run command admin.$cmd
Fri Sep 02 12:55:58 [conn16] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru02.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
Fri Sep 02 12:55:58 [conn14] allocating new extent for pr_blue_spruce.sessions.$id padding:1 lenWHdr: 8192
We have tried wiping the db folder on member 2 and letting it resync several times, but the error does not go away.
Is related to: SERVER-3891 crash on slave replication (Closed)