Core Server / SERVER-3891

Crash on slave replication

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker - P1
    • Affects Version/s: 2.0.0
    • Component/s: Replication
    • Environment:
      Windows 64-bit, SAN, jumbo frames, 48 GB RAM
    • Windows

      We have a replica set of three members: 1, 2, and 3.

      Member 1 was primary.

      Members 2 and 3 had gone stale.

      We shut down 2 and 3, deleted their database contents, and started them back up.

      Both started syncing. Member 3 finished fine in 15 minutes.

      Member 2 took longer: it crashed, started over, and then finished.

      It took a couple of hours to complete.

      Here is the log for member 2, with the crash at Fri Sep 16 10:30:59:

      Fri Sep 16 10:30:59 [conn15] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru01.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
      Fri Sep 16 10:30:59 [conn14] run command admin.$cmd { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" }
      Fri Sep 16 10:30:59 [conn14] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
      Fri Sep 16 10:31:00 [websvr] User Assertion: 13142:timeout getting readlock
      Fri Sep 16 10:31:00 [websvr] Socket http response send() errno:0 The operation completed successfully. 192.168.16.35:6254
      Fri Sep 16 10:31:00 unhandled windows exception
      Fri Sep 16 10:31:00 ec=0xe06d7363
      Fri Sep 16 10:31:01 [conn15] run command admin.$cmd { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru01.colo.rrgroup.com:27017" }
      Fri Sep 16 10:31:01 [conn15] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru01.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
      Fri Sep 16 10:31:01 [conn14] run command admin.$cmd { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" }
      Fri Sep 16 10:31:01 [conn14] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
      Fri Sep 16 10:31:06 [initandlisten] connection accepted from 10.99.130.82:61792 #16
      Fri Sep 16 10:31:06 [conn14] run command admin.$cmd { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" }
      Fri Sep 16 10:31:06 [conn15] run command admin.$cmd { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru01.colo.rrgroup.com:27017" }
      Fri Sep 16 10:31:06 [conn14] command admin.$cmd command: { replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" } ntoreturn:1 reslen:125 0ms
      Fri Sep 16 10:31:06 [conn15] command admin.$cmd command: { replSetHeartb
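
      Most of the excerpt above is replSetHeartbeat traffic, which makes the crash lines easy to miss. The following is a minimal Python sketch for filtering that traffic out of a pasted log. It is not part of MongoDB; the helper names join_entries and without_heartbeats are hypothetical, and it assumes every entry begins with a timestamp in the format shown above (anything else is treated as a wrapped continuation of the previous entry).

```python
import re

# Assumption: every log entry starts with a timestamp like "Fri Sep 16 10:31:00 ".
TIMESTAMP = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} ")


def join_entries(lines):
    """Merge wrapped continuation lines into the entry they belong to."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no information
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)          # a new timestamped entry
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def without_heartbeats(lines):
    """Return the log entries that are not replSetHeartbeat traffic."""
    return [e for e in join_entries(lines) if "replSetHeartbeat" not in e]


sample = """\
Fri Sep 16 10:30:59 [conn14] run command admin.$cmd

{ replSetHeartbeat: "prod_rudy", v: 4, pv: 1, checkEmpty: false, from: "monru03.colo.rrgroup.com:27017" }

Fri Sep 16 10:31:00 [websvr] User Assertion: 13142:timeout getting readlock
Fri Sep 16 10:31:00 unhandled windows exception
Fri Sep 16 10:31:00 ec=0xe06d7363
""".splitlines()

for entry in without_heartbeats(sample):
    print(entry)
```

      Run on the sample, this prints only the [websvr] assertion, the unhandled exception line, and the ec= line. An entry that is cut off before the full command name, like the last line of the excerpt, is not recognized as a heartbeat and is kept.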

            Assignee: spencer@mongodb.com Spencer Brody (Inactive)
            Reporter: pbrumm Pete Brumm