Core Server / SERVER-10811

Secondary thinks we are down

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.4
    • Environment:
      Ubuntu
      Sharded replica set

      Over the last few months I've been getting this error, and upgrading through new versions didn't help. Last night it happened twice, so I thought it was about time I posted something. Here is the log from the primary; during this window all queries throw an error.

      Wed Sep 18 05:16:44.428 [conn701870] command admin.$cmd command: { writebacklisten: ObjectId('52302fdfc47aee5088985eb0') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
      Wed Sep 18 05:17:21.627 [rsHealthPoll] DBClientCursor::init call() failed
      Wed Sep 18 05:17:21.685 [rsHealthPoll] replSet info db5 is down (or slow to respond):
      Wed Sep 18 05:17:21.686 [rsHealthPoll] replSet member db5 is now in state DOWN
      Wed Sep 18 05:17:22.103 [rsHealthPoll] DBClientCursor::init call() failed
      Wed Sep 18 05:17:22.103 [rsHealthPoll] replset info db9 heartbeat failed, retrying
      Wed Sep 18 05:17:23.975 [ReplicaSetMonitorWatcher] Socket recv() timeout ip:port
      Wed Sep 18 05:17:23.975 [ReplicaSetMonitorWatcher] SocketException: remote: ip:port error: 9001 socket exception [3] server [ip:port]
      Wed Sep 18 05:17:23.976 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
      Wed Sep 18 05:17:25.193 [conn702234] command admin.$cmd command: { writebacklisten: ObjectId('52303e31f00d8943bc8388e0') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
      Wed Sep 18 05:17:27.208 [ReplicaSetMonitorWatcher] trying reconnect to db8
      Wed Sep 18 05:17:27.208 [rsHealthPoll] replset info db9 thinks that we are down
      Wed Sep 18 05:17:27.208 [rsHealthPoll] replset info db5 thinks that we are down
      Wed Sep 18 05:17:27.210 [rsHealthPoll] replSet member db5 is up
      Wed Sep 18 05:17:27.211 [rsHealthPoll] replSet member db5 is now in state SECONDARY
      Wed Sep 18 05:17:27.214 [ReplicaSetMonitorWatcher] reconnect db8 ok
      Wed Sep 18 05:17:28.051 [conn702172] command admin.$cmd command: { writebacklisten: ObjectId('52303e2223cd5188967ef7c5') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
      Wed Sep 18 05:17:29.212 [rsHealthPoll] replset info db5 thinks that we are down
      Wed Sep 18 05:17:29.212 [rsHealthPoll] replset info db9 thinks that we are down
      Wed Sep 18 05:17:29.212 [rsHealthPoll] replSet member db9 is now in state PRIMARY
      Wed Sep 18 05:17:31.213 [rsHealthPoll] replSet member db9 is now in state SECONDARY
      Wed Sep 18 05:17:31.893 [conn697777] command admin.$cmd command: { writebacklisten: ObjectId('522f491bbf08221ed0427b16') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
      Wed Sep 18 05:17:45.111 [conn619965] command admin.$cmd command: { writebacklisten: ObjectId('51ba0ac770d333f140193082') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
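
      To make the sequence of events easier to follow, here is a rough sketch of a script that pulls the replica set state changes and the "thinks that we are down" reports out of a log like the one above. It is not an official MongoDB tool: the function names and regular expressions are my own, and they assume the 2.4-style [rsHealthPoll] line format shown in this report.

```python
import re

# Assumed line formats, taken from the 2.4.x log excerpt in this report:
#   Wed Sep 18 05:17:21.686 [rsHealthPoll] replSet member db5 is now in state DOWN
#   Wed Sep 18 05:17:27.208 [rsHealthPoll] replset info db9 thinks that we are down
PREFIX = r"^\s*\w{3} (\w{3} +\d+ [\d:.]+) \[rsHealthPoll\] "
STATE_RE = re.compile(PREFIX + r"replSet member (\S+) is now in state (\w+)")
DOWN_RE = re.compile(PREFIX + r"replset info (\S+) thinks that we are down")


def state_transitions(lines):
    """Return (timestamp, member, new_state) tuples in log order."""
    return [m.groups() for m in map(STATE_RE.match, lines) if m]


def down_reports(lines):
    """Return (timestamp, member) tuples for 'thinks that we are down' lines."""
    return [m.groups() for m in map(DOWN_RE.match, lines) if m]


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Wed Sep 18 05:17:21.686 [rsHealthPoll] replSet member db5 is now in state DOWN
Wed Sep 18 05:17:27.208 [rsHealthPoll] replset info db9 thinks that we are down
Wed Sep 18 05:17:27.208 [rsHealthPoll] replset info db5 thinks that we are down
Wed Sep 18 05:17:27.211 [rsHealthPoll] replSet member db5 is now in state SECONDARY
Wed Sep 18 05:17:29.212 [rsHealthPoll] replSet member db9 is now in state PRIMARY
Wed Sep 18 05:17:31.213 [rsHealthPoll] replSet member db9 is now in state SECONDARY
""".splitlines()

if __name__ == "__main__":
    for ts, member, state in state_transitions(SAMPLE):
        print(ts, member, "->", state)
    for ts, member in down_reports(SAMPLE):
        print(ts, member, "reported us as down")
```

      Running the same two functions over the full mongod log (for example, `state_transitions(open(path))` with a path of your choosing) should show whether each outage window lines up with other members reporting this node as down.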

            Assignee: Unassigned
            Reporter: Dwayne Bull (dwayne@geekbeach.com)
            Votes: 0
            Watchers: 3
