-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.0.0
-
Component/s: Replication
-
None
-
Fully Compatible
-
Windows
-
Security 1 04/03/15
We are encountering unexpected elections when running the .NET driver unit tests against a replica set (with auth and ssl if that makes a difference).
Apparently running the dropDatabase command can cause an unexpected election to occur.
From the attached logs:
On member 27017:
2015-04-01T16:22:18.013+0000 D REPL [ReplicationExecutor] localhost:27018 thinks that we are down because they cannot send us heartbeats. 2015-04-01T16:22:18.013+0000 I REPL [ReplicationExecutor] Member localhost:27018 is now in state PRIMARY 2015-04-01T16:22:18.013+0000 I REPL [ReplicationExecutor] replset: another primary seen with election time 551c1b39:1 my election time is 551c1ab4:1 2015-04-01T16:22:18.013+0000 I REPL [ReplicationExecutor] stepping down; another primary was elected more recently 2015-04-01T16:22:18.013+0000 D REPL [ReplicationExecutor] Scheduling heartbeat to localhost:27018 at 2015-04-01T16:22:20.013Z 2015-04-01T16:22:18.013+0000 I REPL [ReplicationExecutor] Stepping down from primary in response to heartbeat
And on member 27018:
2015-04-01T16:22:16.553+0000 I NETWORK [ReplExecNetThread-0] Socket recv() timeout 127.0.0.1:27017 2015-04-01T16:22:16.553+0000 I NETWORK [ReplExecNetThread-0] SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27017] 2015-04-01T16:22:16.553+0000 I NETWORK [ReplExecNetThread-0] DBClientCursor::init call() failed 2015-04-01T16:22:16.553+0000 D - [ReplExecNetThread-0] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) } 2015-04-01T16:22:16.740+0000 D REPL [ReplExecNetThread-0] Network status of sending replSetHeartbeat to localhost:27017 was Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) } 2015-04-01T16:22:16.740+0000 I REPL [ReplicationExecutor] Error in heartbeat request to localhost:27017; Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) } 2015-04-01T16:22:16.740+0000 D REPL [ReplicationExecutor] Scheduling heartbeat to localhost:27017 at 2015-04-01T16:22:18.740Z 2015-04-01T16:22:16.740+0000 I REPL [ReplicationExecutor] Standing for election ... 2015-04-01T16:22:17.209+0000 I REPL [ReplicationExecutor] replSet election succeeded, assuming primary role 2015-04-01T16:22:17.209+0000 I REPL [ReplicationExecutor] transition to PRIMARY
So 27018 has taken the role of primary away from 27017.
Here's what seems to be bogging member 27017 down so much that 27018 concluded it was down:
2015-04-01T16:21:59.390+0000 D COMMAND [conn27] run command Tests04011621.$cmd { dropDatabase: 1 }
2015-04-01T16:21:59.390+0000 I COMMAND [conn27] dropDatabase Tests04011621 starting
2015-04-01T16:21:59.390+0000 D STORAGE [conn27] dropDatabase Tests04011621
...
2015-04-01T16:22:18.777+0000 I JOURNAL [conn27] journalCleanup...
2015-04-01T16:22:18.778+0000 I JOURNAL [conn27] removeJournalFiles
2015-04-01T16:22:18.778+0000 I STORAGE [DataFileSync] flushing mmaps took 17177ms for 10 files
2015-04-01T16:22:18.814+0000 D JOURNAL [conn27] removeJournalFiles end
2015-04-01T16:22:19.019+0000 I JOURNAL [conn27] journalCleanup...
2015-04-01T16:22:19.019+0000 I JOURNAL [conn27] removeJournalFiles
2015-04-01T16:22:19.019+0000 D JOURNAL [conn27] removeJournalFiles end
2015-04-01T16:22:19.047+0000 D STORAGE [conn27] remove file D:\jenkins\workspace\mongo-csharp-driver-test-windows\label\windows64\mc\replica_set\mo\auth\ms\30-release\mssl\ssl\artifacts\data\db27017\Tests04011621.ns
2015-04-01T16:22:19.156+0000 I COMMAND [conn27] dropDatabase Tests04011621 finished
2015-04-01T16:22:19.174+0000 I COMMAND [conn27] command Tests04011621.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:64 locks:{} 19784ms
- duplicates
-
SERVER-12235 Don't require a database read on every new localhost connection when auth is on
- Closed