Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.0.0
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
Windows
Sprint:
Security 1 04/03/15
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

We are encountering unexpected elections when running the .NET driver unit tests against a replica set (with auth and ssl if that makes a difference).

Apparently running the dropDatabase command can cause an unexpected election to occur.

From the attached logs:

On member 27017:

2015-04-01T16:22:18.013+0000 D REPL     [ReplicationExecutor] localhost:27018 thinks that we are down because they cannot send us heartbeats.
2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] Member localhost:27018 is now in state PRIMARY
2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] replset: another primary seen with election time 551c1b39:1 my election time is 551c1ab4:1
2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] stepping down; another primary was elected more recently
2015-04-01T16:22:18.013+0000 D REPL     [ReplicationExecutor] Scheduling heartbeat to localhost:27018 at 2015-04-01T16:22:20.013Z
2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat

And on member 27018:

2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] Socket recv() timeout  127.0.0.1:27017
2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27017] 
2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] DBClientCursor::init call() failed
2015-04-01T16:22:16.553+0000 D -        [ReplExecNetThread-0] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
2015-04-01T16:22:16.740+0000 D REPL     [ReplExecNetThread-0] Network status of sending replSetHeartbeat to localhost:27017 was Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
2015-04-01T16:22:16.740+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:27017; Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
2015-04-01T16:22:16.740+0000 D REPL     [ReplicationExecutor] Scheduling heartbeat to localhost:27017 at 2015-04-01T16:22:18.740Z
2015-04-01T16:22:16.740+0000 I REPL     [ReplicationExecutor] Standing for election
...
2015-04-01T16:22:17.209+0000 I REPL     [ReplicationExecutor] replSet election succeeded, assuming primary role
2015-04-01T16:22:17.209+0000 I REPL     [ReplicationExecutor] transition to PRIMARY

So 27018 has taken the role of primary away from 27017.

Here's what seems to be bogging member 27017 down so much that 27018 concluded it was down:

2015-04-01T16:21:59.390+0000 D COMMAND  [conn27] run command Tests04011621.$cmd { dropDatabase: 1 }
2015-04-01T16:21:59.390+0000 I COMMAND  [conn27] dropDatabase Tests04011621 starting
2015-04-01T16:21:59.390+0000 D STORAGE  [conn27] dropDatabase Tests04011621
...
2015-04-01T16:22:18.777+0000 I JOURNAL  [conn27] journalCleanup...
2015-04-01T16:22:18.778+0000 I JOURNAL  [conn27] removeJournalFiles
2015-04-01T16:22:18.778+0000 I STORAGE  [DataFileSync] flushing mmaps took 17177ms  for 10 files
2015-04-01T16:22:18.814+0000 D JOURNAL  [conn27] removeJournalFiles end
2015-04-01T16:22:19.019+0000 I JOURNAL  [conn27] journalCleanup...
2015-04-01T16:22:19.019+0000 I JOURNAL  [conn27] removeJournalFiles
2015-04-01T16:22:19.019+0000 D JOURNAL  [conn27] removeJournalFiles end
2015-04-01T16:22:19.047+0000 D STORAGE  [conn27] remove file D:\jenkins\workspace\mongo-csharp-driver-test-windows\label\windows64\mc\replica_set\mo\auth\ms\30-release\mssl\ssl\artifacts\data\db27017\Tests04011621.ns
2015-04-01T16:22:19.156+0000 I COMMAND  [conn27] dropDatabase Tests04011621 finished
2015-04-01T16:22:19.174+0000 I COMMAND  [conn27] command Tests04011621.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:64 locks:{} 19784ms

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

27019.log
Apr 01 2015 06:15:23 PM UTC
145 kB
Robert Stam
db27017.log
Apr 01 2015 06:15:23 PM UTC
23.76 MB
Robert Stam
db27018.log
Apr 01 2015 06:15:23 PM UTC
3.49 MB
Robert Stam

duplicates

SERVER-12235 Don't require a database read on every new localhost connection when auth is on

Closed

Assignee:: Amalia Hawkins (Inactive)

Reporter:: Robert Stam

Participants:: Amalia Hawkins, Bernie Hackett, Eric Milkie, Jeffrey Yemin, Matt Dannenberg, Robert Stam

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Created:: Apr 01 2015 06:15:23 PM UTC

Updated:: Apr 03 2015 03:24:01 PM UTC

Resolved:: Apr 03 2015 02:45:02 PM UTC

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates