- Type: Bug
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: 2.4.8
- Component/s: Replication
- ALL
Remove a member and re-add it promptly. The first attempt to re-add fails, the second succeeds:
rs:PRIMARY> rs.remove('localhost:27018')
2013-12-09T17:30:05.241-0500 DBClientCursor::init call() failed
2013-12-09T17:30:05.241-0500 Error: error doing query: failed at src/mongo/shell/query.js:81
2013-12-09T17:30:05.243-0500 trying reconnect to 127.0.0.1:27017
2013-12-09T17:30:05.243-0500 reconnect 127.0.0.1:27017 ok
rs:PRIMARY> var config = rs.conf()
rs:PRIMARY> config.members.push({_id: 1, host: 'localhost:27018'})
2
rs:PRIMARY> rs.reconfig(config)
{ "errmsg" : "exception: need most members up to reconfigure, not ok : localhost:27018", "code" : 13144, "ok" : 0 }
rs:PRIMARY> rs.reconfig(config)
{ "ok" : 1 }
The primary logs:
replSet cmufcc requestHeartbeat localhost:27018 : 9001 socket exception [SEND_ERROR] server [127.0.0.1:27018]
replSet replSetReconfig exception: need most members up to reconfigure, not ok : localhost:27018
I think the offending code is in rs_initiate.cpp:98. It seems the primary still holds a cached connection to the removed member, but the member closed its side of that connection when it was removed. The first attempt to use the old connection fails and clears the cache. The second attempt creates a new connection and succeeds.
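To make the suspected sequence concrete, here is a toy model in JavaScript. It is only a sketch of the hypothesis above, not the code in rs_initiate.cpp: the names `Connection`, `ConnectionCache`, and `sendHeartbeat` are hypothetical, and the assumption that a failed send evicts the cached connection is taken from the behavior described, not from the server source.

```javascript
// Toy model of the suspected behavior. Not MongoDB source code.
// All class and function names here are hypothetical.

class Connection {
  constructor(host) {
    this.host = host;
    this.open = true; // becomes false when the remote side closes the socket
  }

  send(message) {
    if (!this.open) {
      throw new Error("socket exception [SEND_ERROR] server [" + this.host + "]");
    }
    return { ok: 1, echoed: message };
  }
}

class ConnectionCache {
  constructor() {
    this.connections = new Map();
  }

  // Return the cached connection for a host, creating one if none exists.
  get(host) {
    if (!this.connections.has(host)) {
      this.connections.set(host, new Connection(host));
    }
    return this.connections.get(host);
  }

  evict(host) {
    this.connections.delete(host);
  }
}

// Assumed behavior: a failed send drops the cached connection,
// so the next call opens a fresh one.
function sendHeartbeat(cache, host) {
  const conn = cache.get(host);
  try {
    return conn.send("heartbeat");
  } catch (err) {
    cache.evict(host);
    return { ok: 0, errmsg: err.message };
  }
}

const cache = new ConnectionCache();
const host = "localhost:27018";

sendHeartbeat(cache, host);      // normal operation populates the cache
cache.get(host).open = false;    // the removed member closes its side

const first = sendHeartbeat(cache, host);  // uses the stale connection
const second = sendHeartbeat(cache, host); // reconnects after the eviction

console.log(first.ok, second.ok);
```

In this model the first call after the member closes its socket fails and clears the cache entry, and the second call succeeds on a new connection, which matches the pattern of the two `rs.reconfig(config)` calls in the session above.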