- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- None
- Affects Version/s: 2.4.0
- Component/s: Replication
- Linux
I tried to spin up a new replica set from a previously unreplicated database. I upgraded to Mongo 2.4.0 on both machines, deleted the local.* files on both machines, restarted them with the replSet option, and then ran rs.initiate() and rs.add() on the second machine.
Sometime during the syncing phase, both servers crashed. The secondary reported:
Wed Mar 20 22:07:27.102 [rsSync] clone sensordb.gas_readings_by_hour 121215
Wed Mar 20 22:07:42.825 [conn440] end connection 10.10.0.2:36829 (0 connections now open)
Wed Mar 20 22:07:42.826 [initandlisten] connection accepted from 10.10.0.2:37534 #441 (2 connections now open)
Wed Mar 20 22:07:53.884 [rsHealthPoll] replset info owl:27017 thinks that we are down
Wed Mar 20 22:08:12.296 [initandlisten] connection accepted from 10.10.0.2:40000 #442 (2 connections now open)
Wed Mar 20 22:08:12.296 [initandlisten] connection accepted from 10.10.0.2:54924 #443 (3 connections now open)
Wed Mar 20 22:07:54.209 [rsSync] Socket flush send() errno:9 Bad file descriptor 10.10.0.2:27017
Wed Mar 20 22:08:12.296 [rsHealthPoll] replSet member owl:27017 is now in state SECONDARY
Wed Mar 20 22:08:12.296 [rsSync] caught exception (socket exception [SEND_ERROR] for 10.10.0.2:27017) in destructor (~PiggyBackData)
Wed Mar 20 22:08:12.296 [conn441] end connection 10.10.0.2:37534 (2 connections now open)
Wed Mar 20 22:08:12.296 [rsSync] replSet initial sync exception: 16465 recv failed while exhausting cursor 0 attempts remaining
Wed Mar 20 22:08:12.296 [conn442] end connection 10.10.0.2:40000 (1 connection now open)
Wed Mar 20 22:08:15.201 [DataFileSync] flushing mmaps took 36971ms for 67 files
Wed Mar 20 22:08:18.305 [conn443] replSet info voting yea for owl:27017 (0)
Wed Mar 20 22:08:20.305 [rsHealthPoll] replSet member owl:27017 is now in state PRIMARY
Wed Mar 20 22:08:38.313 [conn443] end connection 10.10.0.2:54924 (0 connections now open)
Wed Mar 20 22:08:38.313 [initandlisten] connection accepted from 10.10.0.2:53202 #444 (1 connection now open)
Wed Mar 20 22:08:42.296 [rsSync] Fatal Assertion 16233
0xdcae01 0xd8ab83 0xc0230f 0xc1df91 0xc1edad 0xc1f07c 0xe13709 0x7f7f5caa8e9a 0x7f7f5bdbbcbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcae01]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd8ab83]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xc0230f]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1df91]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1edad]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1f07c]
 /usr/bin/mongod() [0xe13709]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7f5caa8e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7f5bdbbcbd]
Wed Mar 20 22:08:42.300 [rsSync] ***aborting after fassert() failure
Wed Mar 20 22:08:42.300 Got signal: 6 (Aborted).
Wed Mar 20 22:08:42.304 Backtrace:
0xdcae01 0x6ce879 0x7f7f5bcfe4a0 0x7f7f5bcfe425 0x7f7f5bd01b8b 0xd8abbe 0xc0230f 0xc1df91 0xc1edad 0xc1f07c 0xe13709 0x7f7f5caa8e9a 0x7f7f5bdbbcbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcae01]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6ce879]
 /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f7f5bcfe4a0]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7f5bcfe425]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f7f5bd01b8b]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd8abbe]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xc0230f]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1df91]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1edad]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1f07c]
 /usr/bin/mongod() [0xe13709]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7f5caa8e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7f5bdbbcbd]