Type: Bug
Resolution: Done
Priority: Critical - P2
None
Affects Version/s: 2.4.5
Component/s: Replication
Environment: RHEL 5.9
Linux
I have a 3-node replica set configured using MongoDB 2.4.5 with SSL. In the two months this configuration has been running, I've occasionally observed two of the three mongod instances crash with a segmentation fault. It's not always the same two servers, and the interval varies: sometimes it happens every few days, while other times we've gone two weeks without an issue.
In the latest crash, db3 was primary and db2 and db1 were secondaries. In this incident, db3 and db2 crashed, but db1 kept running (although its replication status remained "secondary").
In the db3 log from the time of the crash (I replaced host:port with db2), I see:
Wed Sep 25 05:31:59.983 [rsHealthPoll] couldn't connect to db2
Wed Sep 25 05:31:59.983 [rsHealthPoll] replset info db2 heartbeat failed, retrying
Wed Sep 25 05:31:59.987 Invalid access at address: 0x3debd from thread: rsHealthPoll
Wed Sep 25 05:31:59.990 [rsHealthPoll] couldn't connect to db2
Wed Sep 25 05:32:00.060 Got signal: 11 (Segmentation fault).
Wed Sep 25 05:32:00.062 [rsHealthPoll] couldn't connect to db2
Wed Sep 25 05:32:00.136 [rsHealthPoll] couldn't connect to db2
Wed Sep 25 05:32:00.895 Backtrace:
0xde1c11 0x6d2c29 0x6d31b2 <more memory addresses>
/path/to/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21)
/path/to/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399)
/path/to/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262)
/lib64/libpthread.so.0
/path/to/mongodb/bin/mongod(_ZN8tcmalloc15CentralFreeList14FetchFromSpansEv+0x39)
/path/to/mongodb/bin/mongod(_ZN8tcmalloc15CentralFreeList11RemoveRangeEPPvS2_i+0xbc)
/path/to/mongodb/bin/mongod(_ZN8tcmalloc11ThreadCache21FetchFromCentralCacheEmm+0x9d)
/path/to/mongodb/bin/mongod [0xe3e0df]
/path/to/mongodb/bin/mongod(malloc+0xe2)
/lib64/libcrypto.so.6(CRYPTO_malloc+0x62)
....