We use a sharded MongoDB setup with four shards (each a replica set with two mongods and one arbiter). Since we do batch inserts, we use pre-splitting. All mongods are version 2.4.3.
On the slave of one of the shards we got the following segmentation fault:
Mon Jun 24 16:22:25.325 [conn912] command admin.$cmd command: { splitChunk: "ad_cache.cache-xxx", keyPattern: { _id: "hashed" }, min: { _id: 0 }, max: { _id: 4611686018427387900 }, from: "mongo_rs33", splitKeys: [ { _id: 2305843009213693950 } ], shardId: "ad_cache.cache-xxx-_id_0", configdb: "s33:27019,s34:27019,s35:27019" } ntoreturn:1 keyUpdates:0 locks(micros) r:61 reslen:37 5490ms
Mon Jun 24 16:22:25.329 [conn912] received splitChunk request: { splitChunk: "ad_cache.cache-xxx", keyPattern: { _id: "hashed" }, min: { _id: 4611686018427387900 }, max: { _id: MaxKey }, from: "mongo_rs33", splitKeys: [ { _id: 6917529027641081850 } ], shardId: "ad_cache.cache-xxx-_id_4611686018427387900", configdb: "s33:27019,s34:27019,s35:27019" }
Mon Jun 24 16:22:27.501 [conn912] distributed lock 'ad_cache.cache-xxx/h34:27020:1372068904:410557907' acquired, ts : 51c85621020fcf91b8e782c2
Mon Jun 24 16:22:27.533 [conn912] splitChunk accepted at version 2|5||51c855e903e25cae7275142f
Mon Jun 24 16:22:28.813 [conn912] about to log metadata event: { _id: "h34-2013-06-24T14:22:28-51c85624020fcf91b8e782cd", server: "h34", clientAddr: "10.48.2.33:38163", time: new Date(1372083748813), what: "split", ns: "ad_cache.cache-xxx", details: { before: { min: { _id: 4611686018427387900 }, max: { _id: MaxKey }, lastmod: Timestamp 1000|3, lastmodEpoch: ObjectId('000000000000000000000000') }, left: { min: { _id: 4611686018427387900 }, max: { _id: 6917529027641081850 }, lastmod: Timestamp 2000|8, lastmodEpoch: ObjectId('51c855e903e25cae7275142f') }, right: { min: { _id: 6917529027641081850 }, max: { _id: MaxKey }, lastmod: Timestamp 2000|9, lastmodEpoch: ObjectId('51c855e903e25cae7275142f') } } }
Mon Jun 24 16:22:30.358 Invalid access at address: 0xbc from thread: conn912
Mon Jun 24 16:22:30.358 Got signal: 11 (Segmentation fault).
Mon Jun 24 16:22:30.451 Backtrace: 0xdcf361 0x6cf729 0x6cfcb2 0x7fb1c1e0bff0 0xccf5d4 0x8d236a 0x8d5065 0x8d6592 0xa7c97b 0xa80360 0x9f44d4 0x9f57e2 0x6e747a 0xdbbb7e 0x7fb1c1e038ca 0x7fb1c11b6b6d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcf361]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cf729]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6cfcb2]
/lib/libpthread.so.0(+0xeff0) [0x7fb1c1e0bff0]
/usr/bin/mongod(_ZN5mongo17SplitChunkCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x85c4) [0xccf5d4]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8d236a]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x705) [0x8d5065]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5e2) [0x8d6592]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x3b) [0xa7c97b]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd50) [0xa80360]
/usr/bin/mongod() [0x9f44d4]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x392) [0x9f57e2]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9a) [0x6e747a]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdbbb7e]
/lib/libpthread.so.0(+0x68ca) [0x7fb1c1e038ca]
/lib/libc.so.6(clone+0x6d) [0x7fb1c11b6b6d]
The database runs fine for some time after we restart it, but after a few hours the segmentation fault occurs again. We already tried removing the dbpath files and letting the mongod do an initial sync (this occasionally succeeds, but the segmentation fault happens again afterwards).
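For anyone trying to reproduce this: the split keys in the log above are consistent with evenly spaced points over the signed 64-bit hashed key range. The Python sketch below reproduces them under two assumptions that are inferred from the logged values and not confirmed anywhere in this report: the collection is pre-split into 8 chunks, and the spacing is `(INT64_MAX // num_chunks) * 2`. The function name `presplit_points` is hypothetical.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit hashed key


def presplit_points(num_chunks):
    """Return evenly spaced split points, symmetric around 0.

    Assumption: spacing is (INT64_MAX // num_chunks) * 2, which happens
    to match the split keys seen in the log when num_chunks is 8.
    """
    if num_chunks < 2 or num_chunks % 2 != 0:
        raise ValueError("this sketch only handles an even chunk count >= 2")
    interval = (INT64_MAX // num_chunks) * 2
    half = num_chunks // 2
    # num_chunks - 1 split points: 0 and +/- multiples of the interval
    return [k * interval for k in range(-(half - 1), half)]


if __name__ == "__main__":
    for point in presplit_points(8):
        print(point)
```

With `num_chunks=8` the list contains 2305843009213693950, 4611686018427387900 and 6917529027641081850, the three positive boundaries that appear in the splitChunk requests.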
- is related to: SERVER-5160 Handle all failed shardCollection commands well (Closed)
- related to: SERVER-7790 Segfault in splitchunk following dropDatabase (Closed)