-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.2.13
-
Component/s: Networking
-
None
-
Fully Compatible
-
ALL
-
Four days ago we upgraded from 3.0.7 to 3.2.13. We had not had any crashes in a year, but since the upgrade we have had 2 crashes on 2 different clusters. The first crash did not produce a stack trace, but the second one did (please see the attached log file). Below are the log from right before the crash and the stack trace. It seems that the segmentation fault happened during a chunk split attempt.
2017-05-21T09:13:28.466-0400 I SHARDING [conn3787] request split points lookup for chunk postingrecommendation.postingrecommendation { : "TN", : "5913264d50499b0bb4434b24" } -->> { : "TN", : "7f934f001357ce9c0eb72c05" }
2017-05-21T09:13:28.515-0400 I SHARDING [conn3787] received splitChunk request: { splitChunk: "postingrecommendation.postingrecommendation", keyPattern: { _skp: 1.0, _id: 1.0 }, min: { _skp: "TN", _id: "5913264d50499b0bb4434b24" }, max: { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }, from: "jsra", splitKeys: [ { _skp: "TN", _id: "59185007638e290ba8e933a5" }, { _skp: "TN", _id: "591cff1650499b0bb44d70df" } ], shardId: "postingrecommendation.postingrecommendation-_skp_"TN"_id_"5913264d50499b0bb4434b24"", configdb: "mgocnf-a.snagprod.corp:27340,mgocnf-b.snagprod.corp:27340,mgocnf-c.snagprod.corp:27340", epoch: ObjectId('527abca8d31d1633acdaa97e') }
2017-05-21T09:13:28.749-0400 I SHARDING [conn3787] distributed lock 'postingrecommendation.postingrecommendation/mgo-jsra-a.snagprod.corp:27017:1495120452:697408834' acquired for 'splitting chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" }, { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) in postingrecommendation.postingrecommendation', ts : 592192782e655bbf91dde53f
2017-05-21T09:13:28.749-0400 I SHARDING [conn3787] remotely refreshing metadata for postingrecommendation.postingrecommendation based on current shard version 30|12824||527abca8d31d1633acdaa97e, current metadata version is 30|12824||527abca8d31d1633acdaa97e
2017-05-21T09:13:28.751-0400 I SHARDING [conn3787] metadata of collection postingrecommendation.postingrecommendation already up to date (shard version : 30|12824||527abca8d31d1633acdaa97e, took 2 ms)
2017-05-21T09:13:28.752-0400 W SHARDING [conn3787] splitChunk cannot find chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" },{ _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) to split, the chunk boundaries may be stale
2017-05-21T09:13:28.850-0400 I SHARDING [conn3787] distributed lock 'postingrecommendation.postingrecommendation/mgo-jsra-a.snagprod.corp:27017:1495120452:697408834' unlocked.
2017-05-21T09:13:28.850-0400 I COMMAND [conn3787] command admin.$cmd command: splitChunk { splitChunk: "postingrecommendation.postingrecommendation", keyPattern: { _skp: 1.0, _id: 1.0 }, min: { _skp: "TN", _id: "5913264d50499b0bb4434b24" }, max: { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }, from: "jsra", splitKeys: [ { _skp: "TN", _id: "59185007638e290ba8e933a5" }, { _skp: "TN", _id: "591cff1650499b0bb44d70df" } ], shardId: "postingrecommendation.postingrecommendation-_skp_"TN"_id_"5913264d50499b0bb4434b24"", configdb: "mgocnf-a.snagprod.corp:27340,mgocnf-b.snagprod.corp:27340,mgocnf-c.snagprod.corp:27340", epoch: ObjectId('527abca8d31d1633acdaa97e') } keyUpdates:0 writeConflicts:0 exception: splitChunk cannot find chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" },{ _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) to split, the chunk boundaries may be stale ( ns : postingrecommendation.postingrecommendation, received : 0|0||000000000000000000000000, wanted : 30|12824||527abca8d31d1633acdaa97e, send ) code:13388 numYields:0 reslen:496 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } protocol:op_query 335ms
2017-05-21T09:13:28.851-0400 I NETWORK [conn3787] end connection 10.70.18.214:49658 (222 connections now open)
2017-05-21T09:13:28.866-0400 F - [thread1] Invalid access at address: 0xffffffffffffffe8
2017-05-21T09:13:28.997-0400 F - [thread1] Got signal: 11 (Segmentation fault).
0x133f4f2 0x133e649 0x133e9c8 0x7f3fd19db330 0x1b514e9 0x1b51ba9 0xa111e9 0xa118b5 0x11f3c03 0x11f5e50 0x1418fc0 0x7f3fd19d2f82 0x7f3fd19d3197 0x7f3fd1700bed
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F3F4F2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F3E649"},{"b":"400000","o":"F3E9C8"},{"b":"7F3FD19CB000","o":"10330"},{"b":"400000","o":"17514E9","s":"_ZNSo6sentryC2ERSo"},{"b":"400000","o":"1751BA9","s":"_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l"},{"b":"400000","o":"6111E9","s":"_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE"},{"b":"400000","o":"6118B5","s":"_ZN5mongo16DBConnectionPool7releaseERKSsPNS_12DBClientBaseE"},{"b":"400000","o":"DF3C03"},{"b":"400000","o":"DF5E50"},{"b":"400000","o":"1018FC0"},{"b":"7F3FD19CB000","o":"7F82"},{"b":"7F3FD19CB000","o":"8197"},{"b":"7F3FD1603000","o":"FDBED","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.13", "gitVersion" : "23899209cad60aaafe114f6aea6cb83025ff51bc", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-112-generic", "version" : "#159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "B559BDA626A4B7F4A29153D8DA0DAA0B3B48A82B" }, { "b" : "7FFCB9DAB000", "elfType" : 3, "buildId" : "012E1338BA43AF7C0DC7D069F64F0A6490CC6D9C" }, { "b" : "7F3FD28ED000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "48A664AE6B0B4918A3EB0156C6364C4F084232FD" }, { "b" : "7F3FD2511000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "6B8997EA892A7FF37AC8CAA8F239D595251889BB" }, { "b" : "7F3FD2309000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "1EEBA762A6A2C8884D56033EE8CCE79B95CD974D" }, { "b" : "7F3FD2105000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D0F881E59FF88BE4F29A228C8657376B3C325C2C" }, { "b" : "7F3FD1DFF000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1654CB13B1D24ED03F4BDCB51FC7524B9181A771" }, { "b" : "7F3FD1BE9000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F3FD19CB000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "22F9078CFA529CCE1A814A4A1A1C018F169D5652" }, { "b" : "7F3FD1603000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CA5C6CFE528AF541C3C2C15CEE4B3C74DA4E2FB4" }, { "b" : "7F3FD2B4C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "237E22E5AAC2DDFCD06518F63FD720FE758E6E5B" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x133f4f2]
 mongod(+0xF3E649) [0x133e649]
 mongod(+0xF3E9C8) [0x133e9c8]
 libpthread.so.0(+0x10330) [0x7f3fd19db330]
 mongod(_ZNSo6sentryC2ERSo+0x19) [0x1b514e9]
 mongod(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29) [0x1b51ba9]
 mongod(_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE+0x109) [0xa111e9]
 mongod(_ZN5mongo16DBConnectionPool7releaseERKSsPNS_12DBClientBaseE+0xE5) [0xa118b5]
 mongod(+0xDF3C03) [0x11f3c03]
 mongod(+0xDF5E50) [0x11f5e50]
 mongod(+0x1018FC0) [0x1418fc0]
 libpthread.so.0(+0x7F82) [0x7f3fd19d2f82]
 libpthread.so.0(+0x8197) [0x7f3fd19d3197]
 libc.so.6(clone+0x6D) [0x7f3fd1700bed]
----- END BACKTRACE -----
- duplicates
-
SERVER-29152 Segfault in multiple shard primaries under regular load
- Closed
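For anyone triaging the backtrace in the description, the following is a minimal sketch of how its JSON frames appear to map onto the printed addresses: each frame's load base `b` plus offset `o` (both hexadecimal) gives the absolute address shown in brackets. The frame data is copied from the report, abbreviated to four frames. The helper names `resolve_frames` and `as_signed64` are hypothetical and not part of any MongoDB tooling.

```python
import json

# Four frames copied from the "backtrace" array in the report (abbreviated).
BACKTRACE_JSON = """
{"backtrace":[
  {"b":"400000","o":"F3F4F2","s":"_ZN5mongo15printStackTraceERSo"},
  {"b":"7F3FD19CB000","o":"10330"},
  {"b":"400000","o":"6111E9","s":"_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE"},
  {"b":"400000","o":"6118B5","s":"_ZN5mongo16DBConnectionPool7releaseERKSsPNS_12DBClientBaseE"}
]}
"""


def resolve_frames(doc):
    """Return (absolute_address, mangled_symbol) pairs for each frame.

    Assumption: absolute address = load base "b" + offset "o", both hex.
    Frames without an "s" field get a placeholder symbol.
    """
    frames = []
    for frame in doc["backtrace"]:
        address = int(frame["b"], 16) + int(frame["o"], 16)
        frames.append((address, frame.get("s", "<no symbol>")))
    return frames


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - (1 << 64) if value >= (1 << 63) else value


frames = resolve_frames(json.loads(BACKTRACE_JSON))
for address, symbol in frames:
    print(f"{address:#x}  {symbol}")

# The faulting address from the "Invalid access" log line, read as signed.
fault_offset = as_signed64(0xFFFFFFFFFFFFFFE8)
print(fault_offset)
```

The computed addresses can be compared against the bracketed values in the symbolized frame list. Read as a signed 64-bit value, the faulting address is a small negative number, which may hint at an access relative to a null pointer; that interpretation is an assumption, not something the log confirms.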