- Type: Bug
- Resolution: Incomplete
- Priority: Major - P3
- None
- Affects Version/s: 5.0.15
- Component/s: None
- None
- ALL
We are running a v5.0.15 sharded cluster with 20 shards and 5 replicas per shard. This morning the mongod process on a secondary member of one shard crashed. This is what was logged just before the process died:
{"t":{"$date":"2023-07-06T11:45:23.795+00:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"thread574967","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x55d2c1394000\n"}}
{"t":{"$date":"2023-07-06T11:45:23.795+00:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"thread574967","msg":"Writing fatal message","attr":{"message":"Got signal: 7 (Bus error).\n"}}
{"t":{"$date":"2023-07-06T11:45:23.960+00:00"},"s":"I", "c":"CONTROL", "id":31380, "ctx":"thread574967","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"55CF4D853455","b":"55CF49927000","o":"3F2C455","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"},{"a":"55CF4D855ED9","b":"55CF49927000","o":"3F2EED9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55CF4D84E44C","b":"55CF49927000","o":"3F2744C","s":"abruptQuitWithAddrSignal","s+":"EC"},{"a":"7F0F131C88E0","b":"7F0F131B7000","o":"118E0","s":"funlockfile","s+":"50"},{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"},{"a":"55CF4B0935A2","b":"55CF49927000","o":"176C5A2","s":"__wt_page_inmem","s+":"3532"},{"a":"55CF4B0AC96D","b":"55CF49927000","o":"178596D","s":"__split_multi_inmem","s+":"5D"},{"a":"55CF4B0B728B","b":"55CF49927000","o":"179028B","s":"__wt_split_rewrite","s+":"AB"},{"a":"55CF4AFDAE66","b":"55CF49927000","o":"16B3E66","s":"__wt_evict","s+":"10D6"},{"a":"55CF4AFD1F42","b":"55CF49927000","o":"16AAF42","s":"__evict_page","s+":"6A2"},{"a":"55CF4AFD2808","b":"55CF49927000","o":"16AB808","s":"__evict_lru_pages","s+":"78"},{"a":"55CF4AFD7524","b":"55CF49927000","o":"16B0524","s":"__wt_evict_thread_run","s+":"74"},{"a":"55CF4B03D8D9","b":"55CF49927000","o":"17168D9","s":"__thread_run","s+":"39"},{"a":"7F0F131BE44B","b":"7F0F131B7000","o":"744B","s":"start_thread","s+":"DB"},{"a":"7F0F12EF952F","b":"7F0F12E0A000","o":"EF52F","s":"clone","s+":"3F"}],"processInfo":{"mongodbVersion":"5.0.15","gitVersion":"935639beed3d0c19c2551c93854b831107c0b118","compiledModules":[],"uname":{"sysname":"Linux","release":"4.14.314-238.539.amzn2.x86_64","version":"#1 SMP Tue May 23 16:44:05 UTC 2023","machine":"x86_64"},"somap":[{"b":"55CF49927000","elfType":3,"buildId":"11A652B403DB0E37E9EAC8044BD6400062B20A1E"},{"b":"7F0F131B7000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"BC2E8D5CDFB0A3CC6DB42A136DD1BB61AF8EED99"},{"b":"7F0F12E0A000","path":"/lib64/libc.so.6","elfType":3,"buildId":"140E425DB38E5E4C2BFA7E56F3609E707B850AC5"}]}}}}
{"t":{"$date":"2023-07-06T11:45:23.960+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D853455","b":"55CF49927000","o":"3F2C455","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D855ED9","b":"55CF49927000","o":"3F2EED9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D84E44C","b":"55CF49927000","o":"3F2744C","s":"abruptQuitWithAddrSignal","s+":"EC"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F131C88E0","b":"7F0F131B7000","o":"118E0","s":"funlockfile","s+":"50"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0935A2","b":"55CF49927000","o":"176C5A2","s":"__wt_page_inmem","s+":"3532"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0AC96D","b":"55CF49927000","o":"178596D","s":"__split_multi_inmem","s+":"5D"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0B728B","b":"55CF49927000","o":"179028B","s":"__wt_split_rewrite","s+":"AB"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFDAE66","b":"55CF49927000","o":"16B3E66","s":"__wt_evict","s+":"10D6"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD1F42","b":"55CF49927000","o":"16AAF42","s":"__evict_page","s+":"6A2"}}}
{"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD2808","b":"55CF49927000","o":"16AB808","s":"__evict_lru_pages","s+":"78"}}}
{"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD7524","b":"55CF49927000","o":"16B0524","s":"__wt_evict_thread_run","s+":"74"}}}
{"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B03D8D9","b":"55CF49927000","o":"17168D9","s":"__thread_run","s+":"39"}}}
{"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F131BE44B","b":"7F0F131B7000","o":"744B","s":"start_thread","s+":"DB"}}}
{"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F12EF952F","b":"7F0F12E0A000","o":"EF52F","s":"clone","s+":"3F"}}}
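The crash appears to originate in WiredTiger's eviction code. The relevant call chain from the backtrace, outermost caller first, is:
__wt_evict -> __wt_split_rewrite -> __split_multi_inmem -> __wt_page_inmem -> __wt_cell_unpack_safe.constprop.11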
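For anyone who wants to pull the same call chain out of the log mechanically, here is a minimal Python sketch. It assumes each log entry sits on its own line as a JSON object and that the "Frame" entries are listed innermost frame first, as they are in the log above. The names SAMPLE_LOG, call_chain and offset_matches are made up for this illustration and are not part of any MongoDB tooling. The sample contains five of the "Frame" entries copied from the log.

```python
import json

# Five "Frame" entries copied from the log above (innermost frame first).
SAMPLE_LOG = """\
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0935A2","b":"55CF49927000","o":"176C5A2","s":"__wt_page_inmem","s+":"3532"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0AC96D","b":"55CF49927000","o":"178596D","s":"__split_multi_inmem","s+":"5D"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0B728B","b":"55CF49927000","o":"179028B","s":"__wt_split_rewrite","s+":"AB"}}}
{"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFDAE66","b":"55CF49927000","o":"16B3E66","s":"__wt_evict","s+":"10D6"}}}
"""


def iter_frames(log_text):
    """Yield the 'frame' dict of every log line whose msg is "Frame"."""
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a complete JSON entry
        if isinstance(entry, dict) and entry.get("msg") == "Frame":
            yield entry["attr"]["frame"]


def call_chain(log_text):
    """Return frame symbols ordered from outermost caller to innermost."""
    symbols = [frame["s"] for frame in iter_frames(log_text) if "s" in frame]
    return symbols[::-1]  # the log lists the innermost frame first


def offset_matches(frame):
    """Check that offset "o" equals address "a" minus module base "b" (all hex)."""
    return int(frame["a"], 16) - int(frame["b"], 16) == int(frame["o"], 16)


if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE_LOG)))
```

Reversing the list is only a presentation choice so that the output reads in calling order. The offset_matches helper reflects the relationship visible in the logged values: the "o" field of each frame is its address "a" minus the load base "b" of the containing module.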
I was able to restart the mongod process without issues.
As far as I know we have never encountered this before, so it is probably a rare event (possibly a concurrency timing issue).