Core Server / SERVER-78734

shard mongod process crashes with "Invalid access at address: 0x55d2c1394000" within __wt_evict call

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 5.0.15
    • Component/s: None
    • ALL

      We are running a v5.0.15 sharded cluster with 20 shards and 5 replicas per shard. This morning the mongod process on a secondary of one shard crashed. Here is what was logged just before the process died:

      {"t":{"$date":"2023-07-06T11:45:23.795+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"thread574967","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x55d2c1394000\n"}}
      {"t":{"$date":"2023-07-06T11:45:23.795+00:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"thread574967","msg":"Writing fatal message","attr":{"message":"Got signal: 7 (Bus error).\n"}}
      {"t":{"$date":"2023-07-06T11:45:23.960+00:00"},"s":"I",  "c":"CONTROL",  "id":31380,   "ctx":"thread574967","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"55CF4D853455","b":"55CF49927000","o":"3F2C455","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"},{"a":"55CF4D855ED9","b":"55CF49927000","o":"3F2EED9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55CF4D84E44C","b":"55CF49927000","o":"3F2744C","s":"abruptQuitWithAddrSignal","s+":"EC"},{"a":"7F0F131C88E0","b":"7F0F131B7000","o":"118E0","s":"funlockfile","s+":"50"},{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"},{"a":"55CF4B0935A2","b":"55CF49927000","o":"176C5A2","s":"__wt_page_inmem","s+":"3532"},{"a":"55CF4B0AC96D","b":"55CF49927000","o":"178596D","s":"__split_multi_inmem","s+":"5D"},{"a":"55CF4B0B728B","b":"55CF49927000","o":"179028B","s":"__wt_split_rewrite","s+":"AB"},{"a":"55CF4AFDAE66","b":"55CF49927000","o":"16B3E66","s":"__wt_evict","s+":"10D6"},{"a":"55CF4AFD1F42","b":"55CF49927000","o":"16AAF42","s":"__evict_page","s+":"6A2"},{"a":"55CF4AFD2808","b":"55CF49927000","o":"16AB808","s":"__evict_lru_pages","s+":"78"},{"a":"55CF4AFD7524","b":"55CF49927000","o":"16B0524","s":"__wt_evict_thread_run","s+":"74"},{"a":"55CF4B03D8D9","b":"55CF49927000","o":"17168D9","s":"__thread_run","s+":"39"},{"a":"7F0F131BE44B","b":"7F0F131B7000","o":"744B","s":"start_thread","s+":"DB"},{"a":"7F0F12EF952F","b":"7F0F12E0A000","o":"EF52F","s":"clone","s+":"3F"}],"processInfo":{"mongodbVersion":"5.0.15","gitVersion":"935639beed3d0c19c2551c93854b831107c0b118","compiledModules":[],"uname":{"sysname":"Linux","release":"4.14.314-238.539.amzn2.x86_64","version":"#1 SMP Tue May 23 16:44:05 UTC 2023","machine":"x86_64"},"somap":[{"b":"55CF49927000","elfType":3,"buildId":"11A652B403DB0E37E9EAC8044BD6400062B20A1E"},{"b":"7F0F131B7000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"BC2E8D5CDFB0A3CC6DB42A136DD1BB61AF8EED99"},{"b":"7F0F12E0A000","path":"/lib64/libc.so.6","elfType":3,"buildId":"140E425DB38E5E4C2BFA7E56F3609E707B850AC5"}]}}}}
      {"t":{"$date":"2023-07-06T11:45:23.960+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D853455","b":"55CF49927000","o":"3F2C455","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D855ED9","b":"55CF49927000","o":"3F2EED9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4D84E44C","b":"55CF49927000","o":"3F2744C","s":"abruptQuitWithAddrSignal","s+":"EC"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F131C88E0","b":"7F0F131B7000","o":"118E0","s":"funlockfile","s+":"50"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0935A2","b":"55CF49927000","o":"176C5A2","s":"__wt_page_inmem","s+":"3532"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0AC96D","b":"55CF49927000","o":"178596D","s":"__split_multi_inmem","s+":"5D"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B0B728B","b":"55CF49927000","o":"179028B","s":"__wt_split_rewrite","s+":"AB"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFDAE66","b":"55CF49927000","o":"16B3E66","s":"__wt_evict","s+":"10D6"}}}
      {"t":{"$date":"2023-07-06T11:45:23.961+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD1F42","b":"55CF49927000","o":"16AAF42","s":"__evict_page","s+":"6A2"}}}
      {"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD2808","b":"55CF49927000","o":"16AB808","s":"__evict_lru_pages","s+":"78"}}}
      {"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4AFD7524","b":"55CF49927000","o":"16B0524","s":"__wt_evict_thread_run","s+":"74"}}}
      {"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"55CF4B03D8D9","b":"55CF49927000","o":"17168D9","s":"__thread_run","s+":"39"}}}
      {"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F131BE44B","b":"7F0F131B7000","o":"744B","s":"start_thread","s+":"DB"}}}
      {"t":{"$date":"2023-07-06T11:45:23.962+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"thread574967","msg":"Frame","attr":{"frame":{"a":"7F0F12EF952F","b":"7F0F12E0A000","o":"EF52F","s":"clone","s+":"3F"}}}
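
For anyone triaging a similar crash, the call chain can be pulled out of the "Frame" records mechanically instead of by eye. The following is a minimal sketch, not an official tool. It assumes the log format shown above (one JSON record per line, "Frame" records carrying id 31445 and a symbol name under attr.frame.s). The helper names `iter_records` and `call_chain` are made up for illustration, and the sample records are trimmed to the fields the sketch reads.

```python
import json

FRAME_LOG_ID = 31445  # id carried by the "Frame" records in the excerpt above


def iter_records(line):
    """Yield every JSON object found on a line, including records fused back to back."""
    decoder = json.JSONDecoder()
    text = line.strip()
    pos = 0
    while pos < len(text):
        try:
            record, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            return  # stop at the first fragment that is not valid JSON
        if isinstance(record, dict):
            yield record
        # raw_decode does not skip whitespace between documents, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1


def call_chain(log_lines):
    """Return the symbols of all Frame records, outermost caller first."""
    symbols = [
        record.get("attr", {}).get("frame", {}).get("s", "?")
        for line in log_lines
        for record in iter_records(line)
        if record.get("id") == FRAME_LOG_ID
    ]
    # The log lists the innermost frame first, so reverse it.
    return symbols[::-1]


# Trimmed copies of four Frame records and one fatal message from the log above.
# The third entry holds two records on one line, as can happen in pasted logs.
SAMPLE = [
    r'{"id":6384300,"msg":"Writing fatal message","attr":{"message":"Got signal: 7 (Bus error).\n"}}',
    r'{"id":31445,"msg":"Frame","attr":{"frame":{"a":"55CF4B08B11C","b":"55CF49927000","o":"176411C","s":"__wt_cell_unpack_safe.constprop.11","s+":"9C"}}}',
    r'{"id":31445,"msg":"Frame","attr":{"frame":{"s":"__wt_page_inmem"}}}{"id":31445,"msg":"Frame","attr":{"frame":{"s":"__split_multi_inmem"}}}',
    r'{"id":31445,"msg":"Frame","attr":{"frame":{"s":"__wt_split_rewrite"}}}',
]

if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE)))
```

In practice the list of lines would come from the mongod log file rather than from an inline sample; lines that are not valid JSON are skipped rather than treated as errors.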
      

      The crash appears to originate in WiredTiger's eviction code. The faulting call chain is:

      __wt_evict -> __wt_split_rewrite -> __split_multi_inmem -> __wt_page_inmem -> __wt_cell_unpack_safe.constprop.11
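
The frame fields in the log can also be cross-checked before feeding them to a symbolizer. The sketch below relies on a relationship that holds for the values logged above: "o" equals "a" minus "b" (the frame address minus the load base of the module). Reading "s+" as the offset from the start of the symbol is an assumption on my part, and the helper name `module_offset` is hypothetical.

```python
def module_offset(frame):
    """Offset of the frame address from the module load base (both hex strings)."""
    return int(frame["a"], 16) - int(frame["b"], 16)


# The innermost WiredTiger frame, copied from the log above.
frame = {
    "a": "55CF4B08B11C",
    "b": "55CF49927000",
    "o": "176411C",
    "s": "__wt_cell_unpack_safe.constprop.11",
    "s+": "9C",
}

offset = module_offset(frame)

# The logged "o" field should agree with the computed offset.
assert offset == int(frame["o"], 16)

# Assumption: "s+" is the distance from the start of the symbol, so subtracting
# it gives the module-relative offset at which the function begins.
symbol_start = offset - int(frame["s+"], 16)

if __name__ == "__main__":
    print(f"{frame['s']}: module offset 0x{offset:X}, symbol start 0x{symbol_start:X}")
```

The module-relative offset is the value a symbolizer would need when run against a binary whose build ID matches the one listed in "somap".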
      

      I was able to restart the mongod process without issues.

      As far as I know, we have never encountered this before, so it probably occurs only rarely (for example, because of a concurrency timing issue).

            Assignee:
            chris.kelly@mongodb.com Chris Kelly
            Reporter:
            ian.springer@salesforce.com Ian Springer
            Votes:
            0
            Watchers:
            4
