-
Type: Question
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 4.4.2
-
Component/s: None
-
None
-
Service Arch
-
(copied to CRM)
Today we have experienced a major issue in our mongodb production database.
3 weeks ago we upgraded binaries from 4.2 to 4.4 in all nodes in the replica set.
Today, in order to complete the upgrade, we ran the following command from the primary (as mentioned in the official doc https://docs.mongodb.com/manual/release-notes/4.4-upgrade-replica-set/#feature-compatibility-version) {{db.adminCommand( { setFeatureCompatibilityVersion: "4.4" } )}}
Right after execute the command all data-bearing nodes of our replica set got unhealthy. CPU spike to 100% in all data-bearing nodes.
Any attempt to establish a connection to the replica set end with a timeout or a rejection.
The mongod log shows this fatal error:
{"t":{"$date":"2021-01-13T11:50:17.983+00:00"},"s":"F", "c":"-", "id":23079, "ctx":"TopologyVersionObserver","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}} {"t":{"$date":"2021-01-13T11:50:17.985+00:00"},"s":"F", "c":"-", "id":23080, "ctx":"TopologyVersionObserver","msg":"\n\n***aborting after invariant() failure\n\n"} {"t":{"$date":"2021-01-13T11:50:17.985+00:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"TopologyVersionObserver","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31431, "ctx":"TopologyVersionObserver","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"55606F60F921","b":"55606C92C000","o":"2CE3921","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"55606F610F59","b":"55606C92C000","o":"2CE4F59","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55606F60E7B6","b":"55606C92C000","o":"2CE27B6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7EFF61C048A0","b":"7EFF61BF2000","o":"128A0","s":"funlockfile","s+":"50"},{"a":"7EFF6183FF47","b":"7EFF61801000","o":"3EF47","s":"gsignal","s+":"C7"},{"a":"7EFF618418B1","b":"7EFF61801000","o":"408B1","s":"abort","s+":"141"},{"a":"55606D840D8F","b":"55606C92C000","o":"F14D8F","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"},{"a":"55606D814458","b":"55606C92C000","o":"EE8458","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"},{"a":"55606F4D1069","b":"55606C92C000","o":"2BA5069","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"},{"a":"55606F4C5B17","b":"55606C92C000","o":"2B99B17","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"},{"a":"55606DD474A1","b":"55606C92C000","o":"141B4A1","s":"_ZN5mongo4repl23TopologyVersionObserver17_workerThreadBodyEv","s+":"251"},{"a":"55606DD47B4C","b":"55606C92C000","o":"141BB4C","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_4repl23TopologyVersionObserver4initEPNS3_14ServiceContextEPNS7_22ReplicationCoordinatorEEUlvE0_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"5C"},{"a":"55606F7B547F","b":"55606C92C000","o":"2E8947F","s":"execute_native_thread_routine","s+":"F"},{"a":"7EFF61BF96DB","b":"7EFF61BF2000","o":"76DB","s":"start_thread","s+":"DB"},{"a":"7EFF61922A3F","b":"7EFF61801000","o":"121A3F","s":"clone","s+":"3F"}],"processInfo":{"mongodbVersion":"4.4.2","gitVersion":"15e73dc5738d2278b688f8929aee605fe4279b0e","compiledModules":[],"uname":{"sysname":"Linux","release":"5.0.0-1036-azure","version":"#38-Ubuntu SMP Sun Mar 22 21:27:21 UTC 2020","machine":"x86_64"},"somap":[{"b":"55606C92C000","elfType":3,"buildId":"D18F657A1E06C333C2AEE534E3047044B0653DBF"},{"b":"7EFF61BF2000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"BC3C06107774266C5F7DB3F1F380A3DA68AF90FA"},{"b":"7EFF61801000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"D3CF764B2F97AC3EFE366DDD07AD902FB6928FD7"}]}}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F60F921","b":"55606C92C000","o":"2CE3921","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F610F59","b":"55606C92C000","o":"2CE4F59","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F60E7B6","b":"55606C92C000","o":"2CE27B6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7EFF61C048A0","b":"7EFF61BF2000","o":"128A0","s":"funlockfile","s+":"50"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7EFF6183FF47","b":"7EFF61801000","o":"3EF47","s":"gsignal","s+":"C7"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7EFF618418B1","b":"7EFF61801000","o":"408B1","s":"abort","s+":"141"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606D840D8F","b":"55606C92C000","o":"F14D8F","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}}} {"t":{"$date":"2021-01-13T11:50:18.081+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606D814458","b":"55606C92C000","o":"EE8458","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F4D1069","b":"55606C92C000","o":"2BA5069","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F4C5B17","b":"55606C92C000","o":"2B99B17","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606DD474A1","b":"55606C92C000","o":"141B4A1","s":"_ZN5mongo4repl23TopologyVersionObserver17_workerThreadBodyEv","s+":"251"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606DD47B4C","b":"55606C92C000","o":"141BB4C","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_4repl23TopologyVersionObserver4initEPNS3_14ServiceContextEPNS7_22ReplicationCoordinatorEEUlvE0_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"5C"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"55606F7B547F","b":"55606C92C000","o":"2E8947F","s":"execute_native_thread_routine","s+":"F"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7EFF61BF96DB","b":"7EFF61BF2000","o":"76DB","s":"start_thread","s+":"DB"}}} {"t":{"$date":"2021-01-13T11:50:18.082+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7EFF61922A3F","b":"7EFF61801000","o":"121A3F","s":"clone","s+":"3F"}}} {"t":{"$date":"2021-01-13T12:17:39.998+00:00"},"s":"I", "c":"CONTROL", "id":20698, "ctx":"main","msg":"***** SERVER RESTARTED *****"} {"t":{"$date":"2021-01-13T12:17:40.002+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"} {"t":{"$date":"2021-01-13T12:17:40.068+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
Issues likely related I have found already reported are:
Attached relevant part of the mongod.log.
After several minutes the mongod process got killed, after restart all data bearing node we recover the cluster. Arbiter was not affected.
The featureCompatibilityVersion was not updated to 4.4.
db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } ).featureCompatibilityVersion.version
// 4.2
More info:
- Replica set is running 4.4.2 binaries
- Replica set is composed by
- 3 Data bearing node (1 with priority 0 and votes 0, 2 with priority 1 and votes 1)
- 1 Arbiter
- OS: Linux Ubuntu 18.0 LTS
- Cloud: Azure
What should be the recommendation in this case to overcome the issue and being able to finalize the upgrade?
- Can we try to run the FCV command in a secondary only first?
- Is it known or expected such behavior after running the FCV command?
- duplicates
-
SERVER-52735 mongodb crash with "Invariant failure" opCtx != nullptr && _opCtx == nullptr
- Closed
- is related to
-
SERVER-53566 Investigate and reproduce "opCtx != nullptr && _opCtx == nullptr" invariant
- Closed
- related to
-
SERVER-52735 mongodb crash with "Invariant failure" opCtx != nullptr && _opCtx == nullptr
- Closed