Core Server / SERVER-59350

Invariant failure exception

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Operating System: ALL

      We have a 10-shard cluster in our production environment: each shard is a 3-member MongoDB replica set (primary-secondary-secondary), i.e. 30 MongoDB hosts in total, plus 3 config server hosts (one replica set) and 3 mongos routers.

      All MongoDB services run version 4.4.4.

      On 2021-08-15 we found that half of our 30 mongod processes had shut down with the following exception:

      {"t":{"$date":"2021-08-15T10:21:01.620+03:00"},"s":"F",  "c":"-",        "id":23079,   "ctx":"waitForMajority","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}
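      Because MongoDB 4.4 writes its log as one JSON object per line, the fatal entry above can be located mechanically on each of the 30 hosts. The following is a minimal Python sketch, not an official MongoDB tool: the function name `invariant_failures` is hypothetical, and it assumes the 4.4 structured log format in which `s` is the severity (`F` = fatal) and `msg` is the message text.

```python
import json
from typing import Iterable, Iterator


def invariant_failures(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries that are fatal "Invariant failure" messages.

    Assumes the MongoDB 4.4 structured log format (one JSON object per line).
    Blank lines, non-JSON lines and non-object JSON values are skipped.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        # "s" holds the severity ("F" = fatal), "msg" the message text.
        if entry.get("s") == "F" and entry.get("msg") == "Invariant failure":
            yield entry


if __name__ == "__main__":
    # The entry quoted in this report, used as sample input.
    sample = (
        '{"t":{"$date":"2021-08-15T10:21:01.620+03:00"},"s":"F","c":"-",'
        '"id":23079,"ctx":"waitForMajority","msg":"Invariant failure",'
        '"attr":{"expr":"opCtx != nullptr && _opCtx == nullptr",'
        '"file":"src/mongo/db/client.cpp","line":126}}'
    )
    for e in invariant_failures([sample, "not json"]):
        attr = e.get("attr", {})
        print(e["t"]["$date"], e.get("ctx"), attr.get("file"), attr.get("line"), attr.get("expr"))
```

      To scan a real log, pass an open file object instead of the sample list, e.g. `with open("/var/log/mongodb/mongod.log") as f: hits = list(invariant_failures(f))`; that path is a typical Linux package-install location and is only an assumption here.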
      

      I investigated on my own and found similar cases, e.g. https://jira.mongodb.org/browse/SERVER-52735

      I also found that the official 4.4 changelog (https://docs.mongodb.com/manual/release-notes/4.4-changelog/) recommends against using MongoDB 4.4.5 in production:

      MongoDB version 4.4.5 is not recommended for production use due to a critical issue, WT-7426. The issue is fixed in version 4.4.6.

      WT-7426 (https://jira.mongodb.org/browse/WT-7426) in turn links to several similar cases.
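      With 30 shard members plus config servers and mongos routers, it can help to audit which hosts run a version matching the changelog warning quoted above. This is a minimal Python sketch, assuming version strings of the plain `X.Y.Z` form collected however you prefer (for example with `db.version()` in the mongo shell); the host names and the helpers `parse_version`, `is_not_recommended` and `flagged_hosts` are hypothetical. It encodes only what the quoted note states (4.4.5 is not recommended because of WT-7426) and makes no claim about any other version.

```python
from typing import Dict, List, Tuple

# Versions the 4.4 changelog note marks as not recommended for
# production use (WT-7426). Only 4.4.5 is listed there.
NOT_RECOMMENDED = {(4, 4, 5)}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain "X.Y.Z" version string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_not_recommended(version: str) -> bool:
    """Return True if the version appears in NOT_RECOMMENDED."""
    return parse_version(version) in NOT_RECOMMENDED


def flagged_hosts(versions: Dict[str, str]) -> List[str]:
    """Return the sorted host names whose version is not recommended."""
    return sorted(host for host, v in versions.items() if is_not_recommended(v))


if __name__ == "__main__":
    # Hypothetical inventory: host name -> reported mongod/mongos version.
    inventory = {
        "shard01-a.example": "4.4.4",
        "shard01-b.example": "4.4.5",
        "mongos-1.example": "4.4.6",
    }
    print(flagged_hosts(inventory))  # prints ['shard01-b.example']
```

      Matching exact version tuples rather than a range keeps the sketch limited to what the changelog actually says; extend `NOT_RECOMMENDED` only from official release notes.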

      So what do you recommend to avoid these invariant failures?
      Which MongoDB version should we run in production right now?

      If you need additional logs, diagnostic data, or our shard configuration, I can upload them for further investigation.

            Assignee: Eric Sedor (eric.sedor@mongodb.com)
            Reporter: Basil Markov (haltandcatchfire91@gmail.com)
            Votes: 0
            Watchers: 3
