  1. Core Server
  2. SERVER-88845

concurrent start+stop causes mongostream crash due to invariant "line":983,"expr":"!status.isOK()","file":"src/mongo/util/future.h"

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams
    • Fully Compatible
    • ALL
    • Sprint 46

      A stop request that arrives while a start flow is still in progress can trigger an invariant failure:

      "msg":"Invariant failure","svc":"-","attr":{"line":983,"expr":"!status.isOK()","file":"src/mongo/util/future.h"

              } catch (const SPException& e) {
                  // This catch block is hit with an SPException whose status is OK.
                  // Calling setError with that OK status trips the invariant.
      
                  LOGV2_WARNING(75900,
                                "encountered stream processor exception, exiting runLoop(): {error}",
                                "context"_attr = _context,
                                "errorCode"_attr = e.code(),
                                "reason"_attr = e.reason(),
                                "unsafeErrorMessage"_attr = e.unsafeReason(),
                                "error"_attr = e.what());
                  _promise.setError(e.toStatus());
                  promiseFulfilled = true;
              } catch (const DBException& e) { 
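      The failure mode in the excerpt above can be reproduced outside the server with stand-in types. The sketch below is only an illustration under stated assumptions: `Status`, `SketchPromise`, `toErrorStatus`, and `kSketchInternalError` are hypothetical names, not the mongo classes, and the guard shown is one possible mitigation rather than the fix that was actually committed for this ticket. The stand-in promise throws instead of aborting so that the behavior can be checked.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT the mongo types.
struct Status {
    int code = 0;  // 0 means OK in this sketch
    std::string reason;
    bool isOK() const { return code == 0; }
};

// Mimics the contract behind the invariant: setError() must receive a non-OK status.
class SketchPromise {
public:
    void setError(Status status) {
        if (status.isOK()) {
            // The real code aborts via an invariant; the sketch throws so it can be tested.
            throw std::logic_error("invariant: !status.isOK()");
        }
        _error = std::move(status);
    }
    const std::optional<Status>& error() const { return _error; }

private:
    std::optional<Status> _error;
};

// Assumed error code for "exception carried an OK status"; the value is arbitrary.
constexpr int kSketchInternalError = 1;

// Coerce an OK status into an error so that setError() is always safe to call.
Status toErrorStatus(Status status) {
    if (status.isOK()) {
        status = Status{kSketchInternalError, "exception carried an OK status"};
    }
    assert(!status.isOK());  // postcondition: the result is always an error
    return status;
}
```

      With these stand-ins, `promise.setError(Status{})` throws, while `promise.setError(toErrorStatus(Status{}))` stores a non-OK status. A guard like this only hides the symptom; it does not explain why the exception carried an OK status in the first place.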

      Example 1 (staging):

      https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=-4h%40m&latest=now&q=s[…]ype=events&display.events.type=raw&sid=1712030881.9694837

      4:34:59.650 AM – Agent starting stream processor
      4:34:59.651 AM – About to start stream processor

      // SPM sends an errant stop request due to a bug in the heartbeat rejection logic.
      4:35:01.523 AM – Stopping stream processor
      4:35:01.568 AM – started operator dag
      4:35:01.568 AM – encountered stream processor exception, exiting runLoop(): {error}
      4:35:01.568 AM – Invariant failure
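
      In this timeline the stop request lands between "About to start stream processor" and "started operator dag". One way to make that ordering explicit is to record a stop that arrives mid-start and honor it once the start flow finishes, instead of letting both flows proceed independently. The sketch below assumes a hypothetical `LifecycleSketch` class with invented state names; it does not describe how the stream processor is actually implemented or how this ticket was resolved.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lifecycle states; the names are assumptions for illustration.
enum class State { kIdle, kStarting, kRunning, kStopped };

class LifecycleSketch {
public:
    // Called when the start flow begins.
    void beginStart() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = State::kStarting;
    }

    // Called when a stop request arrives. Returns true if the stop was deferred
    // because a start was still in progress.
    bool requestStop() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_state == State::kStarting) {
            _stopPending = true;
            return true;
        }
        _state = State::kStopped;
        return false;
    }

    // Called when the start flow finishes. Returns true if the processor should
    // enter its run loop, false if a deferred stop must be honored instead.
    bool finishStart() {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(_state == State::kStarting);  // precondition: beginStart() was called
        if (_stopPending) {
            _stopPending = false;
            _state = State::kStopped;
            return false;
        }
        _state = State::kRunning;
        return true;
    }

    State state() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _state;
    }

private:
    mutable std::mutex _mutex;
    State _state = State::kIdle;
    bool _stopPending = false;
};
```

      Replaying Example 1 against this sketch, `beginStart()` followed by `requestStop()` defers the stop, and `finishStart()` then returns false, so the run loop is never entered with a stop already in flight.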

      Example 2 (prod):

      8:10:17.832 AM – Starting stream processor

      // This is the k8s shutdown flow. A side question: why is it happening now?
      8:10:18.745 AM – Stopping all streamProcessors
      8:10:18.745 AM – Stopping stream processor
      8:10:18.833 AM – encountered stream processor exception, exiting runLoop(): {error}
      8:10:18.833 AM – expr: !status.isOK(), file: src/mongo/util/future.h, line: 983

      https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=1712045408.833&latest=1712045428.834&q=search%20index%3Dmhouse%20(66043c5a834b6c388c081dd0%20OR%20%22Stopping%20all%20streamProcessors%22)%20host%3Dstreams-spp-56b79c874d-znrps%20source%3Dstreams-spp%20c%3DSTREAMS%20((attr.errorCode%3D0%20AND%20exception)%20OR%20%22Stopping%22%20OR%20%22Starting%22)&display.page.search.mode=smart&dispatch.sample_ratio=1&display.page.search.tab=events&display.general.type=events&sid=1712067177.9766529

      Example 3 (prod):

      https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=1711580331.209&latest=1711580351.21&q=search%20index%3Dmhouse%20source%3Dstreams-spp%2065f418f9e00ced3c072f9e58%20host%3Dstreams-spp-56b79c874d-dnsng%20(c%3DSTREAMS%20OR%20%22Agent%20starting%20stream%20processor%22)&display.page.search.mode=smart&dispatch.sam 

       

            Assignee: Matthew Normyle (matthew.normyle@mongodb.com)
            Reporter: Matthew Normyle (matthew.normyle@mongodb.com)