  1. Core Server
  2. SERVER-88725

Backlogged $merge causes 30min stop

    • Atlas Streams
    • Fully Compatible
    • ALL
    • Sprint 46, Sprint 47, Sprint 48, Sprint 49, Sprint 50, Sprint 52

      This is the customer's pipeline:

       

      [{"$source": {"connectionName": "KafkaConfluent","topic": "OutputTopic"}},{"$merge": {"into": {"connectionName": "LyricsCluster","db": "streamingvectors","coll": "lyrics"},"on": "_id","whenMatched": "merge","whenNotMatched": "insert"}}] 

       

      Root cause (see this splunk):

      1. When the customer issued the stop, there were roughly 4,163,319,725 bytes input, but only 2,670,960,838 bytes output. The sink was backlogged.
      2. As part of the stop, we start writing a checkpoint.
      3. The $source processes the checkpoint at 3/28/24 5:03:37.180 PM.
      4. The sink doesn't finish processing the checkpoint until 3/28/24 5:44:11.283 PM.
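The figures in the steps above can be checked with a little arithmetic. The following is a minimal sketch, not part of the server code: it assumes the backlog can be approximated as input bytes minus output bytes, and the helper names (`backlog_bytes`, `checkpoint_delay_seconds`) are hypothetical.

```python
from datetime import datetime

# Figures copied from the root-cause notes in this ticket.
INPUT_BYTES = 4_163_319_725
OUTPUT_BYTES = 2_670_960_838
SOURCE_CHECKPOINT = "3/28/24 5:03:37.180 PM"
SINK_CHECKPOINT = "3/28/24 5:44:11.283 PM"

# Timestamp layout used in the notes: month/day/2-digit year, 12-hour clock.
FMT = "%m/%d/%y %I:%M:%S.%f %p"


def backlog_bytes(input_bytes: int, output_bytes: int) -> int:
    """Approximate sink backlog as bytes read minus bytes written (an assumption)."""
    return input_bytes - output_bytes


def checkpoint_delay_seconds(start: str, end: str) -> float:
    """Seconds between the $source and the sink handling the same checkpoint."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    backlog = backlog_bytes(INPUT_BYTES, OUTPUT_BYTES)
    delay = checkpoint_delay_seconds(SOURCE_CHECKPOINT, SINK_CHECKPOINT)
    print(f"backlog: {backlog:,} bytes")
    print(f"checkpoint delay: {delay:.3f} s ({delay / 60:.1f} min)")
```

Under that assumption the backlog comes to 1,492,358,887 bytes (about 1.49 GB), and the gap between the two checkpoint timestamps is 2,434.103 seconds, or roughly 40.6 minutes.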

      A related question: why did the backlog grow to ~2GB? Our code should prevent that by limiting the backlog in the sink to ~100MB.
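For illustration only, the kind of limit described here can be sketched as a queue bounded by total bytes, where the producer blocks once the cap is reached. This is not the actual sink implementation; the class `ByteBoundedQueue` and its methods are hypothetical, and the cap is a parameter rather than the real ~100MB setting.

```python
import threading
from collections import deque
from typing import Optional


class ByteBoundedQueue:
    """Hypothetical queue that applies backpressure once queued bytes hit a cap."""

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._bytes = 0
        self._items = deque()
        self._cond = threading.Condition()

    def _has_room(self, size: int) -> bool:
        # Always admit one item into an empty queue so an oversized
        # payload cannot block the producer forever.
        return self._bytes == 0 or self._bytes + size <= self._max_bytes

    def put(self, payload: bytes, timeout: Optional[float] = None) -> bool:
        """Block until the payload fits; return False if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_room(len(payload)), timeout):
                return False
            self._items.append(payload)
            self._bytes += len(payload)
            self._cond.notify_all()
            return True

    def get(self, timeout: Optional[float] = None) -> Optional[bytes]:
        """Block until an item is available; return None if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                return None
            payload = self._items.popleft()
            self._bytes -= len(payload)
            self._cond.notify_all()  # wake producers waiting for room
            return payload

    @property
    def queued_bytes(self) -> int:
        with self._cond:
            return self._bytes
```

With a cap like this in place, the queued bytes stay at or below the limit (apart from a single oversized item), so a backlog far above the cap would suggest that data is accumulating somewhere the limit does not cover.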

      ==== Customer report ====

      It has failed again with the same error in the new stream processing cluster I have created.
       
      {
        id: '6603d8c486a1abd293b773c5',
        name: 'lyrics_destination_cluster',
        lastModified: ISODate('2024-03-27T08:28:52.546Z'),
        state: 'STARTED',
        errorMsg: '',
        workers: [ 'worker-56b79c874d-9wjr2' ],
        pipeline: [
          { '$source': { connectionName: 'KafkaConfluent', topic: 'OutputTopic' } },
          { '$merge': { into: { connectionName: 'LyricsCluster', db: 'streamingvectors', coll: 'lyrics' }, on: '_id', whenMatched: 'merge', whenNotMatched: 'insert' } }
        ],
        lastStateChange: ISODate('2024-03-28T17:03:31.206Z')
      },
      The processor subscribed to the Kafka topic has stopped working. It still shows a STARTED state and I can’t stop it

            Assignee:
            sandeep.dhoot@mongodb.com Sandeep Dhoot
            Reporter:
            matthew.normyle@mongodb.com Matthew Normyle