  1. Core Server
  2. SERVER-57986

Report metrics for lag monitoring in change streams output

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Change streams
    • Query Execution
    • Query Execution 2021-07-12, Query Execution 2021-07-26

      Applications that use change streams can fall behind the source cluster if event processing does not keep up with the rate at which oplog entries are generated on the source. In these cases, the events received by the service may have been generated on the cluster seconds, minutes, or hours before the application retrieves them. The change stream may even fall off the source oplog, at which point it cannot resume from the point of failure.

      The user may not notice the growing lag for some time, sometimes until it is too late. If change stream event output reported server details that could be used to calculate and monitor change stream lag, users could catch and address symptoms earlier.

      For example, can we configure change streams output to include lastCommittedOpTime in addition to the $clusterTime metric that is already reported? The difference of the two could be used to calculate the lag from the source cluster.
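
      A minimal sketch of the proposed calculation, assuming each event carried both values and that the seconds component of each timestamp is available as a plain integer. The function name and sample values are hypothetical, and lastCommittedOpTime is not part of change stream output today.

```python
def optime_lag_seconds(last_committed_secs: int, event_cluster_secs: int) -> int:
    """Seconds between the cluster's last committed optime and the event's cluster time."""
    # Clamp at zero so that a committed point older than the event
    # (not expected, but possible with mismatched inputs) never yields a negative lag.
    return max(0, last_committed_secs - event_cluster_secs)


# Hypothetical values: the event was generated 120 seconds before the
# cluster's last committed operation.
print(optime_lag_seconds(1_625_000_120, 1_625_000_000))  # 120
```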

      Currently, customers can calculate lag in a couple of ways, for example:

      • compare the event $clusterTime to the time when the event was received in the application
      • retrieve lastCommittedOpTime in a parallel request, and compare that to the change streams output

      However, each of these approaches requires extra configuration and is inherently imprecise: neither reflects the lag at the particular point when the change stream event was returned by the server.
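
      The first workaround can be sketched as follows, assuming the application has already extracted the seconds component of the event's cluster time (with PyMongo this would presumably be the `time` attribute of the BSON timestamp) and recorded a timezone-aware receive time. The function name and sample values are hypothetical.

```python
from datetime import datetime, timezone


def receive_lag_seconds(cluster_time_secs: int, received_at: datetime) -> float:
    """Seconds between the event's cluster time and its receipt by the application."""
    # received_at must be timezone-aware so the subtraction is well defined.
    event_time = datetime.fromtimestamp(cluster_time_secs, tz=timezone.utc)
    return (received_at - event_time).total_seconds()


# Hypothetical values: the event was generated at 00:00:00 UTC and
# received by the application at 00:05:00 UTC.
cluster_secs = int(datetime(2021, 7, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp())
received = datetime(2021, 7, 1, 0, 5, 0, tzinfo=timezone.utc)
print(receive_lag_seconds(cluster_secs, received))  # 300.0
```

      The result also includes any clock skew between the application host and the cluster, as well as delivery time, which is the imprecision described above.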

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            marie.atterbury@mongodb.com Marie Atterbury
            Votes:
            0
            Watchers:
            10
