Change streams may be subject to spurious "CappedPositionLost" when resuming


    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • ALL
    • Query 2020-08-24, Query 2020-09-07
    • 3

      Our testing infrastructure uncovered a rare case, detailed in SERVER-49690, in which a spurious "CappedPositionLost" error can occur. A change stream may encounter the same error while it is resuming. As far as I know this has never been observed, but I see no reason it couldn't happen. I would recommend checking whether we can reproduce it. If we can, I think we should do one of the following:
      1) Disable yielding during the oplog check that the change stream performs upon resume
      2) Add a retry loop within the change stream, similar to the one from SERVER-49690
      3) Ensure that drivers will retry this error
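
      To make options 2 and 3 concrete, here is a minimal sketch of what such a retry loop might look like. It is not the server or driver implementation: `CappedPositionLost` is a hypothetical stand-in exception class, and `resume_with_retry`, `open_stream`, `max_attempts` and `backoff_seconds` are names invented for illustration. The idea is only that a spurious failure is retried a bounded number of times, while a persistent one is still surfaced.

```python
import time


class CappedPositionLost(Exception):
    """Hypothetical stand-in for the server's "CappedPositionLost" error."""


def resume_with_retry(open_stream, max_attempts=3, backoff_seconds=0.0):
    """Call open_stream() until it succeeds or the attempts run out.

    Only CappedPositionLost is retried; any other exception propagates
    immediately. open_stream is an assumed zero-argument callable that
    performs the resume and returns the resumed stream.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return open_stream()
        except CappedPositionLost:
            if attempt == max_attempts:
                # The error persisted across every attempt, so it is
                # probably genuine: re-raise it to the caller.
                raise
            # Optional linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)
```

      Bounding the number of attempts matters here because the error is not always spurious: if the resume point really has fallen off the oplog, retrying forever would only hide a real failure.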

      During SERVER-49690 I looked into option #1 but the patch quickly exploded. I'll attach my WIP but it certainly won't compile and doesn't plumb the yield policy far enough to fix the issue.

            Assignee:
            Bernard Gorman
            Reporter:
            Charlie Swanson
            Votes:
            0
            Watchers:
            4
