Core Server / SERVER-100338

Make server errors explicitly indicate indeterminate state

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Trivial - P5
    • Affects Version/s: None
    • Component/s: None
    • Server Programmability

      Background

      In a recent test run for mongosync’s 8.0 support, a `moveCollection` command returned InterruptedAtShutdown, which my code ignored. Mongosync then ran, after which the test ran a consistency checker. That checker found a missing document on the destination.

      What apparently happened: the `moveCollection` actually succeeded despite the `ok: 0` response. That success included changing the collection’s UUID, and the change happened after mongosync had finished its initialization (which records collection UUIDs). So when mongosync fetched the collection’s documents, using a request that includes the collection’s pre-`moveCollection` UUID, the server responded that the collection UUID no longer existed. As a result, my test migration lost a document.

      (This also happened because mongosync was ignoring unrecognized DDL events; REP-5614 is remedying that, so the `reshardCollection` event that fires on successful `moveCollection` will crash mongosync.)
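      The UUID mismatch described above could, in principle, be caught on the client by re-reading collection UUIDs after any ambiguous error and comparing them with the values recorded at initialization. The following is a minimal sketch, not mongosync’s actual logic: `find_changed_uuids` is a hypothetical helper, and it assumes its input has the shape of `listCollections` result documents (a `name` field and an `info.uuid` field).

```python
from uuid import UUID


def find_changed_uuids(recorded, current_infos):
    """Return {name: (old_uuid, new_uuid)} for collections whose UUID differs.

    recorded:      dict mapping collection name -> UUID captured at initialization.
    current_infos: iterable of listCollections-style documents, each with a
                   "name" field and an "info" sub-document holding "uuid".
    A collection that no longer appears is reported with new_uuid = None.
    """
    current = {
        doc["name"]: doc.get("info", {}).get("uuid") for doc in current_infos
    }
    changed = {}
    for name, old_uuid in recorded.items():
        new_uuid = current.get(name)
        if new_uuid != old_uuid:
            changed[name] = (old_uuid, new_uuid)
    return changed
```

      A non-empty result after an `ok: 0` response would suggest that the command took effect anyway, so previously recorded UUIDs should not be trusted.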

      Proposal

      The server’s error responses should indicate explicitly whether an error means the system state is indeterminate:

      1. Ideally, clients/drivers could opt in to some sort of ok: "?" response that discourages interpretation of errors like InterruptedAtShutdown as failures. (For clients that don’t indicate support for this response in their requests, the server would still send ok: 0.)

      2. Failing that—or perhaps in addition to it—the server could send a supplementary error label like SystemStateUnknown.

      3. The error message should be prefixed with `System state unknown:`. (I don’t know if this would violate stability guarantees.)

      4. For existing versions (those that don’t receive a backport), we should publish a list of error codes that indicate an indeterminate system state. Or, if it’s not as simple as checking membership in a static set of codes, we should publish whatever algorithm will guide users to a correct interpretation of the responses. (NB: Per max.hirschhorn@mongodb.com, this function is more or less that set of error codes.)
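      To illustrate how a client might consume the proposals above, here is a rough sketch of a three-way classification of a command response. Everything specific in it is an assumption: the `SystemStateUnknown` label and the `ok: "?"` value are only proposed in this ticket and are not sent by any server today, `classify_response` is a hypothetical helper, and the one-element code set is a placeholder for illustration, not the server’s actual list.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    INDETERMINATE = "indeterminate"


# Hypothetical label from proposal item 2; no server sends it today.
STATE_UNKNOWN_LABEL = "SystemStateUnknown"

# Placeholder set for proposal item 4. Only the error named in this ticket
# is listed; the real set would have to come from the server team.
INDETERMINATE_CODE_NAMES = frozenset({"InterruptedAtShutdown"})


def classify_response(response):
    """Classify a command response document as succeeded, failed, or unknown."""
    ok = response.get("ok")
    if ok == 1:
        return Outcome.SUCCEEDED
    # Proposal item 1 (hypothetical): an explicit "state unknown" ok value.
    if ok == "?":
        return Outcome.INDETERMINATE
    # Proposal item 2 (hypothetical): a supplementary error label.
    if STATE_UNKNOWN_LABEL in response.get("errorLabels", []):
        return Outcome.INDETERMINATE
    # Proposal item 4: fall back to a static set of error code names.
    if response.get("codeName") in INDETERMINATE_CODE_NAMES:
        return Outcome.INDETERMINATE
    return Outcome.FAILED
```

      With a check like this, the `moveCollection` response from the Background section would be classified as indeterminate instead of being ignored as a plain failure, and the caller could then verify the actual state before proceeding.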

            Assignee: Unassigned
            Reporter: Felipe Gasper (felipe.gasper@mongodb.com)
            Votes: 0
            Watchers: 8
