DRIVERS-1904: Handle invalid $clusterTime documents when gossiping cluster time

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Component/s: Sessions

      Summary

      MongoDB 3.6+ replica sets and sharded clusters return a $clusterTime document on all operations. Drivers are required to send the newest $clusterTime document they have seen on all subsequent operations (called "gossiping cluster time").

      Example $clusterTime document:

      {
          "$clusterTime": {
              "clusterTime": {"$timestamp": {"t": 1629939437, "i": 1}},
              "signature": {
                  "hash": {"$binary": {"base64": "XXXXXXXXXXXXXXXXXXXXXXXXXXX=", "subType": "00"}},
                  "keyId": {"$numberLong": "6952215213588348932"}
              }
          }
      }
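
      For illustration, here is a minimal sketch of that gossiping rule in Go. The clusterTime and Client types and the method names below are hypothetical, not any specific driver's internals: the driver keeps the $clusterTime document with the greatest timestamp it has seen and attaches that document to every outgoing command.

      package gossip

      import "sync"

      // clusterTime is a hypothetical in-memory form of the $clusterTime document:
      // the timestamp plus the signature, which the driver treats as opaque.
      type clusterTime struct {
          T, I      uint32 // timestamp seconds and increment
          Signature []byte // hash + keyId, echoed back to the server verbatim
      }

      // newer reports whether a has a greater timestamp than b.
      func newer(a, b clusterTime) bool {
          return a.T > b.T || (a.T == b.T && a.I > b.I)
      }

      // Client is a hypothetical driver client that tracks the newest $clusterTime.
      type Client struct {
          mu          sync.Mutex
          clusterTime clusterTime
      }

      // advanceClusterTime stores ct only if its timestamp is greater than the
      // timestamp of the currently stored document.
      func (c *Client) advanceClusterTime(ct clusterTime) {
          c.mu.Lock()
          defer c.mu.Unlock()
          if newer(ct, c.clusterTime) {
              c.clusterTime = ct
          }
      }

      // gossipClusterTime returns the stored document to attach to the next command.
      func (c *Client) gossipClusterTime() clusterTime {
          c.mu.Lock()
          defer c.mu.Unlock()
          return c.clusterTime
      }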
      

      A server may respond with a $clusterTime document containing a "dummy signed" cluster time that specifies keyId: 0. If that happens, subsequent operations that gossip the stored $clusterTime document with keyId: 0 may fail with a KeyNotFound server error instead of returning the expected operation result, and operations may continue to fail until a server response includes a newer $clusterTime document (i.e. one with a greater timestamp) that carries a valid signature and a valid keyId.

      The proposed improved behavior when the server responds with a KeyNotFound error (code 211) is as follows (see the sketch after this list):

      • Drop the Client's stored $clusterTime document (that we're assuming has a "dummy signed" cluster time with keyId: 0).
      • Invalidate the current implicit session.
      • Retry the operation with a new implicit session.
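
      A minimal sketch of that proposed handling, continuing the hypothetical Client from the sketch in the Summary (session, newImplicitSession, dropClusterTime, and runWithImplicitSession are illustrative names, not an existing driver API):

      package gossip

      import "errors"

      const keyNotFoundCode = 211 // server "KeyNotFound" error code

      // serverError is a hypothetical decoded server error.
      type serverError struct {
          Code    int
          Message string
      }

      func (e *serverError) Error() string { return e.Message }

      // session is a hypothetical implicit session handle.
      type session struct{ valid bool }

      func (s *session) invalidate() { s.valid = false }
      func (s *session) end()        {}

      func (c *Client) newImplicitSession() *session { return &session{valid: true} }

      // dropClusterTime clears the stored $clusterTime document (assumed here to be
      // the "dummy signed" document with keyId: 0).
      func (c *Client) dropClusterTime() {
          c.mu.Lock()
          defer c.mu.Unlock()
          c.clusterTime = clusterTime{}
      }

      // runWithImplicitSession sketches the proposed behavior: if an operation fails
      // with KeyNotFound, drop the stored $clusterTime document, invalidate the
      // implicit session, and retry the operation once with a new implicit session.
      func (c *Client) runWithImplicitSession(op func(*session) error) error {
          sess := c.newImplicitSession()
          err := op(sess)

          var srvErr *serverError
          if !errors.As(err, &srvErr) || srvErr.Code != keyNotFoundCode {
              sess.end()
              return err
          }

          c.dropClusterTime() // 1. drop the stored $clusterTime document
          sess.invalidate()   // 2. invalidate the current implicit session

          sess = c.newImplicitSession() // 3. retry with a new implicit session
          defer sess.end()
          return op(sess)
      }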

      Motivation

      Who is the affected end user?

      Users who encounter the KeyNotFound server error caused by receiving an invalid $clusterTime document from the server while running operations. See the related tickets.

      How does this affect the end user?

      Affected users may see up to a 100% operation error rate until a server responds with a newer, valid $clusterTime document. The cluster time only advances on a write operation, so the Client's stored $clusterTime document is only replaced after a write advances the cluster time and the client observes the newer document in a subsequent server response.

      How likely is it that this problem or use case will occur?

      Conditions required:

      • The server must respond with a "dummy signed" $clusterTime document with keyId: 0, which can happen in a replica set or cluster with mixed authentication requirements (e.g. if --transitionToAuth is enabled on some but not all nodes). The frequency is unknown, but it happens often enough to be reported in numerous tickets.
      • The driver must store the new, invalid $clusterTime document on the Client. The driver stores a new $clusterTime document on the Client whenever that document has a greater timestamp than the one currently stored.
      • Other servers in the replica set/cluster must have a local $clusterTime timestamp that is lower than the one in the $clusterTime document stored on the Client.

      If those conditions are met, any operation the driver sends to a server with a lower $clusterTime timestamp will fail with a KeyNotFound error.

      Simplified example of what's happening (a driver-level sketch follows the list):

      1. Create a new driver Client. The Client's cluster time document is empty.
      2. Send an Insert operation to the primary, which has no auth enabled. The server responds with a "dummy signed" $clusterTime document with keyId: 0, and the driver stores the new cluster time document on the Client.
      3. Send a Find operation to a secondary that has --transitionToAuth enabled, using an authenticated user. The Client gossips cluster time by attaching the stored cluster time document (with the "dummy signed" keyId).
      4. If the secondary has a lower local $clusterTime timestamp, it returns a KeyNotFound error.
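
      The same sequence can be approximated with the Go driver (v1.x API). The connection string, namespace, and the mixed --transitionToAuth deployment are assumptions; this only illustrates the operation pattern, not a guaranteed reproduction:

      package main

      import (
          "context"
          "errors"
          "log"

          "go.mongodb.org/mongo-driver/bson"
          "go.mongodb.org/mongo-driver/mongo"
          "go.mongodb.org/mongo-driver/mongo/options"
          "go.mongodb.org/mongo-driver/mongo/readpref"
      )

      func main() {
          ctx := context.Background()

          // 1. New Client; its stored cluster time document starts out empty.
          client, err := mongo.Connect(ctx,
              options.Client().ApplyURI("mongodb://user:pass@host1,host2/?replicaSet=rs0"))
          if err != nil {
              log.Fatal(err)
          }
          defer client.Disconnect(ctx)

          db := client.Database("test")

          // 2. Insert on the primary. If the primary has auth disabled, the response
          //    may carry a "dummy signed" $clusterTime (keyId: 0), which the driver
          //    stores because it is newer than the empty cluster time.
          if _, err := db.Collection("coll").InsertOne(ctx, bson.D{{Key: "x", Value: 1}}); err != nil {
              log.Fatal(err)
          }

          // 3. Find on a secondary. The driver gossips the stored (dummy-signed)
          //    $clusterTime document with the command.
          secondaryColl := db.Collection("coll",
              options.Collection().SetReadPreference(readpref.Secondary()))

          // 4. If the secondary's local cluster time is older, it may reject the
          //    gossiped document and fail the operation with KeyNotFound (code 211).
          err = secondaryColl.FindOne(ctx, bson.D{}).Err()
          var cmdErr mongo.CommandError
          if errors.As(err, &cmdErr) && cmdErr.Code == 211 {
              log.Println("KeyNotFound:", cmdErr.Message)
          }
      }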

      If the problem does occur, what are the consequences and how severe are they?

      Some percentage of operations, up to 100%, will fail until the driver receives a new, valid $clusterTime document from a server response (until another write happens that advances the cluster time). The percentage of operations that fail depends on the percentage of operations sent to servers that have a $clusterTime timestamp lower than the one sent on the operation.

      For example, if the driver sends all read and write operations to the primary of a replica set, it is unlikely (or impossible) for the Client to have a $clusterTime document newer than the primary's, because all of its $clusterTime documents come from the primary. However, if the driver is configured to send writes to the primary and reads to a secondary, a $clusterTime document received from the primary can have a greater timestamp than the current cluster time on the secondary. Subsequent read operations sent to the secondary would include the newer $clusterTime document and could fail with a KeyNotFound error if the keyId on that document is invalid.

      Is this issue urgent?

      No.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No.

            Assignee: Unassigned
            Reporter: Matt Dale (matt.dale@mongodb.com)