- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.0, v3.6
- Sharding 2018-06-18
- (copied to CRM)
- 60
When computing operationTime for a response, the "client operation time" at the start of a command is compared to the time at the end. There are some error contexts where there is no start operation time, so if the start time is a null logical time, the latest in-memory cluster time is returned.
The start time is computed by getting the opTime of the last op on the current client. If the client has not been used before, however, this opTime can be null. That produces a null start time, so the returned operationTime is the latest in-memory cluster time, even for successful operations.
This does not violate causal consistency, but it can create problems in our test infrastructure when the no-op writer is off, because the latest in-memory clusterTime can be higher than the latest opTime in a replica set primary's oplog. In particular, when secondaries force metadata refreshes on the primary, they use the operationTime of _forceRouterConfigUpdate to do a waitUntilOpTime (which bypasses forced no-op writes) and can hang forever.
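The failure mode described above can be illustrated with a small standalone model. This is only a sketch, not the server's code: `LogicalTime`, `computeOperationTimeSketch`, and `oplogHasReached` are hypothetical stand-ins, a value of 0 plays the role of the null logical time, and the non-null branch is simplified to return the end time.

```cpp
#include <cstdint>

// Hypothetical stand-in for a logical timestamp. A value of 0 models the
// "null" logical time mentioned in the description.
struct LogicalTime {
    std::uint64_t value;
    LogicalTime() : value(0) {}
    explicit LogicalTime(std::uint64_t v) : value(v) {}
    bool isNull() const { return value == 0; }
};

// Simplified model of the behavior described in this ticket.
//   startOpTime: opTime of the client's last op when the command began
//   endOpTime:   opTime of the client's last op when the command finished
//   clusterTime: latest in-memory cluster time
LogicalTime computeOperationTimeSketch(LogicalTime startOpTime,
                                       LogicalTime endOpTime,
                                       LogicalTime clusterTime) {
    // A null start time is treated as "no start time available", which is
    // also what a brand-new client produces, so a successful command on a
    // new client gets the in-memory cluster time.
    if (startOpTime.isNull()) {
        return clusterTime;
    }
    // Assumption: the real start/end comparison is not reproduced here.
    return endOpTime;
}

// Hypothetical model of the condition a waitUntilOpTime-style wait needs:
// the oplog must reach the target time. With no no-op writer, nothing
// advances lastApplied, so a false result here means the wait never ends.
bool oplogHasReached(LogicalTime lastApplied, LogicalTime target) {
    return lastApplied.value >= target.value;
}
```

With a last applied opTime of 5 and an in-memory cluster time of 9, a new client receives 9 as its operationTime, and `oplogHasReached(LogicalTime(5), LogicalTime(9))` stays false until something else is written to the oplog.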
Possible solutions would be:
- Distinguishing between not having a start time at all and running on a new client (possibly with a boost::optional or new sentinel logical time value)
- Changing computeOperationTime to return the lastAppliedOpTime instead of the latest in-memory time when there is no start time
- Using the lastAppliedOpTime as the start client operation time if the client has no last op
This is new behavior introduced by the refactoring in SERVER-34843. It also exacerbates SERVER-31887, since successful requests can receive an operationTime not in the primary's oplog.
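As a rough illustration of the possible solutions listed above, the sketch below combines the first and third options. It is an assumption-laden model rather than the actual fix: `Ts`, `startOperationTimeSketch`, and `operationTimeSketch` are hypothetical names, `std::optional` stands in for the `boost::optional` mentioned in the list, and the rule for commands that performed no write is a guess made for the sake of a complete example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical logical timestamp; 0 models the null logical time.
struct Ts {
    std::uint64_t value;
    Ts() : value(0) {}
    explicit Ts(std::uint64_t v) : value(v) {}
    bool isNull() const { return value == 0; }
};

// Third option: if the client has no last op (a new client), seed the
// start time from the last applied opTime instead of leaving it null.
Ts startOperationTimeSketch(Ts clientLastOp, Ts lastApplied) {
    return clientLastOp.isNull() ? lastApplied : clientLastOp;
}

// First option: an empty optional means "no start time at all" (the error
// contexts), which is now distinct from any start time a client can have.
Ts operationTimeSketch(std::optional<Ts> start,
                       Ts end,
                       Ts lastApplied,
                       Ts clusterTime) {
    if (!start) {
        // No start time was recorded: keep returning the cluster time.
        return clusterTime;
    }
    // Assumption: if the client's last op did not move past the start
    // time, the command wrote nothing, so report the last applied opTime,
    // which is always present in the oplog.
    if (end.value <= start->value) {
        return lastApplied;
    }
    return end;
}
```

Under these assumptions a read on a new client reports the last applied opTime, so a later wait on that operationTime can be satisfied by the existing oplog.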
- is caused by: SERVER-34843 Mongod can return operationTime greater than $clusterTime (Closed)
- is duplicated by: SERVER-35156 secondary reads return cluster time as the operation time (Closed)
- related to: SERVER-31887 clusterTime advanced on primary without anything being written to oplog (Closed)