There is a race in the way mongod computes $clusterTime and operationTime. Before returning a response, mongod gets the latest cluster time from the LogicalClock and appends it to the response as $clusterTime (in appendReplyMetadata, which is called for successful commands). Then, if a non-null $clusterTime was computed, operationTime is computed in one of two ways. For writes, it is the latest opTime on the client. For local and majority reads, it is the opTime of the last applied or last committed write, respectively. Nothing synchronizes these two steps, so the last applied or committed opTime can advance past the $clusterTime that was already computed, and the response can carry an operationTime larger than its $clusterTime.
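To make the interleaving concrete, here is a minimal single-threaded Python sketch. ToyNode, apply_write and build_response_racy are hypothetical names, not server code. The model assumes only that each write takes the next tick of the logical clock and then becomes the last applied opTime. Instead of real threads, it replays the concurrent write at the exact point between the two reads.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for mongod state; not the real server classes."""
    cluster_time: int = 0   # latest time held by the (toy) logical clock
    last_applied: int = 0   # opTime of the last applied write

    def apply_write(self) -> None:
        # Assumption: a write takes the next clock tick, then becomes the
        # last applied opTime.
        self.cluster_time += 1
        self.last_applied = self.cluster_time


def build_response_racy(node: ToyNode,
                        between: Optional[Callable[[], None]] = None) -> dict:
    # Step 1 (current order): read the logical clock for $clusterTime.
    cluster_time = node.cluster_time
    # Nothing stops a concurrent write from running here; `between`
    # replays it deterministically at this point.
    if between is not None:
        between()
    # Step 2: operationTime for a local read is the last applied opTime.
    operation_time = node.last_applied
    return {"$clusterTime": cluster_time, "operationTime": operation_time}


node = ToyNode()
node.apply_write()  # clock = 1, last applied = 1
resp = build_response_racy(node, between=node.apply_write)
print(resp)  # {'$clusterTime': 1, 'operationTime': 2}
```

A single write landing between the two reads is enough to produce a response whose operationTime is ahead of its $clusterTime.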
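A straightforward fix would be to compute operationTime before $clusterTime. $clusterTime is always allowed to be greater than operationTime, and the logical clock only moves forward, so reading the clock last should keep $clusterTime at or above the operationTime that was already read.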
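The proposed reordering can be checked against a similar toy model. Again, ToyNode, build_response_fixed and writes are hypothetical names, not server code. The sketch assumes that the node's last applied opTime never exceeds its logical clock and that the clock never moves backwards. Under those assumptions, reading operationTime first and $clusterTime second keeps operationTime <= $clusterTime no matter how many writes land in between.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for mongod state; not the real server classes."""
    cluster_time: int = 0
    last_applied: int = 0

    def apply_write(self) -> None:
        # Assumption: the clock ticks first, so last_applied <= cluster_time
        # holds after every write.
        self.cluster_time += 1
        self.last_applied = self.cluster_time


def build_response_fixed(node: ToyNode,
                         between: Optional[Callable[[], None]] = None) -> dict:
    # Proposed order: read operationTime first ...
    operation_time = node.last_applied
    if between is not None:
        between()  # concurrent writes may still land here
    # ... then read $clusterTime; the clock has only moved forward since.
    cluster_time = node.cluster_time
    return {"$clusterTime": cluster_time, "operationTime": operation_time}


def writes(node: ToyNode, n: int) -> Callable[[], None]:
    # Hypothetical helper that replays n concurrent writes at the interleaving point.
    def run() -> None:
        for _ in range(n):
            node.apply_write()
    return run


results: List[dict] = []
for n in range(4):
    node = ToyNode()
    node.apply_write()  # clock = 1, last applied = 1
    results.append(build_response_fixed(node, between=writes(node, n)))

# operationTime stays at 1 while $clusterTime grows from 1 to 4.
for r in results:
    print(r)
```

The cost of this ordering is that operationTime may lag further behind $clusterTime when writes race with the response. The ticket notes that this gap is always permitted.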
- causes: SERVER-35377 Operations on new clients get latest in-memory clusterTime as operationTime (Closed)
- is caused by: SERVER-33585 Do not return $clusterTime when no keys are available (Closed)
- is duplicated by: SERVER-33177 Inconsistent handling of operation time in mongod error response (Closed)