- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication, Sharding
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
- Sprint: Sharding 2017-10-02
Imagine you have a node whose system clock is out of sync and ahead of the rest of the replica set. That node gets elected primary, takes a w:1 write, and returns an operationTime of 500. Before that primary ever commits any writes in its term, however, it crashes and a new primary is elected. The new primary is only at clusterTime 100. If the client then sends a read with afterClusterTime: 500 to the new primary, it will get the error "readConcern afterClusterTime must not be greater than clusterTime value".
Instead of getting an error, the new primary should perform a no-op write to advance its cluster time to 500. In order to do that, unsharded replica sets will need to sign and propagate a cluster time like shard servers do.
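A minimal sketch of the desired behavior; LogicalClock, performNoopWrite, and waitForAfterClusterTime below are illustrative stand-ins, not the actual server code:

#include <algorithm>

// Stand-in for the node's logical clock; cluster times are simplified to integers.
struct LogicalClock {
    long long clusterTime = 0;

    // Writing any oplog entry advances the cluster time; an actual no-op
    // write would be used so secondaries learn the new time as well.
    void performNoopWrite(long long targetTime) {
        clusterTime = std::max(clusterTime, targetTime);
    }
};

// Called when servicing a read with readConcern: { afterClusterTime: T }.
void waitForAfterClusterTime(LogicalClock& clock, long long afterClusterTime) {
    if (afterClusterTime > clock.clusterTime) {
        // Instead of "readConcern afterClusterTime must not be greater than
        // clusterTime value", advance our cluster time with a no-op write.
        clock.performNoopWrite(afterClusterTime);
    }
    // ...then wait until this node has applied operations up to afterClusterTime.
}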
Implementation details:
The implementation will use the existing machinery for generating and looking up cluster keys in a replica set.
Since the keys are stored in the admin.system.keys collection (in the existing admin database), this does not require creating a new database in the replica set.
1. Refactor KeysCollectionManager (a sketch of the resulting client interface follows this item):
a. Remove KeysCollectionManagerDirect and KeysCollectionManagerZero; they do not appear to be needed anymore.
b. Move the code from KeysCollectionManagerSharded into KeysCollectionManagerImpl, so that the interface is preserved in case it is needed later.
c. Add a KeysCollectionClient interface.
d. Add KeysCollectionClientSharded, which uses ShardingCatalogClient, and KeysCollectionClientDirect, which uses DBDirectClient, to get and insert keys.
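A rough sketch of how the split in (c) and (d) could look; the method names and simplified types below are illustrative, not the final interface:

#include <string>
#include <vector>

// Simplified stand-in for a key document stored in admin.system.keys.
struct KeysCollectionDocument {
    long long keyId;
    std::string purpose;    // e.g. the HMAC purpose string
    std::string key;        // signing key material
    long long expiresAt;    // logical time after which the key is rotated out
};

// Abstract source of signing keys; hides whether keys come from the config
// servers or from this node's local admin.system.keys collection.
class KeysCollectionClient {
public:
    virtual ~KeysCollectionClient() = default;

    // Return all keys for `purpose` newer than `newerThanThis`.
    virtual std::vector<KeysCollectionDocument> getNewKeys(
        const std::string& purpose, long long newerThanThis) = 0;

    // Persist a freshly generated key document; returns false on failure.
    virtual bool insertNewKey(const KeysCollectionDocument& key) = 0;
};

// Would be backed by ShardingCatalogClient (reads/writes on the config servers).
class KeysCollectionClientSharded : public KeysCollectionClient { /* ... */ };

// Would be backed by DBDirectClient (reads/writes on the local node).
class KeysCollectionClientDirect : public KeysCollectionClient { /* ... */ };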
2. In db/db.cpp, construct the KeysCollectionManager with a KeysCollectionClientDirect when mongod is started with the --replSet flag:
if (replSettings.usingReplSets()) {
    auto keyManager = std::make_shared<KeysCollectionManagerImpl>(
        std::make_unique<KeysCollectionClientDirect>(),
        kKeyManagerPurposeString,
        Seconds(KeysRotationIntervalSec));
    ...
}
3. Enable/disable key generation when the mongod node becomes primary/secondary.
This is already done in ReplicationCoordinatorExternalStateImpl::_shardingOnTransitionToPrimaryHook and ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook, the same places where it currently works for the config server.
The check should make sure that the new primary is not part of any sharded cluster (see the sketch below).
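A sketch of what those hooks would do, using an illustrative KeyManager type rather than the real one:

#include <memory>

// Illustrative stand-in for the key manager owned by the mongod node.
class KeyManager {
public:
    // Toggles the background job that periodically generates new signing keys.
    void enableKeyGenerator(bool enable) { _generatorEnabled = enable; }
    bool generatorEnabled() const { return _generatorEnabled; }

private:
    bool _generatorEnabled = false;
};

struct ReplicaSetNode {
    std::shared_ptr<KeyManager> keyManager = std::make_shared<KeyManager>();
    bool partOfShardedCluster = false;

    // Analogous to _shardingOnTransitionToPrimaryHook: only a primary of a
    // non-sharded replica set generates keys; in a sharded cluster the config
    // server primary owns key generation.
    void onTransitionToPrimary() {
        if (!partOfShardedCluster) {
            keyManager->enableKeyGenerator(true);
        }
    }

    // Analogous to shardingOnStepDownHook: secondaries never generate keys.
    void onStepDown() {
        keyManager->enableKeyGenerator(false);
    }
};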
4. When a replica set becomes part of a sharded cluster, its nodes must switch the client to KeysCollectionClientSharded.
The key cache needs to be cleared to avoid signing data with the replica set keys that are being discarded.
For clients that hold a clusterTime signed with the replica set's keys, the protocol must ensure that the clusterTime they receive from then on is signed by the config server. This must happen as part of the addShard call, which implies that mongos needs to refresh its keys right after addShard is processed by the config server.
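A sketch of that switch, with illustrative types (resetClient and onAddedToShardedCluster are hypothetical names, not the actual server API):

#include <memory>
#include <string>
#include <vector>

struct CachedKey {
    long long keyId;
    std::string material;
};

class KeysCollectionClient { public: virtual ~KeysCollectionClient() = default; };
class KeysCollectionClientSharded : public KeysCollectionClient {};

class KeysCollectionManager {
public:
    // Swap the key source and drop keys cached from the standalone era so
    // cluster times are never signed with keys that are being discarded.
    void resetClient(std::unique_ptr<KeysCollectionClient> client) {
        _client = std::move(client);
        _cachedKeys.clear();
    }

private:
    std::unique_ptr<KeysCollectionClient> _client;
    std::vector<CachedKey> _cachedKeys;
};

// Called on each shard node once addShard has completed on the config server;
// mongos refreshes its own keys at the same point so clients holding a
// replica-set-signed clusterTime receive a config-server-signed one next.
void onAddedToShardedCluster(KeysCollectionManager& manager) {
    manager.resetClient(std::make_unique<KeysCollectionClientSharded>());
}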
5. We should not invariant on inconsistent user input: remove the invariant; in production the server should log the problem and return an error to the client.
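For example (illustrative code only; the actual invariant, error code, and message are whatever the server currently uses), the check could become an explicit validation that logs and returns:

#include <iostream>
#include <string>

// Simplified stand-in for the server's Status type.
struct Status {
    bool ok;
    std::string reason;
};

// Before: invariant(inputIsConsistent);  // aborts mongod on bad user input
// After: log the problem and hand an error back to the client instead.
Status checkUserInput(bool inputIsConsistent) {
    if (!inputIsConsistent) {
        std::cerr << "rejecting request: inconsistent clusterTime in user input" << std::endl;
        return {false, "request contains an inconsistent clusterTime"};
    }
    return {true, ""};
}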
- causes
  - SERVER-31658 replsets/auth2.js should wait for SECONDARY instead of RECOVERING when 2 nodes down (Closed)
- depends on
  - SERVER-31202 Mongo shell client needs to support causal consistency. (Closed)
  - SERVER-31187 Refactor ShardLocal (Closed)
  - SERVER-31270 Add an option to ReplicaSetTest to wait for keys (Closed)
- is duplicated by
  - SERVER-30909 readConcern afterClusterTime not working in a non-sharded replica set (Closed)
- related to
  - SERVER-33812 First initial sync oplog read batch fetched may be empty; do not treat as an error. (Closed)
  - SERVER-30927 Use readConcern afterClusterTime for initsync oplog queries (Closed)