Goal: investigate whether the scenario outlined below is possible and, if so, determine a fix.
Scenario:
Suppose shard0 has a primary, secondary0, and secondary1.
- Each time mongos performs a 'count' with hedged reads, secondary0 never gets the chance to set its in-memory database version: the operation is either killed via killOp or times out on maxTimeMS before the version can be set after the refresh.
- Over the NetworkInterfaceTL, mongos routes the next 'count' command to secondary0. Secondary0 has no known databaseVersion, so onDbVersionMismatchNoExcept is called. This triggers a refresh, which fails with a maxTimeMS expiration error.
- Because the maxTimeMS error is ignored, the original "no known dbVersion" error propagates back to the NetworkInterfaceTL. There, because the reported error is not maxTimeMS, the finish line is triggered: secondary0 wins the race, but the 'count' fails with "don't know dbVersion."
In this scenario, we believe secondary0 is being killed here.
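
To make the suspected loop easier to reason about, the steps above can be sketched as a toy model. This is not MongoDB code: the `Secondary` class, `hedged_count`, the status strings, and the `refresh_times_out` flag are all hypothetical names invented for illustration, and the model assumes that a shard always answers the current request with the "no known dbVersion" error after attempting a refresh, leaving any retry to mongos.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical status labels; they only mirror the errors named in the scenario.
OK = "OK"
MAX_TIME_MS_EXPIRED = "MaxTimeMSExpired"
NO_KNOWN_DB_VERSION = "NoKnownDbVersion"


@dataclass
class Secondary:
    """Toy stand-in for a shard secondary (not the real server classes)."""
    name: str
    db_version: Optional[int] = None
    refresh_times_out: bool = False  # assumption: refresh always hits maxTimeMS

    def refresh(self) -> str:
        if self.refresh_times_out:
            # The refresh is interrupted before the version can be installed.
            return MAX_TIME_MS_EXPIRED
        self.db_version = 1
        return OK

    def run_count(self) -> str:
        if self.db_version is None:
            # Modelled on the described onDbVersionMismatchNoExcept behaviour:
            # the refresh result is ignored and the original error is returned.
            self.refresh()
            return NO_KNOWN_DB_VERSION
        return OK


def hedged_count(responses: List[Tuple[str, str]]) -> Tuple[Optional[str], str]:
    """Return the first response, in arrival order, that is not a maxTimeMS error."""
    for node, status in responses:
        if status != MAX_TIME_MS_EXPIRED:
            return node, status  # the "finish line" is triggered
    return None, MAX_TIME_MS_EXPIRED


def attempt(node: Secondary) -> Tuple[Optional[str], str]:
    """One 'count' attempt in which only `node` responds."""
    return hedged_count([(node.name, node.run_count())])
```

In this model, calling `attempt` repeatedly on a `Secondary` created with `refresh_times_out=True` returns the "no known dbVersion" status every time, because the version is never installed. A node whose refresh succeeds fails only the first attempt and returns `OK` afterwards.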
- is related to: SERVER-46187 MongoS doesn't pass maxTimeMS to shards for write commands (Closed)
- related to: SERVER-48264 ShardServerCatalogCacheLoader doesn't handle threadpool shutdown (Closed)