-
Type: Task
-
Resolution: Won't Do
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Sharding NYC
-
Sharding NYC 2023-08-21, Sharding NYC 2023-09-04
-
3
The LogicalSessionCache refresher and reaper currently include a step that checks whether the config.system.sessions collection exists (here and here). Under the hood, this check performs a forced refresh of the routing information for the collection. On a secondary shardsvr mongod, each routing info refresh involves making the primary refresh by running a _flushRoutingTableCacheUpdate command against the primary and then waiting for the opTime that the command returns. From code inspection, this wait has no timeout, so the wait time after each _flushRoutingTableCacheUpdate command depends on the replication lag. When the lag is large, the refresh takes proportionally long to complete (HELP-48060), and the refresher and reaper runs can consequently occur less frequently than scheduled. It is unclear why such a forced refresh is necessary, i.e. why we don't just let the refresher or reaper, as a client, retry the upsert/delete/find commands later if it gets a StaleConfig error.
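For discussion, the retry-on-StaleConfig alternative raised above might look roughly like the sketch below. It is a Python illustration of the control flow only, not the server's C++ code: `StaleConfigError`, `run_with_stale_config_retry`, `max_attempts`, and `backoff_secs` are hypothetical names made up for this example, and the real refresher and reaper may need different retry limits or scheduling.

```python
import time


class StaleConfigError(Exception):
    """Hypothetical stand-in for a StaleConfig error returned to the client."""


def run_with_stale_config_retry(operation, max_attempts=3, backoff_secs=0.0):
    """Call operation(); on StaleConfigError, retry up to max_attempts in total.

    No forced routing refresh is performed up front. The caller simply
    retries later, and the last StaleConfigError is re-raised if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except StaleConfigError:
            if attempt == max_attempts:
                raise
            # Assumption: a fixed pause between attempts is good enough here.
            time.sleep(backoff_secs)


# Usage: a fake upsert that hits stale routing info twice, then succeeds.
calls = {"n": 0}


def flaky_upsert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise StaleConfigError("routing info is stale")
    return "ok"


result = run_with_stale_config_retry(flaky_upsert)
```

With this shape, the time spent per refresher or reaper run is bounded by the number of attempts rather than by an opTime wait that scales with replication lag.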