Type: Task
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- sharding-nyc-subteam3

Assigned Teams:

Sharding NYC
Sprint:
Sharding NYC 2023-08-21, Sharding NYC 2023-09-04
Story Points:
3
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

The LogicalSessionCache refresher and reaper currently include a step that checks whether the config.system.sessions collection exists (here and here). Under the hood, this check performs a forced refresh of the routing info for that collection. On a secondary shardsvr mongod, each routing info refresh also makes the primary refresh: the secondary runs a _flushRoutingTableCacheUpdate command against the primary and then waits for the opTime that the command returns. From code inspection, this wait has no timeout, so the time spent waiting after each _flushRoutingTableCacheUpdate command depends on the replication lag. When the lag is large, the refresh takes proportionally long to complete (HELP-48060), and as a result it can run less frequently than scheduled.

It is unclear why such a forced refresh is necessary. Why not let the refresher or reaper, as a client, simply retry its upsert/delete/find commands later if they fail with a StaleConfig error?
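To make the cost argument concrete, here is a minimal Python sketch of a toy model. It is not MongoDB server code. It assumes opTimes are plain integers and that the secondary applies a constant number of opTimes per simulated second. The names `SecondaryModel`, `wait_for_optime`, `force_refresh_on_secondary`, `run_with_stale_config_retry` and the local `StaleConfig` class are hypothetical and exist only for illustration. The sketch shows two things. First, an unbounded opTime wait costs time proportional to the lag. Second, it sketches the alternative the ticket asks about: retrying the operation on StaleConfig instead of force-refreshing up front.

```python
from dataclasses import dataclass


class StaleConfig(Exception):
    """Hypothetical stand-in for the server's StaleConfig error."""


@dataclass
class SecondaryModel:
    # Toy model (hypothetical, not server code): the secondary has applied the
    # primary's oplog up to `applied_optime` and applies `apply_rate` opTimes
    # per simulated second. opTimes are plain integers here.
    applied_optime: int
    apply_rate: int

    def wait_for_optime(self, target, timeout_secs=None):
        """Return simulated seconds spent until applied_optime >= target.

        With timeout_secs=None the wait is unbounded, so its cost grows
        linearly with the replication lag. With a budget, raise TimeoutError
        (leaving applied_optime unchanged) if the lag cannot be closed in time.
        """
        needed = max(0, target - self.applied_optime) / self.apply_rate
        if timeout_secs is not None and needed > timeout_secs:
            raise TimeoutError(f"needed {needed}s, budget {timeout_secs}s")
        self.applied_optime = max(self.applied_optime, target)
        return needed


def force_refresh_on_secondary(secondary, primary_optime, timeout_secs=None):
    # Step 1 (modeled, not executed): the primary runs
    # _flushRoutingTableCacheUpdate and returns its current opTime, which is
    # passed in here as `primary_optime`.
    # Step 2: block until the secondary has replicated up to that opTime.
    return secondary.wait_for_optime(primary_optime, timeout_secs)


def run_with_stale_config_retry(op, max_attempts=3):
    """Alternative raised in the ticket: skip the upfront forced refresh and
    retry the operation itself if it fails with StaleConfig."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except StaleConfig:
            if attempt == max_attempts:
                raise


if __name__ == "__main__":
    # 3000 opTimes behind at 10 opTimes/s: the refresh blocks for 300 s.
    lagging = SecondaryModel(applied_optime=0, apply_rate=10)
    print(force_refresh_on_secondary(lagging, primary_optime=3000))  # 300.0

    # An upsert that hits StaleConfig once, then succeeds on retry.
    attempts = []

    def flaky_upsert():
        attempts.append(1)
        if len(attempts) == 1:
            raise StaleConfig()
        return "ok"

    print(run_with_stale_config_retry(flaky_upsert))  # ok
```

With `timeout_secs=None`, the simulated wait scales linearly with the lag, which matches the behavior the description infers from code inspection. Passing a budget makes the same wait fail fast with `TimeoutError`. The retry helper only illustrates the question being asked. It does not claim that retrying on StaleConfig is sufficient for the real refresher or reaper.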

Assignee:: Cheahuychou Mao
Reporter:: Cheahuychou Mao
Participants:: Cheahuychou Mao, Jason Zhang
Votes:: 0
Watchers:: 6

Created:: Jul 25 2023 04:19:55 PM UTC
Updated:: Sep 07 2023 05:31:57 PM UTC
Resolved:: Sep 05 2023 01:56:39 AM UTC
Confidence Status Last Update:: 10/Aug/23 2:16 AM