- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: 6.0.5
- Component/s: None
- Query Execution
- ALL
- Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15
Mongosync in the above log receives a CursorNotFound error. The cursor in question (5131255404875629212), though, appears to be internal to mongos and mongod.
The first occurrence of 5131255404875629212 in the logs is on shard1 node2 (s1:n2) and indicates a slow query.
The next relevant entry, on shard0 node 0, indicates that the shard0 node answering mongosync's query is "not in primary or recovering state". Note that this entry doesn't involve cursor 5131255404875629212, but it might be related.
The next occurrence after that, on shard0 node 1, is where we start seeing CursorNotFound.
The line after that is where mongos indicates CursorNotFound, and the line thereafter is where mongosync reports it.
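The cross-node trace described above could be reproduced with a small script along these lines. This is only a sketch: it assumes each node writes the structured JSON log format (one JSON object per line, with the timestamp under `t.$date` and the message under `msg`), and the function name `trace_cursor` and the `logs_by_node` mapping are hypothetical, not part of any MongoDB tool.

```python
import json
from datetime import datetime


def trace_cursor(logs_by_node, cursor_id):
    """Return (timestamp, node, msg) tuples for log lines mentioning cursor_id.

    logs_by_node is a hypothetical mapping of node label -> iterable of raw
    log lines. Timestamps are assumed to be ISO 8601 strings with a numeric
    UTC offset, e.g. "2023-05-10T12:00:01.000+00:00".
    """
    needle = str(cursor_id)
    hits = []
    for node, lines in logs_by_node.items():
        for line in lines:
            # Cheap substring filter before attempting to parse the line.
            if needle not in line:
                continue
            try:
                entry = json.loads(line)
                when = datetime.fromisoformat(entry["t"]["$date"])
            except (ValueError, KeyError, TypeError):
                # Skip lines that are not structured log entries.
                continue
            hits.append((when, node, entry.get("msg", "")))
    # Order the matches chronologically across all nodes.
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Calling `trace_cursor(logs, 5131255404875629212)` with the logs of every shard node and mongos would list, in time order, which process mentioned the cursor and what it said. It would not surface related entries that omit the cursor id, such as the "not in primary or recovering state" message.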
There appear to be a couple of issues here:
1. Whatever is causing the CursorNotFound
2. It seems like mongos shouldn't report CursorNotFound to mongosync here since (if I'm understanding the logs correctly) the missing cursor is mongos's own cursor for querying shard-rs1, not something that mongosync knows or (directly) cares about.
The log indicates an election in shard-rs0 as well as a stepUp in shard-rs1. Maybe these are related?
A potential workaround for mongosync is to treat CursorNotFound as a transient, retryable error.
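As a rough illustration of that workaround, the sketch below retries an operation when it fails with server error code 43 (CursorNotFound). It is not mongosync's actual code: `ServerError`, `is_transient`, and `run_with_retry` are hypothetical names, and `ServerError` stands in for whatever exception the driver raises with a server error code attached. Because the cursor no longer exists, the retried `operation` is assumed to re-issue the query (ideally resuming from the last document seen) rather than repeat the failed getMore.

```python
import time

CURSOR_NOT_FOUND = 43  # MongoDB server error code for CursorNotFound


class ServerError(Exception):
    """Hypothetical stand-in for a driver exception carrying a server error code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def is_transient(exc):
    """Treat CursorNotFound as transient; everything else is not retried here."""
    return getattr(exc, "code", None) == CURSOR_NOT_FOUND


def run_with_retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on transient errors.

    operation is assumed to open a fresh cursor on every call.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Back off: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable only so the backoff can be exercised in tests without waiting. Whether a blind retry is safe depends on the operation being restartable without duplicating or skipping documents, which this sketch does not address.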