- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Replication
Currently, for the extra index keys check, when we do the batching/reverse lookup phase we don't acquire the catalog snapshot at any particular timestamp (we read at the latest kNoOverlap point). Then, when we move to the hashing phase, we release that snapshot, and when we re-acquire the catalog snapshot we again don't use any particular timestamp but the latest kNoOverlap point, which could by then be a different timestamp.
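To make the current flow concrete, here is a minimal, self-contained sketch. The types and helpers (CatalogSnapshot, acquireCatalogSnapshotAtLatestNoOverlap, etc.) are hypothetical stand-ins rather than the real dbCheck code; the point is only that each phase independently resolves the latest kNoOverlap point to whatever timestamp is current when it runs.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-ins for dbCheck internals (not the real server APIs).
struct Timestamp {
    uint64_t secs;
};

struct CatalogSnapshot {
    Timestamp readTimestamp;  // timestamp the snapshot was opened at
};

// Stand-in for the node's latest kNoOverlap point; in the real server this
// advances as concurrent writes become majority-committed.
Timestamp gLatestNoOverlap{100};

CatalogSnapshot acquireCatalogSnapshotAtLatestNoOverlap() {
    return CatalogSnapshot{gLatestNoOverlap};
}

void releaseCatalogSnapshot(CatalogSnapshot&) {}

int main() {
    // Batching / reverse lookup phase: reads at whatever kNoOverlap is now.
    CatalogSnapshot batchSnap = acquireCatalogSnapshotAtLatestNoOverlap();
    // ... walk the index, pick the batch bounds (possibly $minKey / $maxKey) ...
    releaseCatalogSnapshot(batchSnap);

    // Concurrent inserts advance the kNoOverlap point in the meantime.
    gLatestNoOverlap.secs = 105;

    // Hashing phase: re-acquires a snapshot, again at the *latest* kNoOverlap,
    // which may now be later than the batching phase's read timestamp.
    CatalogSnapshot hashSnap = acquireCatalogSnapshotAtLatestNoOverlap();

    std::cout << "batching phase read at " << batchSnap.readTimestamp.secs
              << ", hashing phase read at " << hashSnap.readTimestamp.secs << "\n";
    releaseCatalogSnapshot(hashSnap);
    return 0;
}
```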
This can cause an issue if we insert a number of docs with identical keys, followed by a few docs with distinct keys, all while dbCheck is running in the background. The following concurrent scenario is then possible: when the primary does the batching/reverse lookup phase, the test has only inserted some of the docs (the ones with identical keys), so dbCheck reaches the end of the index at that point and uses $minKey and $maxKey as the batch bounds. When the primary then does the hashing phase, it uses a new timestamp, the new latest kNoOverlap point, by which time more docs (including docs with distinct keys) have been inserted. This is also the timestamp that the secondary will use. So during the secondary's hashing phase, since $maxKey is the batch bound and the batch consists of only identical keys, we search for additional distinct keys after the batch, actually find some, and conclude that there is an inconsistency, when in reality it is just that we are reading at a different timestamp than the batching/reverse lookup phase did.
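Roughly, the problematic interleaving looks like this (times are illustrative, not real timestamps):

```
t1: test inserts the docs with identical index keys only
t2: primary batching/reverse lookup phase opens a snapshot at the latest
    kNoOverlap point (= t2); it reaches the end of the index, so the batch
    bounds become $minKey / $maxKey
t3: test inserts the docs with distinct index keys
t4: primary hashing phase opens a new snapshot at the new latest kNoOverlap
    point (= t4), which now includes the t3 inserts; this readTimestamp is
    what the secondary is told to use
t5: secondary hashes at readTimestamp t4; since the batch bound is $maxKey
    and the batch held only identical keys, it searches past the batch for
    additional distinct keys, finds the t3 inserts, and reports an
    inconsistency that is really just a timestamp mismatch
```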
One solution is for the primary's hashing phase to instead acquire its snapshot using the read timestamp from the last catalog snapshot of the batching/reverse lookup phase (similar to how, on the secondary, we use the same readTimestamp that the primary used during its hashing phase). There is some performance concern: we would need to ensure that the time taken by the last snapshot plus the hashing phase is short enough that the timestamp does not fall off the 5-minute history window, since the secondary will need to be able to use that timestamp for its hashing, so the batch size/snapshot size must not be too big. A sketch of what this could look like follows.
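Below is a rough, hypothetical sketch of that idea, not the actual implementation: acquireCatalogSnapshotAtTimestamp, acquireHashingSnapshot, and the in-memory "clock" variables are made up for illustration, and the 300-second constant corresponds to the default 5-minute snapshot history window (minSnapshotHistoryWindowInSeconds) mentioned above.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for dbCheck internals (not the real server APIs).
struct Timestamp {
    uint64_t secs;
};

struct CatalogSnapshot {
    Timestamp readTimestamp;
};

Timestamp gLatestNoOverlap{230};  // stand-in for the latest kNoOverlap point
Timestamp gClusterTime{230};      // stand-in for the current cluster time

CatalogSnapshot acquireCatalogSnapshotAtLatestNoOverlap() {
    return CatalogSnapshot{gLatestNoOverlap};
}

CatalogSnapshot acquireCatalogSnapshotAtTimestamp(Timestamp ts) {
    return CatalogSnapshot{ts};
}

// Default 5-minute snapshot history window (minSnapshotHistoryWindowInSeconds).
constexpr uint64_t kSnapshotHistoryWindowSecs = 300;

// Proposed behavior: the primary's hashing phase reuses the read timestamp of
// the last batching/reverse lookup snapshot, so the hashing on the primary and
// on the secondary both read at the same point in time as the batching phase.
CatalogSnapshot acquireHashingSnapshot(std::optional<Timestamp> lastBatchReadTs) {
    if (!lastBatchReadTs) {
        // No prior batching snapshot; fall back to the current behavior.
        return acquireCatalogSnapshotAtLatestNoOverlap();
    }
    // Guard: the secondary must still be able to open a snapshot at this
    // timestamp, so it must not have aged past the history window.
    if (gClusterTime.secs - lastBatchReadTs->secs >= kSnapshotHistoryWindowSecs) {
        throw std::runtime_error(
            "batching read timestamp fell off the snapshot history window; "
            "batch/snapshot size may be too large");
    }
    return acquireCatalogSnapshotAtTimestamp(*lastBatchReadTs);
}

int main() {
    // Read timestamp of the final batching/reverse lookup snapshot.
    Timestamp lastBatchReadTs{200};
    CatalogSnapshot hashSnap = acquireHashingSnapshot(lastBatchReadTs);
    std::cout << "hashing at readTimestamp " << hashSnap.readTimestamp.secs << "\n";
    return 0;
}
```

The guard is the part that addresses the performance concern above: if the batching snapshot's read timestamp has aged past the history window by the time the hashing phase (or the secondary) tries to use it, the batch/snapshot sizing needs to be reconsidered rather than silently falling back to a newer timestamp.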