- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Execution
Consider the following scenario: there is a collection with a document _id: 123, and two separate clients are running. The following sequence happens:
- Client 1 Begins a transaction
- Client 1 Deletes _id: 123
- Client 1 Inserts a new document with _id: 123 (but with a new RecordId)
- Client 2 runs a findAndModify or updateOne targeting _id: 123, and attempts to set a field 'X'
- (Client 2 conflicts with the running transaction and is in a retry loop)
- Client 1 commits its transaction
- Client 2's operation completes
Today (in versions 4.4-7.0), client 2's findAndModify does not update anything, even though a document with _id: 123 existed the entire time. This is permitted under read-committed semantics, since queries under read-committed isolation can miss rows entirely, though it is confusing.
What specifically causes Client 2's operation to "miss" the document?
Client 2's operation is an UPDATE -> IDHACK plan. The plan reads from the _id index, fetches the document, and then receives a WriteConflict while attempting to update it. When it gets a write conflict, the UpdateStage stashes the document it read from the stage below. After a WriteConflict, we abort our WT transaction and start a new one at a new point in time. The UpdateStage then "recovers" its state (namely, the copy of the document it was trying to update and its RecordId). It re-fetches the document by RecordId and checks whether it still matches the filter.
Since the stashed RecordId no longer exists after Client 1's delete, no document is fetched. The UpdateStage then returns NEED_TIME, and on the subsequent call to work(), the IDHackStage returns EOF.
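The mechanism above can be illustrated with a small in-memory model. This is a hypothetical sketch, not server code: `recordStore`, `idIndex`, and `restoreUpdateState` are invented names standing in for the record store, the _id index, and the UpdateStage's post-conflict state recovery. The point is only that recovery re-fetches by the stashed RecordId rather than re-seeking the _id index, so a delete + re-insert leaves it with nothing to fetch.

```javascript
// Hypothetical model (not server code) of why the stashed RecordId goes stale.
let nextRecordId = 1;
const recordStore = new Map(); // RecordId -> document
const idIndex = new Map();     // _id -> RecordId

function insert(doc) {
  const rid = nextRecordId++;
  recordStore.set(rid, doc);
  idIndex.set(doc._id, rid);
}

function remove(_id) {
  recordStore.delete(idIndex.get(_id));
  idIndex.delete(_id);
}

// Models the UpdateStage's recovery path after a WriteConflict: it re-fetches
// the *stashed* RecordId instead of re-running the _id index seek.
function restoreUpdateState(stashedRecordId) {
  const doc = recordStore.get(stashedRecordId);
  if (doc === undefined) {
    // No record at that RecordId any more: NEED_TIME, then the IDHackStage
    // (already past its single seek) returns EOF.
    return "EOF";
  }
  return doc;
}

// Client 2 reads _id: 123 and stashes its RecordId before the conflict.
insert({ _id: 123 });
const stashed = idIndex.get(123);

// Client 1's transaction commits: delete + re-insert under a new RecordId.
remove(123);
insert({ _id: 123 });

console.log(restoreUpdateState(stashed)); // "EOF" - the stashed RecordId is gone
```

The re-inserted document is visible through the _id index the whole time; only the stale RecordId path misses it.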
What are our known options? (We can add to this)
- Update the documentation to make it clear that two documents with the same _id are not necessarily "the same document," and otherwise make no change in server behavior. Today's behavior is allowed under read-committed isolation, so while it's inconvenient, it's not a bug. There are also two workarounds:
- Client 2 could use findAndModify and specify a sort. The sort acts as a sort + limit 1, and if the document that comes first in the sort order is removed or no longer matches the predicate, we retry the entire operation from the top via this code path.
- Client 1 could update the document instead of deleting and re-inserting it, which would preserve its RecordId.
- Change the behavior so when a document is deleted and re-inserted (Same _id, but new record ID), concurrent updates will succeed.
- One idea MaxH had for this was to have the IDHack stage continue seeking/fetching even after it has returned a document, when it is beneath a write stage. Essentially, this removes the limit 1 that is baked into it today (only when beneath a write stage).
- Pass a flag via the UpdateParams indicating that the query is reading by _id and change this code to check that flag. This would cause the operation to behave just like findAndModify with a sort does today, without changing the IDHack stage.
- Make some more general change to the update code to retry when a conflict is hit and the document is later found to be missing. This would result in a perf hit for some scenarios, since it would cause operations to retry completely which don't today.
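In the same in-memory model as above, the "re-seek instead of giving up" family of options looks roughly like the following. This is a hypothetical sketch of the proposed behavior, not server code: `updateById` is an invented helper that, on finding the stashed RecordId gone, seeks the _id index again (the "keep seeking beneath a write stage" idea) rather than declaring EOF.

```javascript
// Hypothetical sketch (not server code): on a stale stashed RecordId,
// re-seek the _id index so the update lands on the re-inserted document.
let nextRecordId = 1;
const recordStore = new Map(); // RecordId -> document
const idIndex = new Map();     // _id -> RecordId

function insert(doc) {
  const rid = nextRecordId++;
  recordStore.set(rid, { ...doc });
  idIndex.set(doc._id, rid);
}

function remove(_id) {
  recordStore.delete(idIndex.get(_id));
  idIndex.delete(_id);
}

function updateById(_id, stashedRecordId, setFields) {
  let rid = stashedRecordId;
  if (!recordStore.has(rid)) {
    // Proposed change: instead of returning EOF, seek the _id index again.
    rid = idIndex.get(_id);
    if (rid === undefined) return null; // genuinely gone; update matches nothing
  }
  const doc = recordStore.get(rid);
  Object.assign(doc, setFields); // apply the {$set: ...}
  return doc;
}

insert({ _id: 123 });
const stashed = idIndex.get(123);
remove(123);
insert({ _id: 123 }); // same _id, new RecordId

console.log(updateById(123, stashed, { X: 1 })); // { _id: 123, X: 1 }
```

Under this sketch the update succeeds against the re-inserted document, which matches how findAndModify with a sort already behaves after a full retry.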
Repro
A repro script is attached below. It can be run with the following resmoke invocation:
python3 buildscripts/resmoke.py run --installDir build/install/bin --suites=replica_sets fam-repro-replset.js
is related to: SERVER-86250 Consider changing findAndModify behavior when concurrent operation changes the sort key (Open)