Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Replication
Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-10-05, Repl 2020-10-19, Repl 2020-11-02
Consider the following scenario:
- We start migrating tenant X
- The migration sets a start timestamp of TS(100)
- When the tenant cloners complete, the last write on the donor for tenant X is at TS(90) and the last write for tenant Y is at TS(150)
- TS(150) is the donor's read concern "majority" optime, so it is the ‘lastVisibleOpTime’ that the recipient receives. The recipient therefore sets its ‘stopTimestamp’ to TS(150).
- The last oplog entry fetched on the recipient is at TS(90)
Because the recipient only fetches tenant X's entries, tenant Y's write at TS(150) is filtered out. The recipient will therefore never apply an oplog entry with a timestamp greater than or equal to TS(150), and will never consider itself consistent.
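The stall can be reproduced with a toy model. This is a minimal sketch, assuming timestamps are plain integers and that consistency means "some applied entry has ts >= stopTimestamp"; `OplogEntry`, `fetch_for_tenant`, and `reached_stop_ts` are hypothetical names, not recipient code.

```python
from typing import List, NamedTuple


class OplogEntry(NamedTuple):
    ts: int      # timestamps modeled as plain integers (assumption)
    tenant: str  # tenant that owns the write


def fetch_for_tenant(donor_oplog: List[OplogEntry], tenant: str) -> List[OplogEntry]:
    """Hypothetical fetcher model: only the tenant's own writes come back."""
    return [e for e in donor_oplog if e.tenant == tenant]


def reached_stop_ts(applied: List[OplogEntry], stop_ts: int) -> bool:
    """Consistency condition from the scenario: some applied entry has ts >= stop_ts."""
    return any(e.ts >= stop_ts for e in applied)


donor_oplog = [OplogEntry(90, "X"), OplogEntry(150, "Y")]
stop_ts = 150  # donor's majority optime, received as lastVisibleOpTime

applied = fetch_for_tenant(donor_oplog, "X")
print(max(e.ts for e in applied))         # 90
print(reached_stop_ts(applied, stop_ts))  # False; stays False unless tenant X writes again
```

The filter is what causes the problem: the only entry at or past TS(150) belongs to tenant Y, so it never reaches the recipient.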
To fix this, we make sure that the tenant oplog applier writes a noop oplog entry into its oplog buffer whenever it receives a batch. We must be careful, however, that the noop's timestamp is not too high. If the recipient wrote the ‘lastVisibleOpTime’ as the noop while lagging behind the donor, the noop could make the recipient appear more up to date than it actually is. The correct value is the latest oplog timestamp the donor sees when running the recipient's oplog query. This is exactly the value that the TRACK_LATEST_OPLOG_TS query parameter causes the donor to include in the query response, along with the postBatchResumeToken.
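The choice of noop timestamp can be illustrated with a small model. This is a sketch under stated assumptions, not the server implementation: integer timestamps, a donor query that scans a fixed number of oplog entries per batch, and a `post_batch_resume_ts` field standing in for the timestamp carried by the postBatchResumeToken. `donor_find`, `enqueue_batch`, and `Batch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OplogEntry:
    ts: int          # timestamps modeled as plain integers (assumption)
    op: str          # "i" = insert, "n" = noop, mirroring oplog 'op' codes
    tenant: str = ""


@dataclass
class Batch:
    docs: List[OplogEntry]     # entries that matched the tenant filter
    post_batch_resume_ts: int  # latest oplog ts the donor scanned for this batch


def donor_find(oplog: List[OplogEntry], tenant: str, after_ts: int, limit: int) -> Batch:
    """Hypothetical model of a tenant-filtered find on the donor oplog.

    Scans at most `limit` entries newer than `after_ts` and reports the ts of
    the last entry scanned, standing in for the resume-token ts."""
    scanned = [e for e in oplog if e.ts > after_ts][:limit]
    docs = [e for e in scanned if e.tenant == tenant]
    resume_ts = scanned[-1].ts if scanned else after_ts
    return Batch(docs, resume_ts)


def enqueue_batch(buffer: List[OplogEntry], batch: Batch) -> None:
    """Append the batch, then a noop at the resume-token ts if it is newer
    than the last buffered entry."""
    buffer.extend(batch.docs)
    last_ts = buffer[-1].ts if buffer else 0
    if batch.post_batch_resume_ts > last_ts:
        buffer.append(OplogEntry(batch.post_batch_resume_ts, "n"))


donor_oplog = [
    OplogEntry(90, "i", "X"),
    OplogEntry(150, "i", "Y"),
    OplogEntry(180, "i", "X"),  # not scanned by the first query
    OplogEntry(200, "i", "Y"),  # donor's lastVisibleOpTime has moved on to here
]
stop_ts = 150

buffer: List[OplogEntry] = []
enqueue_batch(buffer, donor_find(donor_oplog, "X", after_ts=0, limit=2))
print([(e.ts, e.op) for e in buffer])  # [(90, 'i'), (150, 'n')]
print(buffer[-1].ts >= stop_ts)        # True
```

Had the noop used lastVisibleOpTime (200 in this model) instead, the buffer would claim progress past TS(180), an entry for tenant X that has not been fetched yet.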
We write these noops for empty batches as well: ignoring duplicate timestamps in the oplog buffer should be simple, and the noops ensure that on recovery the recipient does not need to rescan oplog entries it previously filtered out.
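Empty-batch handling and duplicate suppression can be sketched with an in-memory buffer. This assumes entries are `(ts, op)` tuples and that a restarted fetcher resumes after the last buffered timestamp; `OplogBuffer`, `push_batch`, and `resume_point` are hypothetical names for illustration only.

```python
from typing import List, Tuple

# Buffer entries modeled as (ts, op) tuples; "n" marks a noop (assumption).
Entry = Tuple[int, str]


class OplogBuffer:
    """Hypothetical in-memory stand-in for the recipient's oplog buffer."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def last_ts(self) -> int:
        return self.entries[-1][0] if self.entries else 0

    def push_batch(self, docs: List[Entry], resume_ts: int) -> None:
        """Push a (possibly empty) batch followed by a noop at resume_ts.

        A noop that is not newer than the last buffered entry is a duplicate
        and is dropped."""
        self.entries.extend(docs)
        if resume_ts > self.last_ts():
            self.entries.append((resume_ts, "n"))

    def resume_point(self) -> int:
        """Where a restarted fetcher would begin: after the last buffered ts."""
        return self.last_ts()


buf = OplogBuffer()
buf.push_batch([(90, "i")], resume_ts=150)  # normal batch
buf.push_batch([], resume_ts=170)           # empty batch: donor scanned to 170, nothing for this tenant
buf.push_batch([], resume_ts=170)           # no donor progress: duplicate noop ignored
print(buf.entries)         # [(90, 'i'), (150, 'n'), (170, 'n')]
print(buf.resume_point())  # 170
```

Without the empty-batch noop at 170, a restart in this model would resume from TS(150) and rescan donor entries between TS(150) and TS(170) that had already been filtered out.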
Resharding faces an analogous problem, but solves it within aggregation, since it uses aggregation rather than find commands. SERVER-51227 must first expose this resume token correctly for find commands; this ticket then writes and processes the noops.
depends on:
- SERVER-51227 Make find/getMore cmd with $_requestResumeToken on oplog collection to report latest oplog entry ts instead of the latest record id seen while generating the response batch. (Closed)
is depended on by:
- SERVER-51734 Enable tenant migration recipient testing. (Closed)
is related to:
- SERVER-52628 Tenant migration recipient can give a false indication to donor about the data being majority committed on recipient replica set. (Closed)
- SERVER-49897 Insert no-op entries into oplog buffer collections for resharding so resuming is less wasteful (Closed)
related to:
- SERVER-61440 Race in tenant_migration_recipient_current_op.js (Closed)