- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Query Execution
- Execution Team 2024-11-11, Execution Team 2024-11-25
For interesting variations of queries that read or modify user data, e.g.,
- find / findAndModify
- aggregate
- insert
- remove
investigate which fields need to be pruned or regenerated from a recorded query so that a client can replay it. For example:
{
  "aggregate": "sharded",
  "pipeline": [ { "$match": { "a": { "$gte": 0.0 } } } ],
  "cursor": {},
  "lsid": { "id": { "$uuid": "5e8a24d38c7b4656959b93350a82b5d0" } },
  "$clusterTime": {
    "clusterTime": { "$timestamp": { "t": 1730306867, "i": 1 } },
    "signature": {
      "hash": { "$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00" },
      "keyId": 0
    }
  },
  "$db": "test"
}
Only a subset of the fields here would be correct to use in a replayed query.
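As a starting point for discussion, the pruning step might look roughly like the sketch below. It is not driver code: `RecordedCommand` is a hypothetical stand-in (ordered field-name/value pairs) for the BSONObj the POC would actually receive, and `pruneForReplay` and `fieldsToPrune` are illustrative names. The prune set contains only `lsid` and `$clusterTime` from the example above, on the assumption that these are tied to the original session and cluster state. Whether `$db` or other fields also belong there is exactly what this ticket needs to determine.

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded command: ordered top-level
// (field name, serialized value) pairs. A real POC would use BSONObj.
using Field = std::pair<std::string, std::string>;
using RecordedCommand = std::vector<Field>;

// ASSUMPTION: candidate fields to drop before replay. This list is a guess
// based on the example command; the investigation should confirm or extend it.
inline const std::unordered_set<std::string>& fieldsToPrune() {
    static const std::unordered_set<std::string> kFields{"lsid", "$clusterTime"};
    return kFields;
}

// Returns a copy of the command without the pruned fields. Field order is
// preserved, so the command name stays first.
inline RecordedCommand pruneForReplay(const RecordedCommand& recorded) {
    RecordedCommand pruned;
    for (const auto& field : recorded) {
        if (fieldsToPrune().count(field.first) == 0) {
            pruned.push_back(field);
        }
    }
    return pruned;
}
```

Applied to the example command, this would keep `aggregate`, `pipeline`, `cursor` and `$db`, in that order. Keeping the prune set in one place should make it easy to adjust as the investigation finds more fields.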
Workload record is not yet implemented, so initial exploration will require collecting example queries "manually", e.g., by logging RequestExecutionContext::_request at some point during the life of a query.
Then investigate how easily the C++ driver can be used to replay such queries, with a simple POC to inform later work. Assume queries will be provided as BSONObj.
It may be that the driver can be given the query object directly (with unsuitable fields removed), or that the easiest path is an if/else if chain that rebuilds the query using the normal C++ driver methods (bearing in mind that queries may have specified .limit(...), .hint(...) and so on).
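If the if/else if route is taken, the dispatch could be sketched as below. This relies on the command name being the first field of the command document, and on removes arriving at the server as the `delete` command. `ReplayKind`, `classify` and `RecordedCommand` are hypothetical names for illustration; no driver calls are shown, since which driver methods each branch would use is part of the investigation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded command: ordered top-level
// (field name, serialized value) pairs. A real POC would use BSONObj.
using Field = std::pair<std::string, std::string>;
using RecordedCommand = std::vector<Field>;

// Command families this ticket asks about, plus a catch-all.
enum class ReplayKind { Find, FindAndModify, Aggregate, Insert, Remove, Unsupported };

// Classifies a recorded command by its first field, which holds the
// command name. Anything unrecognized is reported as Unsupported so the
// caller can skip it or log it rather than replay it incorrectly.
inline ReplayKind classify(const RecordedCommand& cmd) {
    if (cmd.empty()) {
        return ReplayKind::Unsupported;
    }
    const std::string& name = cmd.front().first;
    if (name == "find") {
        return ReplayKind::Find;
    } else if (name == "findAndModify") {
        return ReplayKind::FindAndModify;
    } else if (name == "aggregate") {
        return ReplayKind::Aggregate;
    } else if (name == "insert") {
        return ReplayKind::Insert;
    } else if (name == "delete") {
        // Removes issued by clients are sent as the "delete" command.
        return ReplayKind::Remove;
    }
    return ReplayKind::Unsupported;
}
```

Each branch would then read the remaining fields (filter, limit, hint and so on) and rebuild the call with the corresponding driver method. Returning an explicit `Unsupported` value keeps unhandled commands visible while the POC covers only a few command types.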
Note: there will be "inter-query relationships", e.g., a getMore for a particular cursor following a find. Directly replaying the getMore is likely to fail because the cursor info will differ during replay. This does not need to be considered yet; later work will investigate and address replaying a series of queries "properly".