- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Query Optimization
- QO 2024-11-11, QO 2024-11-25
Summary: Implement code for taking a series of recorded queries, from multiple threads, and writing them to disk in a format of choice.
The general presumed structure of the data on disk derived from the requirements is described in the Technical Design, but the details are left open at this time.
Implement code that can be called from multiple threads to record queries and their associated metrics, and that writes them to disk for later reading.
There will be a corresponding read interface, but this ticket covers the write interface initially; reusable artefacts (e.g., a schema as may be used for flatbuffers) should be noted for reuse!
The primary requirement is that recording a query should perform minimal work, to avoid impacting user operations.
Following from this:
- attempting to record a single query should not lead to an immediate write to disk; writes should be batched.
- synchronisation should not be required for every single write. This suggests thread-local batching, with synchronisation required only every N queries or N buffered bytes.
- ideally writing to disk would not be performed by the query-serving thread; blocking disk IO on frontend threads would impact queries. Thus, a separate persistence thread may be wise.
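A rough sketch of how the three points above could fit together, for discussion only. `PersistenceWriter`, `ThreadLocalBatch`, and the newline-delimited text output are hypothetical names and choices made for illustration; queries are plain `std::string` values here rather than `BSONObj`, and the real on-disk format is left open by this ticket.

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical persistence thread: the only place that touches the disk.
class PersistenceWriter {
public:
    explicit PersistenceWriter(const std::string& path)
        : _out(path, std::ios::binary), _thread([this] { run(); }) {}

    ~PersistenceWriter() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _done = true;
        }
        _cv.notify_one();
        _thread.join();  // drains anything still queued before returning
    }

    // Called once per full batch, so the lock is taken once per N queries.
    void submit(std::vector<std::string> batch) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _queue.push_back(std::move(batch));
        }
        _cv.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _cv.wait(lk, [this] { return _done || !_queue.empty(); });
            if (_queue.empty()) return;  // only reachable once _done is set
            std::vector<std::vector<std::string>> ready;
            ready.swap(_queue);
            lk.unlock();  // do the blocking IO without holding the lock
            for (const auto& batch : ready)
                for (const auto& query : batch) _out << query << '\n';
            _out.flush();
            lk.lock();
        }
    }

    std::ofstream _out;
    std::mutex _mutex;
    std::condition_variable _cv;
    std::vector<std::vector<std::string>> _queue;
    bool _done = false;
    std::thread _thread;  // declared last so it starts after the other members exist
};

// Hypothetical per-thread staging area: appends are unsynchronised.
class ThreadLocalBatch {
public:
    ThreadLocalBatch(PersistenceWriter& writer, std::size_t maxQueries)
        : _writer(writer), _maxQueries(maxQueries) {}
    ~ThreadLocalBatch() { flush(); }

    void record(std::string query) {
        _pending.push_back(std::move(query));
        if (_pending.size() >= _maxQueries) flush();
    }

    void flush() {
        if (_pending.empty()) return;
        _writer.submit(std::move(_pending));
        _pending.clear();  // reset the moved-from vector to a known empty state
    }

private:
    PersistenceWriter& _writer;
    std::size_t _maxQueries;
    std::vector<std::string> _pending;
};
```

Each query-serving thread would own one `ThreadLocalBatch`, so the common path is a vector append with no locking; the mutex is only taken when a batch is handed over.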
—
Suggested interface for general planning; this is not concrete. Work for this ticket will define the first draft of the interface.
Manager
    Manager(/* path to store files */)
    requestBuffer() -> Buffer
    releaseBuffer(Buffer&&) -> void
Buffer
    Buffer(size_t size)
    recordQuery(const BSONObj& queryBody, int64_t startTS, int64_t endTS) -> bool /* false = couldn't fit in current buffer */
    numQueries() const -> /* number of recorded queries */
    size() const -> /* num bytes of _data_, not capacity of the buffer */
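For planning purposes, the suggested interface might translate into C++ along these lines. This is a sketch under assumptions, not the agreed design: `QueryBody` is a `std::string` stand-in for `BSONObj` so the example compiles on its own, the `bufferSize` constructor parameter and `pendingBuffers()` are hypothetical additions, and released buffers are simply queued in memory rather than persisted.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using QueryBody = std::string;  // stand-in for BSONObj (assumption for this sketch)

// "Batch" flavour of the buffer: accumulates tuples against a byte budget.
class Buffer {
public:
    explicit Buffer(std::size_t size) : _capacity(size) {}

    // Returns false when the record does not fit; the caller would then
    // release this buffer and request a fresh one.
    bool recordQuery(const QueryBody& queryBody, std::int64_t startTS, std::int64_t endTS) {
        const std::size_t needed = queryBody.size() + 2 * sizeof(std::int64_t);
        if (_bytes + needed > _capacity) return false;
        _records.emplace_back(queryBody, startTS, endTS);
        _bytes += needed;
        return true;
    }

    std::size_t numQueries() const { return _records.size(); }
    std::size_t size() const { return _bytes; }  // bytes of data, not capacity

private:
    std::size_t _capacity;
    std::size_t _bytes = 0;
    std::vector<std::tuple<QueryBody, std::int64_t, std::int64_t>> _records;
};

class Manager {
public:
    // bufferSize is a hypothetical parameter; the ticket only specifies the path.
    explicit Manager(std::string path, std::size_t bufferSize = 1 << 20)
        : _path(std::move(path)), _bufferSize(bufferSize) {}

    Buffer requestBuffer() { return Buffer(_bufferSize); }

    // Synchronisation happens here, once per buffer rather than once per query.
    void releaseBuffer(Buffer&& buffer) {
        std::lock_guard<std::mutex> lk(_mutex);
        _released.push_back(std::move(buffer));
    }

    // Hypothetical accessor, only here to make the sketch observable.
    std::size_t pendingBuffers() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _released.size();
    }

private:
    std::string _path;  // where files would be stored; unused in this sketch
    std::size_t _bufferSize;
    mutable std::mutex _mutex;
    std::vector<Buffer> _released;
};
```

A serving thread would call `requestBuffer()` once, call `recordQuery()` until it returns false, then hand the buffer back with `releaseBuffer(std::move(buffer))` and request another.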
See design for proposed sequence of events.
The buffer may be a "batch", accumulating `<BSONObj, int64_t, int64_t>` in memory. However, to simplify persistence, it would also be reasonable for the buffer to be a genuine fixed-size byte buffer, with the values immediately serialised into this buffer in the format they will be persisted in. This has the added benefit of minimising (de)allocation churn; such buffers could be pooled and reused, rather than extending the lifetime of BSONObj shared buffers.
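To illustrate the fixed-size byte buffer option, here is a minimal sketch. The record layout (a `uint32_t` body length, then the two `int64_t` timestamps, then the body bytes, all in host byte order) is an assumption made purely for the example, as are the names `ByteBuffer` and `BufferPool`; the query body is a `std::string` stand-in for the bytes of a `BSONObj`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Fixed-size buffer: records are serialised on arrival, nothing is allocated per query.
class ByteBuffer {
public:
    explicit ByteBuffer(std::size_t capacity) : _data(capacity) {}

    bool recordQuery(const std::string& body, std::int64_t startTS, std::int64_t endTS) {
        const std::uint32_t len = static_cast<std::uint32_t>(body.size());
        const std::size_t needed = sizeof(len) + 2 * sizeof(std::int64_t) + body.size();
        if (needed > _data.size() - _used) return false;  // would not fit
        put(&len, sizeof(len));
        put(&startTS, sizeof(startTS));
        put(&endTS, sizeof(endTS));
        put(body.data(), body.size());
        ++_count;
        return true;
    }

    void reset() { _used = 0; _count = 0; }  // keep the allocation, drop the contents
    const char* data() const { return _data.data(); }
    std::size_t size() const { return _used; }  // bytes of data, not capacity
    std::size_t numQueries() const { return _count; }

private:
    void put(const void* src, std::size_t n) {
        std::memcpy(_data.data() + _used, src, n);
        _used += n;
    }

    std::vector<char> _data;
    std::size_t _used = 0;
    std::size_t _count = 0;
};

// Hypothetical pool: persisted buffers are reset and handed out again.
class BufferPool {
public:
    explicit BufferPool(std::size_t bufferSize) : _bufferSize(bufferSize) {}

    std::unique_ptr<ByteBuffer> acquire() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_free.empty()) return std::make_unique<ByteBuffer>(_bufferSize);
        std::unique_ptr<ByteBuffer> buf = std::move(_free.back());
        _free.pop_back();
        return buf;
    }

    void release(std::unique_ptr<ByteBuffer> buf) {
        buf->reset();
        std::lock_guard<std::mutex> lk(_mutex);
        _free.push_back(std::move(buf));
    }

private:
    std::size_t _bufferSize;
    std::mutex _mutex;
    std::vector<std::unique_ptr<ByteBuffer>> _free;
};
```

With this shape, persisting a buffer is a single write of `data()` for `size()` bytes, and the same allocation is reused once the buffer is returned to the pool.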
The exact format on disk can be decided by convenience; the general layout is discussed in the design doc.