Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: M0
Assigned Teams: Query Optimization
Sprint: QO 2024-11-11, QO 2024-11-25

Summary: Implement code for taking a series of recorded queries, from multiple threads, and writing them to disk in a format of choice.

The general presumed structure of the data on disk derived from the requirements is described in the Technical Design, but the details are left open at this time.

Implement code that can be called from multiple threads to record queries and their associated metrics, and that writes them to disk for later reading.

There will be a corresponding read interface, but this ticket initially covers only the write interface; reusable artefacts (e.g., a schema, as might be used for FlatBuffers) should be noted for reuse!

The primary requirement is: recording a query should perform minimal work, to avoid impacting user operations.

Following from this:

- Attempting to record a single query should not lead to an immediate write to disk; writes should be batched.
- Synchronisation should not be required for every single write. This suggests thread-local batching, with synchronisation only required every N queries or N buffered bytes.
- Ideally, writing to disk would not be performed by the query-serving thread; blocking disk IO on frontend threads would impact queries. Thus, a separate persistence thread may be wise.
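The batching and persistence-thread requirements could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not the ticket's design: `Record`, `BatchSink`, `LocalBatch` and `demoLineCount` are hypothetical names, a `std::string` stands in for the query body, a `std::ostream` stands in for the file on disk, and the text line format is a placeholder. It flushes every N records; a bytes-based threshold would follow the same pattern.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical record; std::string stands in for the query body.
struct Record { std::string query; std::int64_t startTS; std::int64_t endTS; };

// Owns the persistence thread; frontend threads hand over whole batches.
class BatchSink {
public:
    explicit BatchSink(std::ostream& out) : _out(out), _worker([this] { run(); }) {}

    ~BatchSink() {
        { std::lock_guard<std::mutex> lk(_mutex); _done = true; }
        _cv.notify_one();
        _worker.join();  // the worker drains the queue before exiting
    }

    // The only synchronised call: one lock acquisition per batch.
    void submit(std::vector<Record>&& batch) {
        if (batch.empty()) return;
        { std::lock_guard<std::mutex> lk(_mutex); _pending.push_back(std::move(batch)); }
        _cv.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _cv.wait(lk, [this] { return _done || !_pending.empty(); });
            if (_pending.empty()) return;  // reached only once _done is set
            std::vector<std::vector<Record>> work;
            work.swap(_pending);
            lk.unlock();  // slow IO happens without the lock held
            for (const auto& batch : work)
                for (const auto& r : batch)
                    _out << r.startTS << ' ' << r.endTS << ' ' << r.query << '\n';
            lk.lock();
        }
    }

    std::ostream& _out;
    std::mutex _mutex;
    std::condition_variable _cv;
    std::vector<std::vector<Record>> _pending;
    bool _done = false;
    std::thread _worker;  // declared last so it starts after the other members exist
};

// Per-thread accumulator: no locking until the batch is full.
class LocalBatch {
public:
    LocalBatch(BatchSink& sink, std::size_t maxRecords) : _sink(sink), _max(maxRecords) {
        assert(maxRecords > 0);
    }
    ~LocalBatch() { flush(); }  // hand over any partial batch when the owner goes away

    void record(std::string query, std::int64_t startTS, std::int64_t endTS) {
        _records.push_back(Record{std::move(query), startTS, endTS});
        if (_records.size() >= _max) flush();
    }

    void flush() {
        _sink.submit(std::move(_records));
        _records.clear();  // a moved-from vector is unspecified; make it empty
    }

private:
    BatchSink& _sink;
    std::size_t _max;
    std::vector<Record> _records;
};

// Usage: four threads record ten queries each in batches of three.
inline std::size_t demoLineCount() {
    std::ostringstream out;
    {
        BatchSink sink(out);
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t)
            threads.emplace_back([&sink, t] {
                LocalBatch batch(sink, 3);
                for (int i = 0; i < 10; ++i) batch.record("q", t, i);
            });
        for (auto& th : threads) th.join();
    }  // ~BatchSink joins the persistence thread
    std::size_t lines = 0;
    for (char c : out.str()) lines += (c == '\n');
    return lines;
}
```

In this sketch a frontend thread takes the mutex once per batch rather than once per query, and the only thread that touches the output stream is the persistence thread. Swapping the pending queue out under the lock keeps the critical section short regardless of how slow the write is.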

—

Suggested interface for general planning; this is not concrete - work for this ticket will define the first draft of the interface.

Manager
    Manager(/* path to store files */)
    requestBuffer() -> Buffer
    releaseBuffer(Buffer&&) -> void

Buffer
    Buffer(size_t size)
    recordQuery(const BSONObj& queryBody, int64_t startTS, int64_t endTS) -> bool /* false = couldn't fit in current buffer */
    numQueries() const -> size_t /* number of recorded queries */
    size() const -> size_t /* num bytes of _data_, not capacity of the buffer */

See design for proposed sequence of events.
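For planning purposes, the suggested interface might be transcribed into C++ roughly as follows, using the "batch" flavour of buffer. This is a sketch only: `QueryBody` is a `std::string` stand-in for `BSONObj` so the example compiles without server headers; the `bufferSize` parameter, `path()`, `numReleased()` and `recordOrRotate()` are hypothetical additions; the byte accounting is approximate; and `releaseBuffer` merely retains the buffer where a real implementation would hand it to persistence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Stand-in for BSONObj (assumption made for this sketch).
using QueryBody = std::string;

class Buffer {
public:
    explicit Buffer(std::size_t size) : _capacity(size) { assert(size > 0); }

    // Returns false when the record would not fit in the remaining capacity.
    bool recordQuery(const QueryBody& queryBody, std::int64_t startTS, std::int64_t endTS) {
        const std::size_t needed = queryBody.size() + 2 * sizeof(std::int64_t);
        if (_bytes + needed > _capacity) return false;
        _records.emplace_back(queryBody, startTS, endTS);
        _bytes += needed;
        return true;
    }

    std::size_t numQueries() const { return _records.size(); }
    std::size_t size() const { return _bytes; }  // bytes of data, not capacity

private:
    std::size_t _capacity;
    std::size_t _bytes = 0;
    std::vector<std::tuple<QueryBody, std::int64_t, std::int64_t>> _records;
};

class Manager {
public:
    explicit Manager(std::string path, std::size_t bufferSize = 4096)
        : _path(std::move(path)), _bufferSize(bufferSize) {}

    const std::string& path() const { return _path; }

    Buffer requestBuffer() { return Buffer(_bufferSize); }

    // Takes ownership of a full (or abandoned) buffer. A real implementation
    // would queue it for a persistence thread; here it is only retained.
    void releaseBuffer(Buffer&& buffer) {
        std::lock_guard<std::mutex> lk(_mutex);
        _released.push_back(std::move(buffer));
    }

    std::size_t numReleased() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _released.size();
    }

private:
    std::string _path;
    std::size_t _bufferSize;
    mutable std::mutex _mutex;
    std::vector<Buffer> _released;
};

// Frontend-side helper: record, swapping buffers when the current one is full.
inline bool recordOrRotate(Manager& mgr, Buffer& buf, const QueryBody& q,
                           std::int64_t startTS, std::int64_t endTS) {
    if (buf.recordQuery(q, startTS, endTS)) return true;
    mgr.releaseBuffer(std::move(buf));
    buf = mgr.requestBuffer();
    return buf.recordQuery(q, startTS, endTS);  // false only if q exceeds a whole buffer
}
```

A query-serving thread would call `requestBuffer()` once, call `recordOrRotate()` per query, and call `releaseBuffer()` on its last buffer before exiting. Only `releaseBuffer()` takes the lock, so synchronisation happens once per full buffer.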

The buffer may be a "batch", accumulating `<BSONObj, int64_t, int64_t>` in memory. However, for simplicity at persistence, it would also be reasonable for the buffer to be a genuine fixed-size byte buffer, with the values immediately serialised into this buffer in the format in which they will be persisted. This has the added benefit of minimising (de)allocation churn; such buffers could be pooled and reused, rather than extending the lifetime of BSONObj shared buffers.
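The fixed-size byte buffer variant could look something like the sketch below. Everything specific here is an assumption for illustration: the class name `ByteBuffer`, the per-record layout (a `uint32` body length, then the two timestamps, then the body bytes, all in native byte order), the `std::string` stand-in for the serialised query body, and the `data()` and `reset()` helpers. The real on-disk format is left to the design doc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

class ByteBuffer {
public:
    explicit ByteBuffer(std::size_t capacity) : _data(capacity) { assert(capacity > 0); }

    // Assumed layout per record (placeholder, not a real on-disk format):
    //   uint32 bodyLength | int64 startTS | int64 endTS | body bytes
    // Returns false, leaving the buffer unchanged, if the record does not fit.
    bool recordQuery(const std::string& body, std::int64_t startTS, std::int64_t endTS) {
        const std::size_t needed = kHeaderSize + body.size();
        if (body.size() > UINT32_MAX || needed > _data.size() - _used) return false;
        const std::uint32_t len = static_cast<std::uint32_t>(body.size());
        append(&len, sizeof(len));
        append(&startTS, sizeof(startTS));
        append(&endTS, sizeof(endTS));
        append(body.data(), body.size());
        ++_numQueries;
        return true;
    }

    std::size_t numQueries() const { return _numQueries; }
    std::size_t size() const { return _used; }  // bytes of data, not capacity
    const unsigned char* data() const { return _data.data(); }

    // Makes the buffer reusable without releasing its allocation,
    // which is what allows a pool to recycle it.
    void reset() {
        _used = 0;
        _numQueries = 0;
    }

private:
    static constexpr std::size_t kHeaderSize =
        sizeof(std::uint32_t) + 2 * sizeof(std::int64_t);

    void append(const void* src, std::size_t n) {
        if (n == 0) return;  // nothing to copy for an empty body
        std::memcpy(_data.data() + _used, src, n);
        _used += n;
    }

    std::vector<unsigned char> _data;  // allocated once, at construction
    std::size_t _used = 0;
    std::size_t _numQueries = 0;
};
```

Because records are serialised on arrival, the persistence side can write `data()` for `size()` bytes in a single call and then `reset()` the buffer for reuse, with no per-record allocation after construction.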

The exact format on disk can be decided by convenience; the general layout is discussed in the design doc.

Assignee: James Harrison
Reporter: James Harrison
Participants: James Harrison
Votes: 0
Watchers: 1
Created: Nov 04 2024 02:42:53 PM UTC
Updated: Nov 08 2024 10:24:02 PM UTC
