Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: IDL
Labels:

Assigned Teams:

Service Arch
Sprint:
Service Arch 2024-01-22
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Let's take the "find" command as an example. The command is defined here, and includes 33 fields at the time of writing this ticket. The generated parser uses a series of if / else if conditional statements to parse a BSON, by going through every element in the object:

void parseProtected(const IDLParserContext&, const OpMsgRequest& request) {
    ...
    for (const auto& element : request.body) {
        ...
        if (fieldName == kFilterFieldName) { ... }
        else if (fieldName == kProjectionFieldName) { ... }
        else if (fieldName == kSortFieldName) { ... }
        ...
    }
}

This will translate into multiple string comparisons for each element in the BSON object (on average 16 string comparisons). As a result, parsing commands shows up as a hot-spot in performance profiles, especially for low-latency workloads (e.g. YCSB 100% read). The performance implications scales with the number of fields in a command, and the order of if/else statements.

We can reduce the cost of parsing commands (both number of instructions and CPI) by either of the following:

Reordering if/else statements based on the likelihood of taking each branch to reduce the total number of string comparisons.
Avoid parsing the BSON object multiple times if possible (less pressure on the cache).
Hashing filed names at compile-time, and using the hashed values to parse commands (my PoC).

Here are some performance numbers for my PoC using micro-benchmarks:

Running build/install/bin/find_command_bm
Run on (8 X 3501.93 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 55296 KiB (x1)
Load Average: 0.86, 0.77, 0.54
--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
BM_ParseFindCommand                510 ns          510 ns      1373844 items_per_second=1.96008M/s
BM_ParseHeavyFindCommand           566 ns          566 ns      1262140 items_per_second=1.76706M/s
BM_ParseFindCommandPoC             287 ns          287 ns      2440443 items_per_second=3.48677M/s
BM_ParseHeavyFindCommandPoC        232 ns          232 ns      3011081 items_per_second=4.30192M/s

duplicates

SERVER-82940 Use tries for IDL enumeration lookups

Closed

is related to

SERVER-78439 Investigate IDL generated parser performance

Closed

related to

SERVER-83671 Implement lookup tries for IDL parse in C++

Backlog

SERVER-82940 Use tries for IDL enumeration lookups

Closed

Assignee:: Dominic Hernandez (Inactive)
Reporter:: Amirsaman Memaripour
Participants:: Amirsaman Memaripour, Dominic Hernandez, Jason Chan
Votes:: 1 Vote for this issue
Watchers:: 9 Start watching this issue

Created:: Oct 04 2023 08:20:06 PM UTC
Updated:: Jan 24 2024 10:52:04 PM UTC
Resolved:: Jan 11 2024 09:28:20 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates