-
Type: Task
-
Resolution: Won't Do
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: IDL
-
Service Arch
-
Service Arch 2024-01-22
Let's take the "find" command as an example. The command is defined here, and includes 33 fields at the time of writing this ticket. The generated parser uses a series of if / else if conditional statements to parse a BSON, by going through every element in the object:
void parseProtected(const IDLParserContext&, const OpMsgRequest& request) { ... for (const auto& element : request.body) { ... if (fieldName == kFilterFieldName) { ... } else if (fieldName == kProjectionFieldName) { ... } else if (fieldName == kSortFieldName) { ... } ... } }
This will translate into multiple string comparisons for each element in the BSON object (on average 16 string comparisons). As a result, parsing commands shows up as a hot-spot in performance profiles, especially for low-latency workloads (e.g. YCSB 100% read). The performance implications scales with the number of fields in a command, and the order of if/else statements.
We can reduce the cost of parsing commands (both number of instructions and CPI) by either of the following:
- Reordering if/else statements based on the likelihood of taking each branch to reduce the total number of string comparisons.
- Avoid parsing the BSON object multiple times if possible (less pressure on the cache).
- Hashing filed names at compile-time, and using the hashed values to parse commands (my PoC).
Here are some performance numbers for my PoC using micro-benchmarks:
Running build/install/bin/find_command_bm Run on (8 X 3501.93 MHz CPU s) CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 55296 KiB (x1) Load Average: 0.86, 0.77, 0.54 -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------- BM_ParseFindCommand 510 ns 510 ns 1373844 items_per_second=1.96008M/s BM_ParseHeavyFindCommand 566 ns 566 ns 1262140 items_per_second=1.76706M/s BM_ParseFindCommandPoC 287 ns 287 ns 2440443 items_per_second=3.48677M/s BM_ParseHeavyFindCommandPoC 232 ns 232 ns 3011081 items_per_second=4.30192M/s
- duplicates
-
SERVER-82940 Use tries for IDL enumeration lookups
- Closed
- is related to
-
SERVER-78439 Investigate IDL generated parser performance
- Closed
- related to
-
SERVER-83671 Implement lookup tries for IDL parse in C++
- Backlog
-
SERVER-82940 Use tries for IDL enumeration lookups
- Closed