-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Fully Compatible
-
Security 2023-07-10, Security 2023-07-24, Security 2023-08-07
IDL generated parsers cost O(N x M) where N = number of fields in the document and M = number of fields defined in IDL. This is due to the fact that each field in a document is checked against each field in the IDL until one is found. Over releases, the value of M has grown larger and the IDL generated parsers are a source of release over release slow down.
I have briefly investigated using a generated trie instead of a linear scan. I should do more investigation since the results were inconclusive. I should consider a full trie for instance and maybe a hybrid trie (say 1 level deep) approach.
I should microbenchmark different sizes of M and on different documents of N. The most important two cases are findOne, small agg and insertOne. Slower commands will not be impacted by this slowdown.
Finally, one other issue in the generated IDL parsers for commands is how generic args are handled. The generic args are inefficiently checked today as they are always checked after all M IDL fields are checked. Now that generic args are part of IDL, we should evaluate merging them into the command parsers as ignored fields.
- related to
-
SERVER-81876 Improve IDL code generation for command parsers
- Closed
-
SERVER-83671 Implement lookup tries for IDL parse in C++
- Backlog
-
SERVER-82940 Use tries for IDL enumeration lookups
- Closed