We need to tread with care to ensure that this is correct in all cases, but it appears that we can skip some planning work while building bounds for an $in.
If an $in has no regexes, then it consists of a list of sorted, deduped equalities which take into account the collation. The purpose of calling IndexBoundsBuilder::unionize() is to ensure that the IndexBounds have properly ordered interval lists, and that any overlapping intervals are merged. However, if we process the $in in sorted order, then the intervals should already be sorted. And if the $in has already be deduped, then there cannot be any overlapping intervals.
IndexBoundsBuilder::unionize() has been shown to be slow for queries which have a large list of $in values (say, several hundred thousand distinct values). Local testing shows that skipping this work could speed such queries up possibly by as much as 40%.
Note that this ticket is distinct from SERVER-34012, which identifies quadratic time complexity specifically for $or queries. In contrast, the $in code path for bounds building is not quadratic.
- is related to
-
SERVER-35693 Parsing of $in takes quadratic time due to O(n^2) boost::flat_set constructor
- Closed
-
SERVER-30189 Reduce calls to allocator for large $in expressions
- Closed
-
SERVER-34012 Planner's logic for taking union of index bounds intervals is slow for large $or queries
- Closed