Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Execution
When SBE determines the join strategy for a $lookup and no suitable index is available on the foreign collection (such an index would trigger LookupStrategy::kIndexedLoopJoin), it currently chooses between LookupStrategy::kHashJoin and LookupStrategy::kNestedLoopJoin based entirely on the stats of the foreign collection. As a result, if the foreign collection has stats showing it is small enough, HashJoin is chosen even when the local collection contains only one document. In edge cases like this, NestedLoopJoin would be faster, because the cost of building the hash table is not amortized over enough local documents.
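A minimal C++ sketch of the current selection rule described above, for illustration only. The `LookupStrategy` enum mirrors the names in the ticket, but `CollectionStats`, `chooseStrategyCurrent`, and `kMaxForeignDocsForHashJoin` (and its value) are hypothetical. The real heuristic may weigh more than a document count, and treating missing foreign stats as "not small enough" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Strategy names taken from the ticket; this enum is a simplified stand-in.
enum class LookupStrategy { kIndexedLoopJoin, kHashJoin, kNestedLoopJoin };

// Hypothetical summary of collection stats; only the document count is modeled.
struct CollectionStats {
    int64_t numDocs;
};

// Hypothetical limit for a "small enough" foreign collection.
constexpr int64_t kMaxForeignDocsForHashJoin = 100'000;

// Current behavior: only the foreign collection's stats are consulted.
LookupStrategy chooseStrategyCurrent(bool foreignHasSuitableIndex,
                                     const std::optional<CollectionStats>& foreignStats) {
    if (foreignHasSuitableIndex) {
        return LookupStrategy::kIndexedLoopJoin;
    }
    assert(!foreignStats || foreignStats->numDocs >= 0);  // counts are never negative
    if (foreignStats && foreignStats->numDocs <= kMaxForeignDocsForHashJoin) {
        // Chosen regardless of how many documents the local side has.
        return LookupStrategy::kHashJoin;
    }
    // Assumption: missing foreign stats fall back to NestedLoopJoin.
    return LookupStrategy::kNestedLoopJoin;
}
```

Note that the local collection never appears in the signature: a one-document local side with a small foreign collection still gets HashJoin.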
The join strategy selection should be improved to veto HashJoin if stats are available for the local collection AND they show it has a very small number of documents (maybe < 1,000? Some experimentation is needed to find a reasonable cutoff; in practice it may be as low as single digits). If no stats are available for the local collection, selection should continue to assume that the hash table will pay for itself and choose HashJoin, since that will be the more common case in practice.
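The proposed veto could look roughly like the following C++ sketch. All names other than the `LookupStrategy` values are hypothetical (`CollectionStats`, `chooseStrategyProposed`, `kMaxForeignDocsForHashJoin`, `kMinLocalDocsForHashJoin`), and both cutoffs are placeholders; the 1,000 local-document cutoff is only the ticket's tentative suggestion, pending experimentation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Strategy names taken from the ticket; this enum is a simplified stand-in.
enum class LookupStrategy { kIndexedLoopJoin, kHashJoin, kNestedLoopJoin };

// Hypothetical summary of collection stats; only the document count is modeled.
struct CollectionStats {
    int64_t numDocs;
};

// Hypothetical cutoffs; the local one is meant to be tuned by experiment.
constexpr int64_t kMaxForeignDocsForHashJoin = 100'000;
constexpr int64_t kMinLocalDocsForHashJoin = 1'000;

LookupStrategy chooseStrategyProposed(bool foreignHasSuitableIndex,
                                      const std::optional<CollectionStats>& foreignStats,
                                      const std::optional<CollectionStats>& localStats) {
    if (foreignHasSuitableIndex) {
        return LookupStrategy::kIndexedLoopJoin;
    }
    assert(!localStats || localStats->numDocs >= 0);  // counts are never negative
    const bool foreignSmallEnough =
        foreignStats && foreignStats->numDocs <= kMaxForeignDocsForHashJoin;
    if (!foreignSmallEnough) {
        return LookupStrategy::kNestedLoopJoin;
    }
    // Veto only when local stats exist AND show a very small local side.
    if (localStats && localStats->numDocs < kMinLocalDocsForHashJoin) {
        return LookupStrategy::kNestedLoopJoin;
    }
    // No local stats (or a large local side): assume the hash table pays for itself.
    return LookupStrategy::kHashJoin;
}
```

Because the veto requires positive evidence of a small local collection, queries whose local collection has no stats keep today's behavior.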