-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: 8.0.2
-
Component/s: None
-
Query Execution
-
ALL
-
(copied to CRM)
Update:
Curently we recommend the customer to either (1) specify the collation used in the pipeline to be that of the foreign collection; or if this is not possible (2) turn off SBE.
Separately, a fix to improve collation support is being evaluated.
Issue:
The logic to determine that an index is eligible for indexed loop to join strategy when we push down $lookup to the SBE checks if the local collection's collator is same as the foreign collection's collator. If the collations are different, $lookup SBE lowering logic does not utilize the index when written with Equality Match with a Single Join Condition
The behaviour is not observed when we write queries with Correlated Subqueries Using Concise Syntax. In other words, the classic engine uses the _id_ index for the inner query of $lookup.
we need to confirm why we do see the difference in behaviour for different syntax in which One syntax is able to use the index while other is not.
For instance:
{$lookup: { from: "a", localField: "_id", foreignField: "_id", as: "b" }}
The above syntax does not use the _id_ index but the following syntax uses the _id_ index which seems to be a bug because if the collator is different, it is likely that incorrect results are produced for string equality comparison.
{$lookup: { from: "a", let: {x: "$_id"}, pipeline: [ {$match: {$expr: {$eq: ["$_id", "$$x"]}}} ], as: "b" }}
Sample documents and collection collator information from the customer can be found at the linked HELP ticket.