  1. Core Server
  2. SERVER-79636

equivalent() function for $expr is not collation-aware

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      The MatchExpression interface offers MatchExpression::equivalent() which can be used to check whether two match expressions are the same. Consider the following two $expr match expressions:

      // Display the data in the collection.
      MongoDB Enterprise > db.c.find()
      { "_id" : ObjectId("64caa40c416866f24e97cc48"), "str" : "a" }
      { "_id" : ObjectId("64caa40e416866f24e97cc4a"), "str" : "A" }
      { "_id" : ObjectId("64caa410416866f24e97cc4c"), "str" : "b" }
      
      // Query using lowercase constant.
      MongoDB Enterprise > db.c.find({$expr: {$eq: ["$str", "a"]}}).collation({locale: "en_US", strength: 2})
      { "_id" : ObjectId("64caa40c416866f24e97cc48"), "str" : "a" }
      { "_id" : ObjectId("64caa40e416866f24e97cc4a"), "str" : "A" }
      
      // Query using uppercase constant.
      MongoDB Enterprise > db.c.find({$expr: {$eq: ["$str", "A"]}}).collation({locale: "en_US", strength: 2})
      { "_id" : ObjectId("64caa40c416866f24e97cc48"), "str" : "a" }
      { "_id" : ObjectId("64caa40e416866f24e97cc4a"), "str" : "A" }
      

      Both queries use the same case-insensitive collation and are therefore identical in meaning. However, the implementation of ExprMatchExpression::equivalent() is not collation-aware. Because the related ticket SERVER-30982 has not been implemented yet, ExprMatchExpression::equivalent() currently works by serializing both the left-hand side and the right-hand side to a mongo::Value representation and then comparing the resulting values with the simple collator. Because the simple collator is used, these two expressions are erroneously considered non-equivalent.
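      The difference between the two comparison modes can be illustrated with a small JavaScript model. This is only a sketch under stated assumptions, not the server's C++ implementation: the helper names (equivalent, simpleEq, collatedEq) are hypothetical, plain objects stand in for the serialized mongo::Value, and Intl.Collator with sensitivity "accent" is used as an approximation of {locale: "en_US", strength: 2}.

```javascript
// Hypothetical model of the comparison; NOT MongoDB's actual C++ code.
// Assumption: sensitivity "accent" approximates collation strength 2
// (case differences ignored, accent differences kept).
const caseInsensitive = new Intl.Collator("en-US", { sensitivity: "accent" });

// Stand-in for the simple collator: exact string equality.
const simpleEq = (a, b) => a === b;
// Stand-in for a collation-aware comparison of string constants.
const collatedEq = (a, b) => caseInsensitive.compare(a, b) === 0;

// Structurally compare two expression trees. String constants are compared
// with strEq; field paths (strings starting with "$") and operator names
// are always compared exactly.
function equivalent(lhs, rhs, strEq) {
  if (typeof lhs === "string" && typeof rhs === "string") {
    const isPath = lhs.startsWith("$") || rhs.startsWith("$");
    return isPath ? lhs === rhs : strEq(lhs, rhs);
  }
  if (Array.isArray(lhs) && Array.isArray(rhs)) {
    return lhs.length === rhs.length &&
      lhs.every((v, i) => equivalent(v, rhs[i], strEq));
  }
  const isObj = (x) => x !== null && typeof x === "object" && !Array.isArray(x);
  if (isObj(lhs) && isObj(rhs)) {
    const lk = Object.keys(lhs);
    const rk = Object.keys(rhs);
    return lk.length === rk.length &&
      lk.every((k, i) => k === rk[i] && equivalent(lhs[k], rhs[k], strEq));
  }
  return lhs === rhs;
}

const lower = { $expr: { $eq: ["$str", "a"] } };
const upper = { $expr: { $eq: ["$str", "A"] } };

console.log(equivalent(lower, upper, simpleEq));   // false: current behavior
console.log(equivalent(lower, upper, collatedEq)); // true: collation-aware
```

      In this model only the comparison of string constants changes; the structure of the expression and the field path "$str" are still compared exactly.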

      This will not result in a user-facing bug, because the comparison currently in use (the simple collator) is stricter than the query's collation: it can report equivalent expressions as different, but never different expressions as equivalent. However, queries may miss out on some optimizations because of the overly strict comparison. The same applies to the hashing function used by the Boolean simplification from SERVER-79018. For the scope of this ticket, the implementation of ExprMatchExpression::equivalent() should take the collation into account when comparing. This will have an effect on long-tail customers.
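      The point about hashing can be sketched as follows: if two expressions are equivalent under the collation, their hashes should be equal too. The JavaScript below is a minimal illustration, not the hashing code from SERVER-79018. All names (comparisonKey, canonicalize, fnv1a, hashExpr) are hypothetical, and toLowerCase() is only a stand-in for a real collation comparison key; it models ASCII case folding and nothing more.

```javascript
// Hypothetical sketch; NOT the server's hashing implementation.
// Assumption: toLowerCase() stands in for a collation comparison key.
// It only models ASCII case-insensitivity, not a full ICU collation.
const comparisonKey = (s) => s.toLowerCase();
const identityKey = (s) => s; // models hashing under the simple collator

// Replace every string constant with its key. Field paths ("$...") and
// operator names (object keys) are left untouched.
function canonicalize(node, keyOf) {
  if (typeof node === "string") {
    return node.startsWith("$") ? node : keyOf(node);
  }
  if (Array.isArray(node)) {
    return node.map((v) => canonicalize(v, keyOf));
  }
  if (node !== null && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [k, canonicalize(v, keyOf)])
    );
  }
  return node;
}

// FNV-1a-style 32-bit hash over the UTF-16 code units of a string.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const hashExpr = (expr, keyOf) =>
  fnv1a(JSON.stringify(canonicalize(expr, keyOf)));

const lower = { $expr: { $eq: ["$str", "a"] } };
const upper = { $expr: { $eq: ["$str", "A"] } };

// Under the identity key the two expressions hash differently;
// under the comparison key they hash to the same value.
console.log(hashExpr(lower, identityKey) === hashExpr(upper, identityKey));
console.log(hashExpr(lower, comparisonKey) === hashExpr(upper, comparisonKey));
```

      Hashing a canonical key rather than the raw constant keeps the hash consistent with a collation-aware equivalent(), which is what a hash-based simplification pass would rely on.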

            Assignee: [DO NOT USE] Backlog - Query Optimization (backlog-query-optimization)
            Reporter: David Storch (david.storch@mongodb.com)
            Votes: 0
            Watchers: 8
