-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query 2020-07-27
-
(copied to CRM)
If we have an index with a certain collation defined, then a search performed using that index will use the collation not only for that field but for other fields as well.
Consider a test collection with its corresponding indexes like this:
> db.coll_insens_search_test.find() { "_id" : 1, "A" : "goat", "B" : "core" } { "_id" : 2, "A" : "goat", "B" : "CORE" } { "_id" : 3, "A" : "GoAt", "B" : "core" } { "_id" : 4, "A" : "GOAT", "B" : "CORE" } { "_id" : 5, "A" : "GOat", "B" : "core" }
> db.coll_insens_search_test.getIndexes() [ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.coll_insens_search_test" }, { "v" : 2, "key" : { "B" : 1 }, "name" : "HD_CaseInsensitiveIndex", "ns" : "test.coll_insens_search_test", "collation" : { "locale" : "en_US", "caseLevel" : false, "caseFirst" : "off", "strength" : 2, "numericOrdering" : false, "alternate" : "non-ignorable", "maxVariable" : "punct", "normalization" : false, "backwards" : false, "version" : "57.1" } } ]
The case-insensitive index above is only on field B and not on field A. However, the following query yields all 5 documents:
> db.coll_insens_search_test.find({ A: "GOat", B:"core" }).collation({ locale: "en_US", strength: 2 }) { "_id" : 1, "A" : "goat", "B" : "core" } { "_id" : 2, "A" : "goat", "B" : "CORE" } { "_id" : 3, "A" : "GoAt", "B" : "core" } { "_id" : 4, "A" : "GOAT", "B" : "CORE" } { "_id" : 5, "A" : "GOat", "B" : "core" }
In other words, we see that the case-insensitive search is being performed not only for B on which the index is defined, but also for A which is not covered in the index (it is debatable whether this is the correct behavior). Otherwise it would've only returned the last document with "_id" : 5.
The winningPlan for the above query looks like this:
"winningPlan" : { "stage" : "FETCH", "filter" : { "A" : { "$eq" : "GOat" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "B" : 1 }, "indexName" : "HD_CaseInsensitiveIndex", "collation" : { "locale" : "en_US", "caseLevel" : false, "caseFirst" : "off", "strength" : 2, "numericOrdering" : false, "alternate" : "non-ignorable", "maxVariable" : "punct", "normalization" : false, "backwards" : false, "version" : "57.1" }, "isMultiKey" : false, "multiKeyPaths" : { "B" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "B" : [ "[\"-EK1\u0001\b\", \"-EK1\u0001\b\"]" ] } } }
Without using the collation, it yields only the 5th document as expected, as shown here:
> db.coll_insens_search_test.find({ A: "GOat", B:"core" }) { "_id" : 5, "A" : "GOat", "B" : "core" }
This is a feature request to provide the ability to perform a collation aware search only on the fields on which the index is defined, and not expand it to other fields which are not covered in the index.
- duplicates
-
SERVER-25954 Support more granular collation specification
- Backlog