- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Optimization
- ALL
For two documents such as
{a: {b: [1, 2]}}, {a: [{b: 1}, {b: 2}]}
calculating the histogram stats over the key 'a.b' cannot distinguish the two. The counts in the histogram more or less represent the first case, and we lose the information that 'a' was an array of objects with 'b' fields. This happens because the stats-generating pipeline extracts the value at a given key with a stage such as {$project: {val: <$path>}} before passing it to a group stage. That projection traverses arrays and objects but discards any context about the structure along the path.
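The loss of context can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: `extract_leaf_values` is a hypothetical helper that mimics implicit array traversal along a dotted path, not the server's actual projection or stats code.

```python
from collections import Counter


def extract_leaf_values(doc, path):
    """Follow a dotted path through sub-documents and arrays.

    Hypothetical, simplified model of path extraction: arrays met along
    the path are traversed implicitly and a terminal array is unwound,
    so the position of the array in the path is not recorded.
    """
    current = [doc]
    for key in path.split("."):
        next_values = []
        for value in current:
            # An array along the path is traversed element by element.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_values.append(candidate[key])
        current = next_values

    # Unwind a terminal array into its individual elements.
    leaves = []
    for value in current:
        if isinstance(value, list):
            leaves.extend(value)
        else:
            leaves.append(value)
    return leaves


leaf_array = {"a": {"b": [1, 2]}}
array_of_docs = {"a": [{"b": 1}, {"b": 2}]}

# Both documents contribute the same values to a per-value count.
histogram_leaf_array = Counter(extract_leaf_values(leaf_array, "a.b"))
histogram_array_of_docs = Counter(extract_leaf_values(array_of_docs, "a.b"))
```

Under this model both counters hold one occurrence each of 1 and 2, so nothing downstream of the extraction can tell which document shape produced them.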
In practice, this shouldn't affect our cardinality estimation for most queries, since matching on the path 'a.b' typically treats both documents as equivalent. The issue arises with $elemMatch: a query such as {'a.b': {$elemMatch: {$eq: 1}}} should match the leaf array but not the array of documents.
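The difference in matching behavior can be sketched as follows. Both functions are hypothetical, simplified models of the query semantics described above (they ignore nested arrays, type bracketing, and other operators) and are not the server's matcher.

```python
def resolve_path(doc, path):
    """Return the values found at a dotted path, traversing arrays
    along the path but leaving a terminal array intact."""
    current = [doc]
    for key in path.split("."):
        next_values = []
        for value in current:
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_values.append(candidate[key])
        current = next_values
    return current


def matches_eq(doc, path, target):
    """Simplified model of {path: target}: a value matches if it equals
    the target or is an array containing the target."""
    return any(
        value == target or (isinstance(value, list) and target in value)
        for value in resolve_path(doc, path)
    )


def matches_elem_match_eq(doc, path, target):
    """Simplified model of {path: {$elemMatch: {$eq: target}}}: the value
    at the path must itself be an array with an element equal to target."""
    return any(
        isinstance(value, list) and target in value
        for value in resolve_path(doc, path)
    )


leaf_array = {"a": {"b": [1, 2]}}
array_of_docs = {"a": [{"b": 1}, {"b": 2}]}

eq_results = (matches_eq(leaf_array, "a.b", 1),
              matches_eq(array_of_docs, "a.b", 1))
elem_match_results = (matches_elem_match_eq(leaf_array, "a.b", 1),
                      matches_elem_match_eq(array_of_docs, "a.b", 1))
```

In this model plain equality matches both documents, while the $elemMatch form matches only the leaf array, which is the distinction the histogram cannot currently represent.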