- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Query Optimization
Case 1: The following query over the ce_data_1000 collection from the CE accuracy tests produces a very imprecise estimate:
Id: 6066: [ { "$match" : { "mixed_arr_str_70_30" : { "$gt" : "LeG7", "$lt" : "LgG7" } } } ], qtype: medium range, data type: array cardinality: 126, Histogram estimation: 394.17, errors: { "absError" : 268.17, "relError" : 2.13, "selError" : 26.82 }
The data has only 33 values and is completely represented in the histogram buckets.
If we apply the formula
Card(ArrayMin(a < valHigh)) - Card(ArrayMax(a < valLow)), we get 291 - 165 = 126, which matches the actual cardinality of 126. Investigate why the histogram estimation returns 394.17 instead.
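The arithmetic behind the reported errors and the alternative formula can be rechecked with a short sketch. The function names are hypothetical and are not part of the CE test framework. The error definitions are assumptions inferred from the numbers above: relError is taken as absError divided by the actual cardinality, and selError as absError expressed as a percentage of a collection size of 1000 (assumed from the collection name ce_data_1000).

```python
def array_range_estimate(card_min_lt_high, card_max_lt_low):
    """Card(ArrayMin(a < valHigh)) - Card(ArrayMax(a < valLow)).

    Both inputs are cardinalities already read from the histograms;
    this helper only performs the subtraction from the ticket.
    """
    return card_min_lt_high - card_max_lt_low


def ce_errors(estimate, actual, collection_size):
    """Recompute the error metrics under the assumed definitions."""
    abs_error = abs(estimate - actual)
    return {
        "absError": round(abs_error, 2),
        # Assumed: error relative to the actual cardinality.
        "relError": round(abs_error / actual, 2),
        # Assumed: error as a percentage of the collection size.
        "selError": round(100.0 * abs_error / collection_size, 2),
    }


ACTUAL = 126             # cardinality reported for query 6066
COLLECTION_SIZE = 1000   # assumption based on the name ce_data_1000

# Errors of the reported histogram estimate.
reported = ce_errors(394.17, ACTUAL, COLLECTION_SIZE)

# Errors of the estimate produced by the formula in the ticket.
formula_estimate = array_range_estimate(291, 165)
formula = ce_errors(formula_estimate, ACTUAL, COLLECTION_SIZE)

print(reported)
print(formula_estimate, formula)
```

Under these assumed definitions the recomputed metrics for 394.17 agree with the values quoted in the test output, which suggests the error figures themselves are consistent and the problem lies in how the estimate is derived.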