Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: None
Component/s: None
Assigned Teams: Query Optimization
Operating System: ALL
Sprint: QO 2023-01-09, QO 2023-01-23, QO 2023-02-06
I tried out a simple example of using $telemetry at commit 038c67d99cda1fb242ce3b4dcaf331e459f3ff41 on the master branch. To enable workload telemetry collection, I first started the server like so:
./mongod --setParameter internalQueryConfigureTelemetrySamplingRate=1000000
Here's a snippet from the mongo shell which reproduces the problem:
MongoDB Enterprise > db.c.find({a: {$gt: 3}})
MongoDB Enterprise > db.getSiblingDB("admin").aggregate([{$telemetry: {}}]).pretty()
{
    "key" : {
        "find" : {
            "find" : "###",
            "filter" : {
                "###" : {
                    "###" : "###"
                }
            }
        },
        "namespace" : "test.c",
        "applicationName" : "MongoDB Shell"
    },
    "metrics" : {
        "lastExecutionMicros" : NumberLong(1961),
        "execCount" : NumberLong(1),
        "queryOptMicros" : {
            "sum" : NumberLong(287),
            "max" : NumberLong(287),
            "min" : NumberLong(287),
            "sumOfSquares" : NumberLong(82369)
        },
        "queryExecMicros" : {
            "sum" : NumberLong(1961),
            "max" : NumberLong(1961),
            "min" : NumberLong(1961),
            "sumOfSquares" : NumberLong(3845521)
        },
        "docsReturned" : {
            "sum" : NumberLong(0),
            "max" : NumberLong(0),
            "min" : NumberLong(0),
            "sumOfSquares" : NumberLong(0)
        },
        "docsScanned" : {
            "sum" : NumberLong(1),
            "max" : NumberLong(1),
            "min" : NumberLong(1),
            "sumOfSquares" : NumberLong(1)
        },
        "keysScanned" : {
            "sum" : NumberLong(0),
            "max" : NumberLong(0),
            "min" : NumberLong(0),
            "sumOfSquares" : NumberLong(0)
        },
        "firstSeenTimestamp" : Timestamp(1668636882, 0)
    },
    "asOf" : Timestamp(1668636883, 0)
}
...
The bug concerns the value of the key field. As you can see, all field names and values are redacted, including the $gt operator. We know that we need to redact constants in the query, since they may contain PII or be subject to other data security/privacy considerations. I believe there is an active discussion about whether we should redact or anonymize field names. But there is no doubt that the $gt should be included in the output; otherwise the entry tells us nothing about what the query actually was.
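For illustration, the behaviour argued for above can be sketched as a small recursive function: constants are replaced with "###", "$"-prefixed keys such as $gt are kept as-is, and field names are optionally redacted too. This is a hypothetical sketch in plain JavaScript (the name redactFilterShape is made up here), not the server's implementation, and it assumes a filter built only from JSON-like values.

```javascript
// Hypothetical helper (not the server's code): compute a redacted "shape" for a
// find filter. Assumes JSON-like input; embedded-document literals and BSON
// types such as Date are not distinguished from operator objects here.
function redactFilterShape(filter, redactFieldNames = true) {
  if (Array.isArray(filter)) {
    // Operands of $and / $or / $in etc.: redact each element.
    return filter.map((el) => redactFilterShape(el, redactFieldNames));
  }
  if (filter === null || typeof filter !== "object") {
    // A constant (number, string, boolean, null) may be PII, so redact it.
    return "###";
  }
  const shape = {};
  for (const [key, value] of Object.entries(filter)) {
    // Keep "$"-prefixed operators verbatim; optionally hide field names.
    const isOperator = key.startsWith("$");
    const outKey = isOperator || !redactFieldNames ? key : "###";
    shape[outKey] = redactFilterShape(value, redactFieldNames);
  }
  return shape;
}

// The query from the repro: db.c.find({a: {$gt: 3}})
console.log(JSON.stringify(redactFilterShape({ a: { $gt: 3 } })));
// prints {"###":{"$gt":"###"}}
console.log(JSON.stringify(redactFilterShape({ a: { $gt: 3 } }, false)));
// prints {"a":{"$gt":"###"}}
```

One wrinkle the sketch exposes: if every field name is replaced by the same placeholder, sibling fields such as {a: 1, b: 2} collapse onto a single "###" key, which is one reason a per-field token (such as the sha256 strategy) is needed when field names are hidden.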
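Note that the same problem occurs even if I set internalQueryConfigureTelemetryFieldNameRedactionStrategy=sha256. In that case, field names are hashed instead of replaced with "###", but the $gt operator is hashed along with them, so the key looks like this: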
"key" : {
    "find" : {
        "find" : "###",
        "filter" : {
            "ypeBEsobvcr6" : {
                "Jt74XIwL/Ngm" : "###"
            }
        }
    },
    "namespace" : "test.c",
    "applicationName" : "MongoDB Shell"
},
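Under the sha256 strategy the expected key would hash only the field names and leave operators readable. The sketch below uses Node's built-in crypto module; the function names are hypothetical, and the base64 encoding plus 12-character truncation are guesses based on the length of tokens like "ypeBEsobvcr6" above, not documented server behaviour, so its hashes will not necessarily match the server's.

```javascript
const crypto = require("crypto");

// Hypothetical: SHA-256 the field name, base64-encode the digest, and keep
// the first 12 characters. Encoding and truncation are assumptions.
function hashFieldName(name) {
  return crypto.createHash("sha256").update(name).digest("base64").slice(0, 12);
}

// Redact literals, hash field names, and leave "$"-prefixed operators intact.
// Assumes a JSON-like filter document.
function hashedFilterShape(filter) {
  if (Array.isArray(filter)) {
    return filter.map((el) => hashedFilterShape(el));
  }
  if (filter === null || typeof filter !== "object") {
    return "###";
  }
  const shape = {};
  for (const [key, value] of Object.entries(filter)) {
    const outKey = key.startsWith("$") ? key : hashFieldName(key);
    shape[outKey] = hashedFilterShape(value);
  }
  return shape;
}

// Expected form for db.c.find({a: {$gt: 3}}): { "<hash of a>": { "$gt": "###" } }
console.log(JSON.stringify(hashedFilterShape({ a: { $gt: 3 } })));
```

Because the hash is deterministic, the same field name always maps to the same token, so distinct fields stay distinguishable while the operator structure of the query remains visible.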
Another note: are there any end-to-end tests that show redaction working as expected?
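As a possible starting point for such a test, here is a sketch of a check that compares a telemetry key's filter against the original filter: every "$"-operator must survive, and no literal from the query may appear. The helper names collect and checkRedactedKey are hypothetical; in a jstest the check could be applied to key.find.filter of the $telemetry entry for the namespace under test.

```javascript
// Hypothetical helper: gather "$"-operator names and literal values from a
// JSON-like filter document.
function collect(doc, ops, literals) {
  if (Array.isArray(doc)) {
    doc.forEach((el) => collect(el, ops, literals));
  } else if (doc !== null && typeof doc === "object") {
    for (const [key, value] of Object.entries(doc)) {
      if (key.startsWith("$")) ops.add(key);
      collect(value, ops, literals);
    }
  } else {
    literals.add(doc);
  }
}

// Hypothetical check: returns a list of problems (empty list means OK).
function checkRedactedKey(originalFilter, redactedFilter) {
  const origOps = new Set(), origLits = new Set();
  const outOps = new Set(), outLits = new Set();
  collect(originalFilter, origOps, origLits);
  collect(redactedFilter, outOps, outLits);
  const problems = [];
  for (const op of origOps) {
    // Operators carry the query's shape and must not be redacted.
    if (!outOps.has(op)) problems.push(`operator ${op} missing from key`);
  }
  for (const lit of origLits) {
    // Constants from the query must not leak into the key.
    if (lit !== "###" && outLits.has(lit)) problems.push(`literal ${lit} leaked into key`);
  }
  return problems;
}

// The key reported in this ticket loses the $gt operator:
console.log(checkRedactedKey({ a: { $gt: 3 } }, { "###": { "###": "###" } }));
// returns ["operator $gt missing from key"]
```

In a shell-based jstest, the returned list could be passed to assert.eq([], problems) so that a failure names the missing operator or leaked literal.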
- depends on
  - SERVER-73141 Generate query shape (literal redaction) for expressions in expression_leaf.h (Closed)
- related to
  - SERVER-71427 $telemetry returns multiple entries with the same key even though the corresponding queries were distinct shapes (Closed)