Core Server: SERVER-71426

Redaction for $telemetry redacts not only field names and values, but also MQL operators

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Query Optimization
    • Operating System: ALL
    • Sprint: QO 2023-01-09, QO 2023-01-23, QO 2023-02-06

      I tried out a simple example of using $telemetry at commit 038c67d99cda1fb242ce3b4dcaf331e459f3ff41 on the master branch. First, to enable workload telemetry collection in the server, I started it like so:

      ./mongod --setParameter internalQueryConfigureTelemetrySamplingRate=1000000
      

      Here's a snippet from the mongo shell which reproduces the problem:

      MongoDB Enterprise > db.c.find({a: {$gt: 3}})
      MongoDB Enterprise > db.getSiblingDB("admin").aggregate([{$telemetry: {}}]).pretty()
      {
      	"key" : {
      		"find" : {
      			"find" : "###",
      			"filter" : {
      				"###" : {
      					"###" : "###"
      				}
      			}
      		},
      		"namespace" : "test.c",
      		"applicationName" : "MongoDB Shell"
      	},
      	"metrics" : {
      		"lastExecutionMicros" : NumberLong(1961),
      		"execCount" : NumberLong(1),
      		"queryOptMicros" : {
      			"sum" : NumberLong(287),
      			"max" : NumberLong(287),
      			"min" : NumberLong(287),
      			"sumOfSquares" : NumberLong(82369)
      		},
      		"queryExecMicros" : {
      			"sum" : NumberLong(1961),
      			"max" : NumberLong(1961),
      			"min" : NumberLong(1961),
      			"sumOfSquares" : NumberLong(3845521)
      		},
      		"docsReturned" : {
      			"sum" : NumberLong(0),
      			"max" : NumberLong(0),
      			"min" : NumberLong(0),
      			"sumOfSquares" : NumberLong(0)
      		},
      		"docsScanned" : {
      			"sum" : NumberLong(1),
      			"max" : NumberLong(1),
      			"min" : NumberLong(1),
      			"sumOfSquares" : NumberLong(1)
      		},
      		"keysScanned" : {
      			"sum" : NumberLong(0),
      			"max" : NumberLong(0),
      			"min" : NumberLong(0),
      			"sumOfSquares" : NumberLong(0)
      		},
      		"firstSeenTimestamp" : Timestamp(1668636882, 0)
      	},
      	"asOf" : Timestamp(1668636883, 0)
      }
      ...
      

      The bug pertains to the value of the key field. As you can see, all field names and values are redacted, and so is the $gt operator. We know that we need to redact constants in the query, since they may be PII or carry other data security or privacy considerations. I believe there is an active discussion about whether we should redact or anonymize field names. But there is no doubt that the output should include $gt; otherwise we know nothing about what the query actually was.
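The expected behavior can be sketched as a walk over the filter that redacts field names and constants but keeps operators. This is a minimal JavaScript (Node.js) sketch, not the server's implementation. The function names `redactFilter`, `redactValue`, and `isPlainObject` are hypothetical, and the sketch assumes that every key beginning with `$` is an MQL operator.

```javascript
// Hypothetical sketch: redact a query filter while preserving MQL operators.
// Assumption: any key that starts with "$" is an operator and is kept verbatim.
const PLACEHOLDER = "###";

// Only plain objects ({...}) are treated as sub-expressions; Dates, RegExps,
// and other objects are treated as constants.
function isPlainObject(v) {
  return v !== null && typeof v === "object" &&
    Object.getPrototypeOf(v) === Object.prototype;
}

function redactValue(value) {
  if (Array.isArray(value)) {
    // Arrays may hold sub-expressions ($and/$or) or constants ($in).
    return value.map(redactValue);
  }
  if (isPlainObject(value)) {
    return redactFilter(value);
  }
  // Any other value is a constant and could be sensitive.
  return PLACEHOLDER;
}

function redactFilter(filter) {
  const out = {};
  for (const [key, value] of Object.entries(filter)) {
    // Keep operators such as $gt readable; redact field names.
    const newKey = key.startsWith("$") ? key : PLACEHOLDER;
    out[newKey] = redactValue(value);
  }
  return out;
}

console.log(JSON.stringify(redactFilter({ a: { $gt: 3 } })));
// prints {"###":{"$gt":"###"}}
console.log(JSON.stringify(redactFilter({ $and: [{ a: { $gt: 3 } }, { b: { $in: [1, 2] } }] })));
// prints {"$and":[{"###":{"$gt":"###"}},{"###":{"$in":["###","###"]}}]}
```

Because every field name maps to the same placeholder, a filter with two sibling fields, such as `{a: 1, b: 2}`, collapses into a single `"###"` key in this sketch.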

      Note that the same problem occurs even when I set internalQueryConfigureTelemetryFieldNameRedactionStrategy=sha256. In that case, $gt is hashed just like a field name, and the key looks like this:

      	"key" : {
      		"find" : {
      			"find" : "###",
      			"filter" : {
      				"ypeBEsobvcr6" : {
      					"Jt74XIwL/Ngm" : "###"
      				}
      			}
      		},
      		"namespace" : "test.c",
      		"applicationName" : "MongoDB Shell"
      	},
      
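With a hashing strategy the same principle should apply: hash field names, keep operators, and redact constants. The Node.js sketch below uses a hypothetical `hashFieldName` helper that takes the first 12 characters of the base64 SHA-256 digest, chosen only because the hashes shown above are 12 characters long. This is an assumption; the server's actual hashing and encoding may differ, so the strings it produces are not expected to match the ones in the output above.

```javascript
// Hypothetical sketch: hash field names with SHA-256 but keep MQL operators.
// Assumption: truncated base64 encoding; not necessarily what mongod does.
const crypto = require("crypto");

function hashFieldName(name) {
  return crypto.createHash("sha256").update(name).digest("base64").slice(0, 12);
}

function isPlainObject(v) {
  return v !== null && typeof v === "object" &&
    Object.getPrototypeOf(v) === Object.prototype;
}

function redactWithHashedFields(filter) {
  const out = {};
  for (const [key, value] of Object.entries(filter)) {
    // Operators such as $gt stay readable; field names are hashed.
    const newKey = key.startsWith("$") ? key : hashFieldName(key);
    if (Array.isArray(value)) {
      // Recurse into sub-expressions; redact constant array elements.
      out[newKey] = value.map((v) => (isPlainObject(v) ? redactWithHashedFields(v) : "###"));
    } else if (isPlainObject(value)) {
      out[newKey] = redactWithHashedFields(value);
    } else {
      out[newKey] = "###"; // constants are still fully redacted
    }
  }
  return out;
}

console.log(JSON.stringify(redactWithHashedFields({ a: { $gt: 3 } })));
```

Unlike a single `###` placeholder, hashing keeps distinct field names distinct (barring collisions in the truncated hash), while the operator remains visible.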

      Another note: are there any end-to-end tests that show redaction working as expected?

            Assignee: Jennifer Peshansky (Inactive), jennifer.peshansky@mongodb.com
            Reporter: David Storch, david.storch@mongodb.com
            Votes: 0
            Watchers: 7