Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-23582

Query planner should not generate DISTINCT_SCAN plan for distinct on dotted field

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying
    • ALL

      The query planner currently generates DISTINCT_SCAN plans for distinct operations on dotted fields, if the distinct index is non-multikey. Due to SERVER-2104, distinct operations on dotted fields can never be covered, so DISTINCT_SCAN plans should never be generated for them.

      See, the following explain output for such an example distinct operation. The query planner generates a full index scan plan with a FETCH node, which is much more costly than a collection scan plan.

      > db.foo.drop()
      true
      > db.foo.ensureIndex({"a.b":1})
      {
      	"createdCollectionAutomatically" : true,
      	"numIndexesBefore" : 1,
      	"numIndexesAfter" : 2,
      	"ok" : 1
      }
      > db.foo.explain().distinct('a.b')
      {
      	"queryPlanner" : {
      		"plannerVersion" : 1,
      		"namespace" : "test.foo",
      		"indexFilterSet" : false,
      		"parsedQuery" : {
      
      		},
      		"winningPlan" : {
      			"stage" : "PROJECTION",
      			"transformBy" : {
      				"_id" : 0,
      				"a.b" : 1
      			},
      			"inputStage" : {
      				"stage" : "FETCH",
      				"inputStage" : {
      					"stage" : "DISTINCT_SCAN",
      					"keyPattern" : {
      						"a.b" : 1
      					},
      					"indexName" : "a.b_1",
      					"isMultiKey" : false,
      					"isUnique" : false,
      					"isSparse" : false,
      					"isPartial" : false,
      					"indexVersion" : 1,
      					"direction" : "forward",
      					"indexBounds" : {
      						"a.b" : [
      							"[MinKey, MaxKey]"
      						]
      					}
      				}
      			}
      		},
      		"rejectedPlans" : [ ]
      	},
      	"serverInfo" : {
      		"host" : "rassi",
      		"port" : 27017,
      		"version" : "3.3.4",
      		"gitVersion" : "2ab9b1fd7ea20319e2d9ddb0532234729877703f"
      	},
      	"ok" : 1
      }
      

      Relatedly, the second paragraph of the method comment for the getDistinctNodeIndex() helper is incorrect:

       * Returns true if indices contains an index that can be used with DistinctNode (the "fast distinct
       * hack" node, which can be used only if there is an empty query predicate).  Sets indexOut to the
       * array index of PlannerParams::indices.  Look for the index for the fewest fields.  Criteria for
       * suitable index is that the index cannot be special (geo, hashed, text, ...), and the index cannot
       * be a partial index.
       *
       * Multikey indices are not suitable for DistinctNode when the projection is on an array element.
       * Arrays are flattened in a multikey index which makes it impossible for the distinct scan stage
       * (plan stage generated from DistinctNode) to select the requested element by array index.
       *
       * Multikey indices cannot be used for the fast distinct hack if the field is dotted.  Currently the
       * solution generated for the distinct hack includes a projection stage and the projection stage
       * cannot be covered with a dotted field.
      

      The comment above does not correctly describe the behavior of distinct against an array. The semantics of distinct operations are such that arrays are flattened:

      > db.foo.drop()
      true
      > db.foo.insert({a:1})
      WriteResult({ "nInserted" : 1 })
      > db.foo.insert({a:[2,3]})
      WriteResult({ "nInserted" : 1 })
      > db.foo.distinct('a')
      [ 1, 2, 3 ]
      

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            rassi J Rassi
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: