Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-35889

$limit doesn't reduce amount of documents to be examined in $aggregation pipeline

    • Type: Icon: Bug Bug
    • Resolution: Works as Designed
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.4.14
    • Component/s: Aggregation Framework
    • None
    • ALL

      The problem is that $limit doesn't affect the amount of documents to be loaded into memory for examination in the aggregation pipeline. For example, check this one:

       

      > db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
      {
      	"op" : "command",
      	"ns" : "datasetTreeDB.datasetMappingAnnotation",
      	"command" : {
      		"aggregate" : "datasetMappingAnnotation",
      		"pipeline" : [
      			{
      				"$match" : {
      					"rootNode" : "markingtool",
      					"imageCreatedDate" : {
      						"$gt" : ISODate("2016-03-27T00:00:00Z")
      					},
      					"imageMeta__league" : "Major League Baseball"
      				}
      			},
      			{
      				"$limit" : 100000
      			},
      			{
      				"$sort" : {
      					"randomIndex" : 1
      				}
      			},
      			{
      				"$limit" : 1
      			},
      			{
      				"$project" : {
      					"_id" : 0,
      					"fileName" : 1,
      					"rootNode" : 1
      				}
      			},
      			{
      				"$count" : "count"
      			}
      		],
      		"allowDiskUse" : true,
      		"cursor" : {
      			
      		}
      	},
      	"keysExamined" : 100000,
      	"docsExamined" : 100000,
      	"hasSortStage" : true,
      	"cursorExhausted" : true,
      	"numYield" : 816,
      	"locks" : {
      		"Global" : {
      			"acquireCount" : {
      				"r" : NumberLong(1654)
      			}
      		},
      		"Database" : {
      			"acquireCount" : {
      				"r" : NumberLong(827)
      			}
      		},
      		"Collection" : {
      			"acquireCount" : {
      				"r" : NumberLong(826)
      			}
      		}
      	},
      	"nreturned" : 1,
      	"responseLength" : 130,
      	"protocol" : "op_command",
      	"millis" : 1346,
      	"planSummary" : "IXSCAN { rootNode: 1, imageMeta__league: 1, imageCreatedDate: 1, randomIndex: 1, imageId: 1, secondLevelNode: 1 }",
      	"ts" : ISODate("2018-06-28T17:05:57.702Z"),
      	"client" : "127.0.0.1",
      	"appName" : "MongoDB Shell",
      	"allUsers" : [ ],
      	"user" : ""
      }

       

      I want to get only 1 result from the aggregation and include into this result a field that doesn't exist in the index. The expected behaviour that it will scan 100000 keys for sorting in memory using values from indexes (it makes sense as after "imageCreatedDate" search it can't use further indexes for sorting anymore) and then load and test only 1 first document. However it tried to load all 100000 documents ("docsExamined" : 100000) - it completely kills the performance of this operation.

       

      If do not include the field that doesn't exist in the index, I can see that it doesn't try to fetch any document and can return the result using indexes only:

       

       

      db.system.profile.find().limit(1).sort( { ts : -1 } ).pretty()
      {
      	"op" : "command",
      	"ns" : "datasetTreeDB.datasetMappingAnnotation",
      	"command" : {
      		"aggregate" : "datasetMappingAnnotation",
      		"pipeline" : [
      			{
      				"$match" : {
      					"rootNode" : "markingtool",
      					"imageCreatedDate" : {
      						"$gt" : ISODate("2016-03-27T00:00:00Z")
      					},
      					"imageMeta__league" : "Major League Baseball"
      				}
      			},
      			{
      				"$limit" : 100000
      			},
      			{
      				"$sort" : {
      					"randomIndex" : 1
      				}
      			},
      			{
      				"$limit" : 1
      			},
      			{
      				"$project" : {
      					"_id" : 0,
      					"rootNode" : 1
      				}
      			},
      			{
      				"$count" : "count"
      			}
      		],
      		"allowDiskUse" : true,
      		"cursor" : {
      			
      		}
      	},
      	"keysExamined" : 100000,
      	"docsExamined" : 0,
      	"hasSortStage" : true,
      	"cursorExhausted" : true,
      	"numYield" : 785,
      	"locks" : {
      		"Global" : {
      			"acquireCount" : {
      				"r" : NumberLong(1586)
      			}
      		},
      		"Database" : {
      			"acquireCount" : {
      				"r" : NumberLong(793)
      			}
      		},
      		"Collection" : {
      			"acquireCount" : {
      				"r" : NumberLong(792)
      			}
      		}
      	},
      	"nreturned" : 1,
      	"responseLength" : 130,
      	"protocol" : "op_command",
      	"millis" : 182,
      	"planSummary" : "IXSCAN { rootNode: 1, imageMeta__league: 1, imageCreatedDate: 1, randomIndex: 1, imageId: 1, secondLevelNode: 1 }",
      	"ts" : ISODate("2018-06-28T17:16:10.787Z"),
      	"client" : "127.0.0.1",
      	"appName" : "MongoDB Shell",
      	"allUsers" : [ ],
      	"user" : ""
      }

       

       

      These ones are synthetic example and in real environment I have ~100.000.000 items in collection and my goal is to use indexes at the "$match" step, then "$sort" (like in this example) and then return only first 100.000 items ($limit with 100.000). However if "$match" step returns 5.000.000 items it tries to load and examine all 5.000.000 documents and only then return top 100.000.

      Is there a way how I can fix this behaviour? 

       

      Thank you!

       

        1. fastRequest.json
          6 kB
        2. simple_example.js
          2 kB
        3. slowRequest.json
          7 kB

            Assignee:
            nick.brewer Nick Brewer
            Reporter:
            ASemen Alexey [X]
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: