Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-51498

Sorting on duplicate values causes repeated results with limit and skip

    • Type: Icon: Bug Bug
    • Resolution: Works as Designed
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.4.1
    • Component/s: None
    • None
    • ALL
    • Hide

      Tested on Windows 4.4.1, 4.2.10 and Linux Docker 4.4.1 and 4.2.10.

      4.2.10 on both OS's did not have this issue and return result groups as expected with results not repeating themselves. Results appeared to be sorted by _id as well as the sort value requested.  This default sort by _id does not appear in 4.4.1, which seems to be causing the issue.

      4.4.1 on both OS's appeared to have the issue described. Manually adding a sort by _id causes the issue to disappear.

      • Insert a series of documents (tested with about 20 document) containing one field with a duplicate value for all documents and one field with a value unique to each document.
      • Perform a query containing a sort on the field with the duplicate value and limit the results to some value (larger limits will make the issue more noticeable, tested with 5 or 10)
      • Perform the same query, but with a skip >= to the limit.
      • Expected: As long as the skip >= limit, the second results group should return unique values not in the first group
      • Current results: Values from the first group reappear in the second group. (In the image, note that 3_item, 0_item, 1_item, and 2_item appear in both groups)
      Show
      Tested on Windows 4.4.1, 4.2.10 and Linux Docker 4.4.1 and 4.2.10. 4.2.10 on both OS's did not have this issue and return result groups as expected with results not repeating themselves. Results appeared to be sorted by _id as well as the sort value requested.  This default sort by _id does not appear in 4.4.1, which seems to be causing the issue. 4.4.1 on both OS's appeared to have the issue described. Manually adding a sort by _id causes the issue to disappear. Insert a series of documents (tested with about 20 document) containing one field with a duplicate value for all documents and one field with a value unique to each document. Perform a query containing a sort on the field with the duplicate value and limit the results to some value (larger limits will make the issue more noticeable, tested with 5 or 10) Perform the same query, but with a skip >= to the limit. Expected: As long as the skip >= limit, the second results group should return unique values not in the first group Current results: Values from the first group reappear in the second group. (In the image, note that 3_item, 0_item, 1_item, and 2_item appear in both groups)

      Issue Status as of Jan 15, 2020

      ISSUE DESCRIPTION AND IMPACT

      As of MongoDB 4.4, find+sort operations that perform blocking/in-memory sorts on fields that contain non-unique values are much more likely to result in unstable sort orders between different operations. This most directly impacts pagination style operations and causes results to be duplicated or omitted from one query to the next.

      This is because in MongoDB 4.4, find+sort began using the same sorting logic as the aggregation $sort stage, which has always been known to be unstable. Prior to MongoDB 4.4, find often provided, but did not guarantee, stable results.

      DIAGNOSIS AND AFFECTED VERSIONS

      All versions of MongoDB are affected by this issue, with impact being more likely in MongoDB 4.4.

      If a blocking sort operation sorts on a non-unique field, unstable sort results can be observed using .limit() and .skip():

      db.test_col.find().sort({ counter: -1 }).skip(0).limit(5)
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060e"),  "counter" : 2,  "name" : "2_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be10610"),  "counter" : 2,  "name" : "4_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060c"),  "counter" : 2,  "name" : "0_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060d"),  "counter" : 2,  "name" : "1_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060f"),  "counter" : 2,  "name" : "3_item" }
      
      db.test_col.find().sort({ counter: -1 }).skip(5).limit(5)
      {  "_id" : ObjectId("5f84bcaafe52900c1be10610"),  "counter" : 2,  "name" : "4_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060c"),  "counter" : 2,  "name" : "0_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060d"),  "counter" : 2,  "name" : "1_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be1060f"),  "counter" : 2,  "name" : "3_item" }
      {  "_id" : ObjectId("5f84bcaafe52900c1be10613"),  "counter" : 2,  "name" : "7_item" }
      

      In this example, 0_item, 1_item, 3_item, and 4_item occur in both result sets; the skip and limit values are not enough to create distinct result sets.

      REMEDIATION AND WORKAROUNDS

      Review the 4.4 release notes for information about these changes. For more detail, see this sort stability documentation. In short, one quick fix is to add _id to sorts that require stability between queries.

      After evaluating options, we've unfortunately found that at this time, we cannot completely guarantee sort stability on non-unique values without an unacceptable performance impact. In the absence of a complete guarantee, we aren't able to justify the level of effort that would provide a partial guarantee.

      Original Description

      In 4.4.1, if a duplicate, non-unique value is sorted on and the results are limited and skipped to create groups of results, values in one group of results appear in other groups of results. So performing sorts, limits, and skips for something like pagination will cause results to be repeated in other pages.

       

            Assignee:
            edwin.zhou@mongodb.com Edwin Zhou
            Reporter:
            hartmanj@ainfosec.com Jacob Hartman
            Votes:
            0 Vote for this issue
            Watchers:
            14 Start watching this issue

              Created:
              Updated:
              Resolved: