-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 4.4.3, 4.2.12, 4.9.0-alpha4
-
Component/s: None
-
Minor Change
-
ALL
-
v6.0
-
QO 2022-04-04, QO 2022-04-18, QO 2022-05-02, QO 2022-05-16
SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection vs. running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all the use-cases.
While investigating the behaviour as part of DRIVERS-1486 I found some more differences.
To test, I've created a collection with data and created a view based on an empty pipeline:
getDocument = function (id, value) { return { _id: id, x: value, nested: { x: value, nested: { x: value }, array: [value] }, array: [value], documentArray: [ {x: value}, {x: value << 2} ] }; } db.distinctTest.drop(); db.distinctTest.insertMany([ getDocument(1, 1), getDocument(2, 2), getDocument(3, 3), getDocument(4, 1) ]); db.createView('distinctViewTest', 'distinctTest', []);
I've come up with the following calls to distinct:
db.distinctTest.distinct('x'); db.distinctTest.distinct('nested.x'); db.distinctTest.distinct('nested.nested.x'); db.distinctTest.distinct('array'); db.distinctTest.distinct('nested.array'); db.distinctTest.distinct('documentArray'); db.distinctTest.distinct('documentArray.x'); db.distinctTest.distinct('documentArray[1].x'); db.distinctTest.distinct('documentArray.1.x');
Most of the cases look the same (except for different result ordering which we can ignore), but the last case differs:
MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x'); [ 4, 8, 12 ] MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x'); [ ]
Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:
[ { "$unwind" : { "path" : "$documentArray", "preserveNullAndEmptyArrays" : true } }, { "$unwind" : { "path" : "$documentArray.1", "preserveNullAndEmptyArrays" : true } }, { "$unwind" : { "path" : "$documentArray.1.x", "preserveNullAndEmptyArrays" : true } }, { "$match" : { "documentArray" : { "$_internalSchemaType" : "object" }, "documentArray.1" : { "$_internalSchemaType" : "object" } } }, { "$group" : { "_id" : null, "distinct" : { "$addToSet" : "$documentArray.1.x" } } } ]
This is incorrect, as documentArray.1 should not unwind documentArray first, but rather use $arrayElemAt to. This modified aggregation pipeline produces the same result as the corresponding distinct command:
[ { "$set": { "documentArray": { $arrayElemAt: [ "$documentArray", 1 ] } } }, { "$unwind" : { "path" : "$documentArray", "preserveNullAndEmptyArrays" : true } }, { "$unwind" : { "path" : "$documentArray.x", "preserveNullAndEmptyArrays" : true } }, { "$match" : { "documentArray" : { "$_internalSchemaType" : "object" } } }, { "$group" : { "_id" : null, "distinct" : { "$addToSet" : "$documentArray.x" } } } ]
I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It is likely that this also affects previous versions which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions starting with that being affected.
- is related to
-
DRIVERS-1486 VersionedAPI: Reimplement 'distinct' in 4.9+
- Development Complete
-
SERVER-37715 Use DISTINCT_SCAN for $unwind-$group pipelines
- Backlog
- related to
-
SERVER-27644 distinct on a view doesn't "flatten" array members
- Closed
-
SERVER-40134 Distinct command against a view can return incorrect results when the distinct path is multikey
- Closed