- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework
- Environment: MongoDB 5.0.6, PyMongo 4.0.1
- ALL
A variety of formats, such as those used in bioinformatics, require strict adherence to the order of fields.
Files in such formats are often very large and contain nested structures, so it is convenient to store them as collections. To keep the data conforming to these specifications, however, the order of the fields must be preserved. Unfortunately, aggregations that save their results to another database lose the original field order.
Source document example:
{ "_id": { "$oid": "620fe1e87fd143aebe55bad4" }, "#CHROM": 1, "POS": 88619, "ID": "rs573217706", "REF": "G", "ALT": ["A", "T"], "QUAL": ".", "FILTER": ".", "INFO": [{ "RS": 573217706, "RSPOS": 88619, "dbSNPBuildID": 142, "SSR": 0, "SAO": 0, "VP": "0x050100000005040026000100", "WGT": 1, "VC": "SNV", "CAF": [{ "$numberDecimal": "0.9988" }, ".", { "$numberDecimal": "0.001198" }], "COMMON": 1, "TOPMED": [{ "$numberDecimal": "0.99959384556574923" }, { "$numberDecimal": "0.00000796381243628" }, { "$numberDecimal": "0.00039819062181447" }] }, ["SLO", "ASP", "VLD", "KGPhase3"] ] }
Part of the aggregation pipeline:
{'$merge': {'into': {'db': 'test_out', 'coll': 'common_all.vcf'}}}
Result:
{ "_id": { "$oid": "620fe1e87fd143aebe55bad4" }, "#CHROM": 1, "ALT": ["A", "T"], "FILTER": ".", "ID": "rs573217706", "INFO": [{ "RS": 573217706, "RSPOS": 88619, "dbSNPBuildID": 142, "SSR": 0, "SAO": 0, "VP": "0x050100000005040026000100", "WGT": 1, "VC": "SNV", "CAF": [{ "$numberDecimal": "0.9988" }, ".", { "$numberDecimal": "0.001198" }], "COMMON": 1, "TOPMED": [{ "$numberDecimal": "0.99959384556574923" }, { "$numberDecimal": "0.00000796381243628" }, { "$numberDecimal": "0.00039819062181447" }] }, ["SLO", "ASP", "VLD", "KGPhase3"] ], "POS": 88619, "QUAL": ".", "REF": "G" }
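In the result above, the top-level fields after `_id` come back in alphabetical order, while the keys nested inside `INFO` keep their original order. A possible client-side workaround, sketched below in plain Python, is to rearrange the top-level fields after reading the merged documents. This is only a sketch: `SOURCE_FIELD_ORDER`, `top_level_order` and `restore_field_order` are hypothetical names introduced here, the expected order is copied from the source document in this report, and it assumes documents are read as ordinary `dict` objects, which keep insertion order.

```python
# Hypothetical helper names; the order below is copied from the source document in this report.
SOURCE_FIELD_ORDER = ["_id", "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def top_level_order(doc):
    """Return the top-level field names in the order the document stores them."""
    return list(doc.keys())


def restore_field_order(doc, field_order=SOURCE_FIELD_ORDER):
    """Return a copy of doc with its top-level fields arranged as in field_order.

    Fields not listed in field_order are kept and appended in their current order.
    Nested documents are left untouched.
    """
    ordered = {name: doc[name] for name in field_order if name in doc}
    for name, value in doc.items():
        if name not in ordered:
            ordered[name] = value
    return ordered


# Top-level layout of the merged document from this report (values shortened).
merged = {
    "_id": "620fe1e87fd143aebe55bad4",
    "#CHROM": 1,
    "ALT": ["A", "T"],
    "FILTER": ".",
    "ID": "rs573217706",
    "INFO": [],
    "POS": 88619,
    "QUAL": ".",
    "REF": "G",
}

restored = restore_field_order(merged)
print(top_level_order(merged) == SOURCE_FIELD_ORDER)    # order lost after the merge
print(top_level_order(restored) == SOURCE_FIELD_ORDER)  # order restored on the client
```

This only repairs the order on the client side; the documents stored in the output collection are unchanged.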