$merge breaks field order, which is critical for bioinformatics


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • Environment:
      MongoDB 5.0.6
      PyMongo 4.0.1

      A variety of file formats, such as those used in bioinformatics, require strict adherence to a fixed field order.

      Files in these formats are often very large and contain nested structures, so it is convenient to store them as collections. For the data to remain valid under those specifications, however, the order of the fields must be preserved. Unfortunately, aggregations that save their results to another database lose the original field order.

      Source document example:

      {
          "_id": {
              "$oid": "620fe1e87fd143aebe55bad4"
          },
          "#CHROM": 1,
          "POS": 88619,
          "ID": "rs573217706",
          "REF": "G",
          "ALT": ["A", "T"],
          "QUAL": ".",
          "FILTER": ".",
          "INFO": [{
                  "RS": 573217706,
                  "RSPOS": 88619,
                  "dbSNPBuildID": 142,
                  "SSR": 0,
                  "SAO": 0,
                  "VP": "0x050100000005040026000100",
                  "WGT": 1,
                  "VC": "SNV",
                  "CAF": [{
                      "$numberDecimal": "0.9988"
                  }, ".", {
                      "$numberDecimal": "0.001198"
                  }],
                  "COMMON": 1,
                  "TOPMED": [{
                      "$numberDecimal": "0.99959384556574923"
                  }, {
                      "$numberDecimal": "0.00000796381243628"
                  }, {
                      "$numberDecimal": "0.00039819062181447"
                  }]
              },
              ["SLO", "ASP", "VLD", "KGPhase3"]
          ]
      }
      

      Part of the aggregation pipeline:

      {'$merge': {'into': {'db': 'test_out', 'coll': 'common_all.vcf'}}}
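
      For context, here is a minimal sketch of how the stage above might be run from PyMongo. The report shows only part of the pipeline, so the sketch assumes `$merge` is the sole stage. The helper names `build_pipeline` and `run_merge` are hypothetical; only `Collection.aggregate` is an actual PyMongo call.

```python
# Hypothetical helpers for illustration; they are not part of PyMongo
# or of the original report.

def build_pipeline(out_db, out_coll):
    """Return a pipeline containing only the $merge stage from the report."""
    return [{"$merge": {"into": {"db": out_db, "coll": out_coll}}}]


def run_merge(source_collection, out_db="test_out", out_coll="common_all.vcf"):
    """Run the pipeline on a PyMongo Collection (or any object with .aggregate)."""
    pipeline = build_pipeline(out_db, out_coll)
    # aggregate() returns a cursor; with $merge as the final stage the cursor
    # yields no documents, so exhausting it here is only a formality.
    list(source_collection.aggregate(pipeline))
    return pipeline
```

      With a live deployment, `source_collection` would be a collection handle obtained from a `MongoClient`. The source database and collection names are not given in the report.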
      

      Result:

       

      {
          "_id": {
              "$oid": "620fe1e87fd143aebe55bad4"
          },
          "#CHROM": 1,
          "ALT": ["A", "T"],
          "FILTER": ".",
          "ID": "rs573217706",
          "INFO": [{
                  "RS": 573217706,
                  "RSPOS": 88619,
                  "dbSNPBuildID": 142,
                  "SSR": 0,
                  "SAO": 0,
                  "VP": "0x050100000005040026000100",
                  "WGT": 1,
                  "VC": "SNV",
                  "CAF": [{
                      "$numberDecimal": "0.9988"
                  }, ".", {
                      "$numberDecimal": "0.001198"
                  }],
                  "COMMON": 1,
                  "TOPMED": [{
                      "$numberDecimal": "0.99959384556574923"
                  }, {
                      "$numberDecimal": "0.00000796381243628"
                  }, {
                      "$numberDecimal": "0.00039819062181447"
                  }]
              },
              ["SLO", "ASP", "VLD", "KGPhase3"]
          ],
          "POS": 88619,
          "QUAL": ".",
          "REF": "G"
      }
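
      Comparing the two documents above, the top-level fields after `_id` come back in ascending lexicographic order, while the fields nested inside `INFO` keep their original order. The sketch below makes that comparison explicit and shows one possible client-side way to put the fields back in a known order after reading. `same_field_order` and `restore_order` are hypothetical helper names. This is a workaround sketch only; it does not change what `$merge` writes.

```python
# Top-level field names copied from the source and result documents above.
SOURCE_ORDER = ["_id", "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
RESULT_ORDER = ["_id", "#CHROM", "ALT", "FILTER", "ID", "INFO", "POS", "QUAL", "REF"]


def same_field_order(doc_a, doc_b):
    """True if both documents list their top-level keys in the same order."""
    return list(doc_a) == list(doc_b)


def restore_order(doc, order):
    """Return a copy of doc with keys arranged as in `order`.

    Keys of doc that are missing from `order` are appended at the end,
    in the order they appear in doc.
    """
    ordered = {key: doc[key] for key in order if key in doc}
    ordered.update({k: v for k, v in doc.items() if k not in ordered})
    return ordered


if __name__ == "__main__":
    # Placeholder values; only the key order matters for this check.
    merged = dict.fromkeys(RESULT_ORDER, None)
    print(list(restore_order(merged, SOURCE_ORDER)) == SOURCE_ORDER)
```

      The helper relies on Python dictionaries preserving insertion order (Python 3.7 and later).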

            Assignee:
            Eric Sedor
            Reporter:
            Platon workaccount
            Votes:
            0
            Watchers:
            4
