Core Server / SERVER-70106

Buildfest feedback: $merge is slow vs insert

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: 6.0.1
    • Component/s: None
    • ALL
      // Baseline: build 10k documents client-side and bulk-insert them.
      let mut vec = Vec::new();
      for _ in 0..10000 {
          let new_doc = doc! {
              "title": "T", "year": 2020, "plot": "plot description",
          };
          vec.push(new_doc);
      }
      all.insert_many(vec, None).unwrap();
      // Drop the collection here, then repopulate it server-side.
      let pipeline = vec![
          // Start from a single seed document...
          doc! { "$documents": [ { "dens": 0 } ] },
          // ...and fill in one document per integer step within the bounds.
          doc! { "$densify": {
                  "field": "dens",
                  "range": { "step": 1, "bounds": [0, 10000]}
              }},
          doc! { "$addFields": {
                  "title": "T",
                  "year": 2020,
                  //"rand": {"$rand": {} },
                  "_id": {"$rand": {} },
              }},
          // Swap the final stage to compare $out against $merge.
          // doc! { "$out": "all" },
          doc! { "$merge": "all" },
      ];
      db.aggregate(pipeline, None).unwrap();
    • QO 2022-10-17

      I was populating an empty (dropped) collection with semi-random data. Creating 10k records in Rust and inserting them with insert_many took 748 ms. Using $densify and $out to create similar records took 565 ms. Using $merge instead of $out took 91,983 ms. I would expect $merge to be at least as fast as insert_many.

      I recall that setting the _id field in the pipeline prior to $merge improved performance, but I can no longer reproduce this.
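
The timings quoted in the description could be collected and compared with a small helper along these lines. This is a minimal sketch using only the standard library; `time_ms` and `slowdown` are hypothetical names introduced for illustration, not part of the reporter's harness, and the closure passed to `time_ms` stands in for whichever population strategy (insert_many, $out, or $merge) is being measured.

```rust
use std::time::Instant;

/// Runs `work` once and returns its result together with the
/// elapsed wall-clock time in whole milliseconds.
/// (Hypothetical helper, not taken from the ticket.)
fn time_ms<T>(work: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed().as_millis())
}

/// Returns how many times slower `candidate_ms` is than `baseline_ms`.
/// (Hypothetical helper, not taken from the ticket.)
fn slowdown(baseline_ms: u128, candidate_ms: u128) -> f64 {
    candidate_ms as f64 / baseline_ms as f64
}
```

Wrapping each call, for example `time_ms(|| db.aggregate(pipeline, None).unwrap())`, gives one measurement per strategy. Applied to the reported figures, `slowdown(565, 91_983)` is roughly 163 and `slowdown(748, 91_983)` is roughly 123, so the $merge run was about 163 times slower than $out and about 123 times slower than insert_many.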

        1. experiment.js
          2 kB
          Alya Berciu

            Assignee:
            alya.berciu@mongodb.com Alya Berciu
            Reporter:
            maxim.katcharov@mongodb.com Maxim Katcharov
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: