  1. Core Server
  2. SERVER-57011

DocumentStorage caches nested objects for each level of nesting

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution
    • ALL
      TEST(DocumentSerialization, ApproximateSizeForNestedDocuments) {
          std::string largeStr(1024, 'x');
          auto bsonDoc = BSON("obj" << BSON("subObj" << BSON("subObjSubObj" << largeStr)));
          auto doc = Document(bsonDoc);
          ASSERT_GT(doc.getApproximateSize(), 1024);
          ASSERT_LT(doc.getApproximateSize(), 1024 * 2);
      
          // Force 'obj.subObj.subObjSubObj' to be cached.
          ASSERT_VALUE_EQ(doc.getNestedField("obj.subObj.subObjSubObj"), Value(largeStr));
      
          // largeStr is cached, so expect roughly double the footprint.
          ASSERT_GT(doc.getApproximateSize(), 1024 * 2);
          ASSERT_LT(doc.getApproximateSize(), 1024 * 3);  // <--- This one fails; on my machine the reported size is 4892.
      }
      
    • Query Execution 2021-06-28, Query Execution 2021-07-12, QE 2022-04-04, QE 2022-04-18, QE 2022-05-02, QE 2022-05-16, QE 2022-05-30, QE 2022-06-13, QE 2022-06-27, QE 2022-07-11, QE 2022-07-25, QE 2022-08-08, QE 2022-08-22

      When a field in a Document is accessed, the internal caching is expected to add some overhead to the memory footprint. However, when the accessed path contains nested documents, the reported size appears to double-count the values in the sub-objects. Because the method is named getApproximateSize(), some inaccuracy is tolerable and the impact may not be severe, but several aggregation stages rely on this size to decide whether to spill to disk.
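
      As a rough illustration of why the reported size could land near 4x rather than 2x, here is a toy model of the arithmetic. It is not MongoDB's Document or DocumentStorage code: the function names are hypothetical, per-field overhead is ignored, and it simply assumes (as the issue title suggests) that each level of nesting on the accessed path caches a value containing the large leaf string.

```cpp
#include <cstddef>

// Toy model only. These helpers are hypothetical and do not exist in MongoDB.
// A document is modeled as a chain of nested objects ending in one leaf string
// of `leafBytes` bytes; all per-field bookkeeping overhead is ignored.

// Size before any field access: only the backing BSON buffer is counted.
constexpr std::size_t sizeBeforeAccess(std::size_t leafBytes) {
    return leafBytes;
}

// Size the test expects after the access: the backing buffer plus a single
// cached copy of the leaf value.
constexpr std::size_t sizeExpectedAfterAccess(std::size_t leafBytes) {
    return 2 * leafBytes;
}

// Size if every level on the accessed path caches a value that contains the
// leaf (assumption): the leaf is counted once for the backing buffer and once
// more for each of the `pathLength` cached values.
constexpr std::size_t sizeWithPerLevelCaching(std::size_t leafBytes,
                                              std::size_t pathLength) {
    return leafBytes + pathLength * leafBytes;
}

// Path "obj.subObj.subObjSubObj" has three components and a 1024-byte leaf.
constexpr std::size_t kLeafBytes = 1024;
constexpr std::size_t kPathLength = 3;
```

      Under these assumptions, sizeWithPerLevelCaching(kLeafBytes, kPathLength) evaluates to 4096, which exceeds the 1024 * 3 upper bound in the failing assertion and is in the same range as the reported 4892 once overhead is added. The model only shows that per-level counting is consistent with the observed number; it does not confirm the actual cause.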

            Assignee:
            kevin.cherkauer@mongodb.com Kevin Cherkauer
            Reporter:
            nicholas.zolnierz@mongodb.com Nicholas Zolnierz
            Votes:
            0
            Watchers:
            12

              Created:
              Updated:
              Resolved: