  1. Python Driver
  2. PYTHON-1721

GridFS should use a cursor to read all chunks in a file

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.8
    • Affects Version/s: None
    • Component/s: GridFS
    • None
    • Minor Change

      In the retryable reads POC (PYTHON-1674) there are GridFS tests that use command monitoring events to assert the expected behavior. The tests assume that a driver issues a single find command to read all chunks in a file, like this:

      for chunk in chunks.find({"files_id": file_id}, sort=[("n", 1)]):
          # process chunk...
      

      However, PyMongo actually runs a new find_one to read each chunk, similar to this:

      for chunk_number in range(total_chunks):
          chunk = chunks.find_one({"files_id": file_id, "n": chunk_number})
          # process chunk...
      

      (The above is a simplification; the real implementation is in the GridOut class.)
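To make the difference concrete, here is a minimal sketch of what a command-monitoring test would observe under each strategy. `CountingChunks` is a hypothetical in-memory stand-in for the chunks collection, not a PyMongo class; it records one "find" per simulated query and ignores batching, so a real cursor over a large file would also issue getMore commands.

```python
class CountingChunks:
    """Hypothetical in-memory stand-in for the chunks collection.

    Records one entry in `commands` per simulated query so the two
    read strategies can be compared without a server.
    """

    def __init__(self, docs):
        self.docs = docs
        self.commands = []

    def find(self, filter):
        # One find command; results are always sorted by "n" here,
        # standing in for sort=[("n", 1)] on a real collection.
        self.commands.append("find")
        matches = [d for d in self.docs if d["files_id"] == filter["files_id"]]
        return iter(sorted(matches, key=lambda d: d["n"]))

    def find_one(self, filter):
        # find_one is also sent as a find command, limited to one document.
        self.commands.append("find")
        for d in self.docs:
            if all(d[k] == v for k, v in filter.items()):
                return d
        return None


file_id = "f1"
docs = [{"files_id": file_id, "n": n, "data": b"x"} for n in range(5)]

# Strategy 1: one query per chunk.
per_chunk = CountingChunks(docs)
data_a = b"".join(
    per_chunk.find_one({"files_id": file_id, "n": n})["data"] for n in range(5)
)

# Strategy 2: one query for the whole file.
single = CountingChunks(docs)
data_b = b"".join(chunk["data"] for chunk in single.find({"files_id": file_id}))

print(len(per_chunk.commands), len(single.commands))  # prints: 5 1
```

Both strategies return the same bytes; only the number of commands differs, which is exactly what the event-based assertions in the tests are sensitive to.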

      We could change PyMongo's implementation to cache a cursor and reuse it to iterate over all the chunks, which would match the spec tests. This is also how the GridFS spec itself suggests reading the chunks for a file:

      Drivers must first retrieve the files collection document for this file. If there is no files collection document, the file either never existed, is in the process of being deleted, or has been corrupted, and the driver MUST raise an error.

      Then, implementers retrieve all chunks with files_id equal to id, sorted in ascending order on “n”.
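The two steps quoted above can be sketched as follows. This is an illustration only: `read_file` is a hypothetical helper operating on plain in-memory lists rather than real collections, and the chunk-sequence and length checks are my own additions to show where validation would naturally go.

```python
def read_file(files, chunks, file_id):
    """Return the bytes of one file from in-memory `files` and `chunks` lists."""
    # Step 1: retrieve the files collection document, or fail.
    files_doc = next((f for f in files if f["_id"] == file_id), None)
    if files_doc is None:
        raise LookupError(f"no files collection document for {file_id!r}")

    # Step 2: all chunks with files_id == id, ascending on "n".
    matching = sorted(
        (c for c in chunks if c["files_id"] == file_id),
        key=lambda c: c["n"],
    )

    parts = []
    for expected_n, chunk in enumerate(matching):
        # A gap in the sequence means a chunk is missing.
        if chunk["n"] != expected_n:
            raise ValueError(f"missing chunk {expected_n}")
        parts.append(chunk["data"])

    data = b"".join(parts)
    # Catches a missing trailing chunk, which the gap check cannot see.
    if len(data) != files_doc["length"]:
        raise ValueError("chunk data does not match the recorded file length")
    return data


files = [{"_id": 1, "length": 6}]
chunks = [
    {"files_id": 1, "n": 1, "data": b"def"},  # stored out of order on purpose
    {"files_id": 1, "n": 0, "data": b"abc"},
]
print(read_file(files, chunks, 1))  # prints: b'abcdef'
```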

      My question is: why do we use many find_one calls instead of a single find? Is it to avoid complications from cursor errors such as CursorNotFound?
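If cursor errors are the concern, one possible way to handle them is to remember the next expected chunk number and reopen the cursor from there. The sketch below is an assumption about how that could look, not a description of PyMongo's code: `ChunkIterator`, `open_cursor`, and the local `CursorNotFound` class are all hypothetical stand-ins so the example runs without a server. With a real collection, `open_cursor(start_n)` would presumably be something like `chunks.find({"files_id": file_id, "n": {"$gte": start_n}}, sort=[("n", 1)])`.

```python
class CursorNotFound(Exception):
    """Local stand-in for the driver error of the same name."""


class ChunkIterator:
    """Iterate chunks with one cached cursor, reopening it on CursorNotFound."""

    def __init__(self, open_cursor, max_restarts=1):
        # open_cursor(start_n) is a hypothetical callable returning an
        # iterator over chunk documents with n >= start_n, ascending on n.
        self._open_cursor = open_cursor
        self._cursor = None
        self._next_n = 0
        self._restarts_left = max_restarts

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._cursor is None:
                self._cursor = self._open_cursor(self._next_n)
            try:
                chunk = next(self._cursor)
            except CursorNotFound:
                if self._restarts_left == 0:
                    raise
                # Drop the dead cursor and resume at the next expected chunk.
                self._restarts_left -= 1
                self._cursor = None
                continue
            self._next_n = chunk["n"] + 1
            return chunk


docs = [{"n": n, "data": bytes([65 + n])} for n in range(4)]
opened = []  # start positions of every cursor that was opened


def open_cursor(start_n):
    opened.append(start_n)

    def gen():
        for d in docs:
            if d["n"] < start_n:
                continue
            # Simulate the first cursor dying before chunk 2 is returned.
            if d["n"] == 2 and len(opened) == 1:
                raise CursorNotFound("cursor timed out")
            yield d

    return gen()


data = b"".join(chunk["data"] for chunk in ChunkIterator(open_cursor))
print(data, opened)  # prints: b'ABCD' [0, 2]
```

The reader still sees every chunk exactly once; the cost of a lost cursor is one extra query rather than a failed read.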

      Note that since the default chunk size is 255 KB, I expect a single find to be much faster than many find_one calls, because many chunks fit in a single find/getMore response and far fewer round trips to the server are needed.
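A rough back-of-the-envelope estimate of the saving, under stated assumptions: the chunk size is 255 * 1024 bytes, a single find/getMore batch is capped at about 16 MiB, and per-document overhead is ignored. The 100 MiB file size and the helper names are chosen purely for illustration.

```python
CHUNK_SIZE = 255 * 1024              # default GridFS chunk size in bytes
MAX_REPLY_BYTES = 16 * 1024 * 1024   # assumed cap on one find/getMore batch


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def round_trips(file_size):
    """Return (queries with find_one per chunk, batches with one cursor)."""
    total_chunks = ceil_div(file_size, CHUNK_SIZE)
    # Ignores the small per-document overhead of _id, files_id and n.
    chunks_per_batch = MAX_REPLY_BYTES // CHUNK_SIZE
    return total_chunks, ceil_div(total_chunks, chunks_per_batch)


per_chunk_queries, cursor_batches = round_trips(100 * 1024 * 1024)
print(per_chunk_queries, cursor_batches)  # prints: 402 7
```

Under these assumptions a 100 MiB file needs 402 round trips with one find_one per chunk, versus roughly 7 (one find plus getMores) with a single cursor.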

            Assignee: Shane Harvey (shane.harvey@mongodb.com)
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 0
            Watchers: 1
