Python Driver / PYTHON-1517

insert_many should work with arbitrarily long iterables

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: API
    • Environment:
      Verified on Windows, but this does not read as platform-specific

      The current documentation at http://api.mongodb.com/python/current/examples/bulk.html states: "A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations."
      I have a simple generator that produces dictionaries from bytes using json.load, which I pass to the insert_many method of the pymongo.Collection class. For large inputs this ultimately raises a MemoryError, always thrown at line 741 of https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py.
      I'm not an expert Python programmer, but after digging into the code a bit, it appears that this line expands the entire iterable into a list before the bulk insertion process starts. As a result, the sizes of individual documents are never taken into account when splitting them into sub-batches, and an arbitrarily long iterable is fully materialized in memory.
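      Until insert_many consumes iterables lazily, one possible workaround is to chunk the generator manually and call insert_many once per chunk, so that only one batch of documents is in memory at a time. The sketch below is illustrative, not part of the driver: the helper name batched, the batch size of 1000, and the collection variable in the usage comment are all assumptions.

      ```python
      from itertools import islice

      def batched(iterable, batch_size):
          """Yield successive lists of at most batch_size items from iterable,
          without materializing the whole iterable in memory."""
          iterator = iter(iterable)
          while True:
              batch = list(islice(iterator, batch_size))
              if not batch:
                  return
              yield batch

      # Hypothetical usage with a PyMongo collection (assumes `collection`
      # and `document_generator` are defined elsewhere):
      #
      # for batch in batched(document_generator, 1000):
      #     collection.insert_many(batch)
      ```

      This keeps peak memory proportional to the batch size rather than the full input, though unlike the driver's internal splitting it counts documents instead of measuring message size in bytes.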

            Assignee:
            Unassigned
            Reporter:
            Fernando Almeida (falmeida)
            Votes:
            0
            Watchers:
            3
