Python Driver / PYTHON-894

collection.aggregate strange behavior

    • Type: Bug
    • Resolution: Done
    • Priority: Trivial - P5
    • Fix Version/s: 3.0.1, 2.8.1
    • Affects Version/s: 2.8, 3.0
    • Component/s: None
    • Labels: None
    • Environment:
      Python 2.7.6, Python 2.7.9

      If the server returns a cursor id of 0, indicating that the query's results have been completely fetched, neither Cursor nor CommandCursor sets its "alive" property to False. This has been true since "alive" was first introduced, and for the whole history of CommandCursor. So code like the following always called next() one time too many, past the end of the result set, and raised StopIteration:

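      # pipeline is a list of aggregation stages; cursor={} asks the server for a cursor
      cursor = my_collection.aggregate(
          pipeline,
          cursor={}
      )
      
      # Before the fix, "alive" stayed True after the last document,
      # so this loop called next() on an exhausted cursor.
      while cursor.alive:
          print(cursor.next())  # or next(cursor)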
      

      To fix this issue, we updated Cursor and CommandCursor to set "alive" to False when they receive a cursor id of 0 from the server.
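The fix can be illustrated with a toy model in Python 3. This is not PyMongo's source. `ModelCursor`, its `fixed` flag, `drain_with_while_alive`, and the `(documents, cursor_id)` reply format are all hypothetical. The model assumes "alive" means "documents are buffered, or the server has not yet reported id 0". The data is a single reply that carries every document together with a cursor id of 0, for example a small aggregate result.

```python
class ModelCursor:
    """Hypothetical stand-in for Cursor/CommandCursor; not PyMongo's code.

    `replies` is a list of (documents, cursor_id) pairs, one per server reply.
    `fixed` toggles the behavior described in this issue.
    """

    def __init__(self, replies, fixed=True):
        self._replies = iter(replies)
        self._data = []
        self._cursor_id = None  # None until the first reply arrives
        self._fixed = fixed
        self._exhausted = False

    @property
    def alive(self):
        # More data is possible if documents are buffered or the
        # server-side cursor has not been reported as finished.
        return bool(self._data) or not self._exhausted

    def _refresh(self):
        docs, self._cursor_id = next(self._replies)
        self._data.extend(docs)
        if self._fixed and self._cursor_id == 0:
            # The fix: a cursor id of 0 means the server has nothing left.
            self._exhausted = True

    def next(self):
        # Only ask the server for more if it has not already sent id 0.
        if not self._data and self._cursor_id != 0:
            self._refresh()
        if self._data:
            return self._data.pop(0)
        raise StopIteration


def drain_with_while_alive(cursor):
    """The 'while cursor.alive' pattern from the issue, collecting results."""
    results = []
    while cursor.alive:
        results.append(cursor.next())
    return results


# One reply holding every document, already marked finished with id 0.
one_batch = [([{"_id": 1}, {"_id": 2}, {"_id": 3}], 0)]

print(drain_with_while_alive(ModelCursor(one_batch, fixed=True)))
# [{'_id': 1}, {'_id': 2}, {'_id': 3}]

try:
    drain_with_while_alive(ModelCursor(one_batch, fixed=False))
except StopIteration:
    print("pre-fix model: StopIteration")
```

With `fixed=False` the model reproduces the reported bug. After the third document, "alive" is still True, and the extra next() call raises StopIteration.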

      However, you still shouldn't iterate with "while cursor.alive", because that loop can still raise StopIteration. If the result set ends exactly on a batch_size boundary, the last batch of documents arrives with a non-zero cursor id. Only the next batch, which is empty, carries a cursor id of 0. In this example there are two batches of 2 documents each, followed by a final batch with no documents:

      >>> collection.count()
      4
      >>> cursor = collection.find().batch_size(2)
      >>> cursor.next()
      {u'_id': ObjectId('5531564cca1ce94a93c904a1')}
      >>> len(cursor._Cursor__data)
      1
      >>> cursor.cursor_id
      30137392801
      >>> cursor.next()
      {u'_id': ObjectId('5531564cca1ce94a93c904a2')}
      >>> len(cursor._Cursor__data)
      0
      >>> cursor.cursor_id
      30137392801
      >>> cursor.next()
      {u'_id': ObjectId('5531564cca1ce94a93c904a3')}
      >>> len(cursor._Cursor__data)
      1
      >>> cursor.cursor_id
      30137392801
      >>> cursor.next()
      {u'_id': ObjectId('5531564cca1ce94a93c904a4')}
      >>> cursor.cursor_id
      30137392801
      >>> cursor.next()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "pymongo/cursor.py", line 983, in __next__
          raise StopIteration
      StopIteration
      

      So even though the cursor still has a non-zero id when we call next(), next() raises StopIteration, because the server's final batch is empty.
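The batch-boundary case from the session above can be reproduced with a toy model. `model_server_replies` and `FixedModelCursor` are hypothetical names, not PyMongo or server APIs. The model makes one simplifying assumption, taken from this example: the server reports cursor id 0 only when a reply comes back with fewer than batch_size documents. Real servers may sometimes detect exhaustion earlier. The cursor model already includes the "alive" fix.

```python
def model_server_replies(docs, batch_size):
    """Hypothetical server model yielding (batch, cursor_id) per reply.

    Simplifying assumption, matching the example above: the server reports
    cursor id 0 only when a reply comes back short, so a result set that
    ends exactly on a batch_size boundary gets one extra, empty reply.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    live_id = 30137392801  # any non-zero id stands for an open cursor
    pos = 0
    while True:
        batch = docs[pos:pos + batch_size]
        pos += len(batch)
        if len(batch) < batch_size:
            yield batch, 0  # short (possibly empty) reply: cursor finished
            return
        yield batch, live_id


class FixedModelCursor:
    """Hypothetical cursor model with the fix applied; not PyMongo's code."""

    def __init__(self, replies):
        self._replies = iter(replies)
        self._data = []
        self._exhausted = False

    @property
    def alive(self):
        return bool(self._data) or not self._exhausted

    def next(self):
        if not self._data and not self._exhausted:
            batch, cursor_id = next(self._replies)
            self._data.extend(batch)
            self._exhausted = cursor_id == 0  # the fix from this issue
        if self._data:
            return self._data.pop(0)
        raise StopIteration


docs = [{"_id": i} for i in range(4)]

# Four documents with batch_size 2: two full replies, then an empty one.
print([(len(batch), cursor_id != 0)
       for batch, cursor_id in model_server_replies(docs, 2)])
# [(2, True), (2, True), (0, False)]

cursor = FixedModelCursor(model_server_replies(docs, 2))
seen = []
try:
    while cursor.alive:
        seen.append(cursor.next())
except StopIteration:
    print("StopIteration after", len(seen), "documents")
# StopIteration after 4 documents
```

With five documents the last reply holds one document and id 0, so the extra empty reply appears only when the count is an exact multiple of batch_size.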

      In conclusion, just use a for loop:

      for document in cursor:
          print(document)
      

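      This is safe with both Cursor and CommandCursor: the for statement stops cleanly when the cursor raises StopIteration, whether or not the final batch is empty.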
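To see why the for loop needs no special handling, here is a toy cursor in Python 3 that follows the iterator protocol. `IterableModelCursor` and the `replies` data are hypothetical, not PyMongo's implementation. A for statement calls `__next__` repeatedly and treats StopIteration as the normal end of iteration. An empty final batch therefore just ends the loop.

```python
class IterableModelCursor:
    """Hypothetical cursor model (not PyMongo's code) that implements the
    iterator protocol, with the "alive" fix applied."""

    def __init__(self, replies):
        # replies: iterable of (documents, cursor_id) pairs, one per reply
        self._replies = iter(replies)
        self._data = []
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        # Fetch replies until a document is buffered or the server sends
        # id 0. If the last reply is empty, nothing is buffered and
        # StopIteration is raised below.
        while not self._data and not self._exhausted:
            batch, cursor_id = next(self._replies)
            self._data.extend(batch)
            self._exhausted = cursor_id == 0
        if self._data:
            return self._data.pop(0)
        raise StopIteration


# Two full batches of 2 and then an empty final batch, as in the issue.
replies = [
    ([{"_id": 0}, {"_id": 1}], 30137392801),
    ([{"_id": 2}, {"_id": 3}], 30137392801),
    ([], 0),
]

# The for statement calls __next__ and treats StopIteration as "done".
for document in IterableModelCursor(replies):
    print(document)
```

The same reasoning applies to other consumers of the iterator protocol, such as `list(cursor)`.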

            Assignee: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Reporter: Leo Li (woozyking)
            Votes: 0
            Watchers: 4