C Driver / CDRIVER-788

Hang in large bulk upsert

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.2-beta1
    • Affects Version/s: None
    • Component/s: Bulk API, libmongoc
    • Labels: None
    • Environment:
      Solaris 11.

      On Solaris 11 with MongoDB 2.4.14 and an unreleased C Driver 1.2 build, "test_upsert_large" hangs and then segfaults:

      1. The test constructs an update document that is intended to exactly meet the 16 MB maximum BSON size, like update({_id: 1}, {$set: {x: <... 16777179-byte string ...>}}).

      2. On legacy servers, it is sent as an OP_UPDATE in _mongoc_write_command_update_legacy, eventually via mongoc_cluster_sendv_to_server.

      3. mongoc_cluster_sendv_to_server calls mongoc_stream_writev.

      4. mongoc_stream_writev eventually results in a standard sendmsg call, which fails with errno 97, EMSGSIZE, "Message too long": http://docs.oracle.com/cd/E19455-01/806-1075/msgs-1643/index.html

      5. mongoc_cluster_sendv_to_server incorrectly checks mongoc_stream_writev's error return: it considers -1 a success. This is part of the CDRIVER-756 class of bugs.

      6. mongoc_cluster_sendv_to_server thinks the call succeeded, so it blocks for the standard socketTimeoutMS of 5 minutes awaiting the getLastError (GLE) response. When it finally decides the GLE has failed, it crashes trying to free the NULL response document (CDRIVER-787).
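
To illustrate the check that step 5 says is missing, here is a minimal sketch in plain C. It is not the driver's code: iov_total_len and writev_result_ok are hypothetical helpers, and the sketch assumes the write call follows the usual writev convention of returning -1 on error and the number of bytes written otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes described by an iovec array. */
static size_t
iov_total_len (const struct iovec *iov, size_t iovcnt)
{
   size_t total = 0;
   size_t i;

   assert (iov != NULL || iovcnt == 0);

   for (i = 0; i < iovcnt; i++) {
      total += iov[i].iov_len;
   }

   return total;
}

/*
 * Hypothetical helper: a vectored write counts as successful only if
 * every byte was written. A return of -1 (error, e.g. EMSGSIZE) or a
 * short count is reported as failure.
 */
static bool
writev_result_ok (ssize_t written, const struct iovec *iov, size_t iovcnt)
{
   if (written < 0) {
      return false;
   }

   return (size_t) written == iov_total_len (iov, iovcnt);
}
```

With a check of this shape, the caller could report the send failure immediately instead of waiting for a GLE reply to a message that was never sent.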

      Questions:

      1. Does CDRIVER-756 already cover the bug in step 5?

      2. What is a reasonable approach to EMSGSIZE? Split the iovec and retry? Are we certain none of the message was sent? Should the driver record that "n bytes was too large" and split all future iovecs up to that size, in an attempt to adapt to its system?
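
As a rough illustration of the "split the iovec and retry" idea in question 2, the sketch below carves a window of at most max_bytes out of an iovec array. iov_slice is a hypothetical helper, not part of libmongoc, and the sketch does not settle whether any bytes were sent before EMSGSIZE was returned or what chunk size a given system accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: fill `out` with entries describing at most
 * `max_bytes` bytes of the message in `in`, starting `skip` bytes from
 * its beginning. `out` must have room for `iovcnt` entries.
 * Returns the number of entries written to `out`; `*taken` receives the
 * number of bytes they describe (0 when nothing is left).
 */
static size_t
iov_slice (const struct iovec *in,
           size_t iovcnt,
           size_t skip,
           size_t max_bytes,
           struct iovec *out,
           size_t *taken)
{
   size_t n = 0;
   size_t i;

   assert (out != NULL && taken != NULL);
   *taken = 0;

   for (i = 0; i < iovcnt && *taken < max_bytes; i++) {
      size_t len = in[i].iov_len;
      size_t room;

      if (skip >= len) {
         /* This entry lies entirely before the window (or is empty). */
         skip -= len;
         continue;
      }

      len -= skip;
      room = max_bytes - *taken;

      out[n].iov_base = (char *) in[i].iov_base + skip;
      out[n].iov_len = len < room ? len : room;
      *taken += out[n].iov_len;
      n++;
      skip = 0;
   }

   return n;
}
```

A caller could loop, passing each window to the write call and advancing skip by the number of bytes actually written, until iov_slice reports that nothing is left. Whether such a retry is safe depends on the answer to "are we certain none of the message was sent?", which this sketch leaves open.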

            Assignee:
            bjori Hannes Magnusson
            Reporter:
            jesse@mongodb.com A. Jesse Jiryu Davis
