  1. Ruby Driver
  2. RUBY-676

New write operation method for insert, update, remove

    • Type: New Feature
    • Resolution: Done
    • Priority: Major - P3
    • 1.10.0
    • Affects Version/s: None
    • Component/s: Public API

      1. New Write Operations
        1. Status
          Old API fully working with new write commands, including new batch_write_incremental
          New fluent batch API about to be started
          Sequence of pull requests in progress
          1. Benchmarks

      insert_documents - old batch implementation
      • max_wire_version:0 - 1 serialize-call/document at high-level

      batch_write_partition - new implementation with batch size adjusted for success
      • max_wire_version:0 - 1 serialize-call/document at high-level
      • max_wire_version:2 - 1 serialize-call/batch-insertion attempt

      batch_write_incremental - new implementation - improved incremental
      • 1 serialize-call/doc at high-level, new code
      • 1 serialize-call/doc at high-level, new code with BSON grow

      secs:2.94, docs_per_sec:17493, max_wire_version:0, title:"insert_documents huge w:1"
      secs:1.34, docs_per_sec:38379, max_wire_version:0, title:"batch_write_partition huge w:1"
      secs:0.99, docs_per_sec:51947, max_wire_version:2, title:"batch_write_partition huge w:1"
      secs:2.16, docs_per_sec:23809, max_wire_version:0, title:"batch_write_incremental huge w:1"
      secs:2.47, docs_per_sec:20821, max_wire_version:2, title:"batch_write_incremental huge w:1"

        1. Pending - In progress

      pull requests submitted - peer review pending

      • BSON::Grow#clear! clears internal state to allow reuse
      • pre-serialized bson for DB#command

      TO DO

      • pull requests to be submitted
      • send_write_operation
      • check opts sent to server
      • batch_write_incremental
      • split insert into send_write_command and batch_write_incremental
      • return values

      nightly 2013-10-16 and 2.5.3

      • ordered is not optional (documentation says that it's optional) - errCode: '99999'; errMsg: 'missing ordered field'
      • update top:0 and top:-1 only update one document
      • test_multi_update
      • w:0 - diverted to OP_INSERT, TODO - work into batch insertion
      • test_remove_return_value
      • check_keys - filtered out by version for now - TODO - review
      • test_update_check_keys
        1. Completed
          but to be reviewed again
      • writeConcern
      • ordered = !continue_on_error
      • BATCH_SIZE_LIMIT
      • collect_on_error
      • write_command should go to primary - yes, since not in SECONDARY_OK_COMMANDS
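
      The option mapping listed above (writeConcern, ordered = !continue_on_error) can be sketched as follows.
      This is a minimal illustration, not the driver's code: build_insert_command is a hypothetical helper, the set of
      write concern keys is an assumption, and a plain Hash is used on the assumption of Ruby 1.9+ insertion ordering
      so that the command name stays the first key.

```ruby
# Hypothetical helper (not part of the driver): assembles an insert
# write command document from the legacy-style options hash.
WRITE_CONCERN_KEYS = [:w, :j, :fsync, :wtimeout]  # assumed key set

def build_insert_command(collection_name, documents, opts = {})
  # Collect only the write concern options that the caller supplied.
  write_concern = {}
  WRITE_CONCERN_KEYS.each do |key|
    write_concern[key] = opts[key] if opts.key?(key)
  end

  # The command name must be the first key of the command document.
  command = {
    :insert    => collection_name,
    :documents => documents,
    :ordered   => !opts[:continue_on_error]  # ordered = !continue_on_error
  }
  command[:writeConcern] = write_concern unless write_concern.empty?
  command
end

cmd = build_insert_command('test', [{:n => 1}, {:n => 2}],
                           :j => true, :continue_on_error => true)
```

      Because ordered is always set explicitly here, the sketch also sidesteps the 'missing ordered field' error noted above.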
        1. Jira tickets
        1. References
        1. Features

      1. bulk
      2. continue on error mode
      3. stats from each operation run (so you say continue on error and see which writes worked)
      4. write concern built in (no more gle)

        1. References
      1. mongod startup

      The following is no longer needed as of nightly 2013-10-01

      mongod --setParameter enableExperimentalWriteCommands=true

      1. Ruby Interface
        1. Prototype Ruby Write Operations interface (by Gary)
          1. insert

      Mongo::Collection#insert(doc_or_docs, opts={})
      was
      Mongo::Collection#insert(doc_or_docs, opts={})

      examples

      collection.insert(doc, :j => true)

      collection.insert(docs, :j => true, :continue_on_error => true, :collect_on_error => true)

            1. insert_documents

      Mongo::Collection#insert_documents(documents, collection_name=@name, check_keys=true, write_concern={}, flags={})

          1. update

      Mongo::Collection#update(selector_or_updates, document_or_nil=nil, opts={})
      was
      Mongo::Collection#update(selector, document, opts={})

      examples

      collection.update({:n => 1}, { => 2}, :upsert => true, :multi => true, :j => true)

      collection.update([
      {:q => {:n => 2}, :u => {:n => 2, => 4}, :upsert => true, :multi => true},
      {:q => {:n => 3}, :u => {:n => 3, => 9}, :upsert => true}
      ],
      :j => true, :continue_on_error => true, :collect_on_error => true)

      This exposes keys :q and :u to the user.
      An alternative for the bulk/batch parameter would be to have each array element in the form [ query, update, opts ],
      but this is more cumbersome than having each array element be a hash with key :q for the query and :u for update.
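
      One consequence of the proposed signature is worth sketching: in the batch form, the trailing options hash lands
      in the second positional parameter (document_or_nil), not in opts. The helper below is a hypothetical
      illustration of how both call forms could be normalized into a list of update hashes plus top-level options;
      the name normalize_updates and the split of :upsert/:multi into the inner level are assumptions based on the
      examples above.

```ruby
INNER_UPDATE_OPTS = [:upsert, :multi]  # per-update options, per the examples

# Hypothetical helper: returns [updates, top_level_opts] for either call form.
def normalize_updates(selector_or_updates, document_or_nil = nil, opts = {})
  if selector_or_updates.is_a?(Array)
    # Batch form: update(updates, opts). The options hash arrives in the
    # second positional slot because the document argument is omitted.
    top = document_or_nil || opts
    [selector_or_updates, top]
  else
    # Single form: update(selector, document, opts).
    inner = {:q => selector_or_updates, :u => document_or_nil}
    top = {}
    opts.each do |key, value|
      target = INNER_UPDATE_OPTS.include?(key) ? inner : top
      target[key] = value
    end
    [[inner], top]
  end
end

single_updates, single_top =
  normalize_updates({:n => 1}, {:n => 2}, :upsert => true, :multi => true, :j => true)

batch_updates, batch_top =
  normalize_updates([{:q => {:n => 2}, :u => {:n => 4}, :upsert => true}],
                    :j => true, :continue_on_error => true)
```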

          1. delete

      Mongo::Collection#remove(selector_or_deletes={}, opts={})
      was
      Mongo::Collection#remove(selector={}, opts={})

      examples

      collection.remove({:expire => {"$lte" => Time.now}}, :j => true)

      collection.remove([
      {:q => {:n => 1}, :limit => 1},
      {:q => {:n => {"$gt" => 2}}}
      ],
      :j => true, :continue_on_error => true, :collect_on_error => true)

      This exposes key :q to the user.
      An alternative for the bulk/batch parameter would be to have each array element in the form [ query, opts ],
      but this is more cumbersome than having each array element be a hash with key :q for the query.

        1. Internals

      Mongo::Collection#insert_documents(documents, collection_name=@name, check_keys=true, write_concern={}, flags={})

        1. Comments

      The update operation is the most complex of the three.
      The new update operation has non-trivial options, now at two levels.
      For a bulk/batch operation, the top level carries the common write concern and continue-on-error options,
      while the inner level (each array element) carries the upsert and multi options.

      The delete operation also has options at two levels.
      For a bulk/batch operation, the top level has the common write concern options,
      while the inner level has the new limit option.

      The user must explicitly specify options for the bulk/batch operations.
      The driver does not supply any inherited semantics for the inner options.
      Top level write concern options are inherited as previously specified and implemented in the Ruby API.

      Ruby does not implement the remove just_one option.
      The wire protocol has the SingleRemove flag for this function.
      We need to develop a Ruby API for this function.
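
      As one possible shape for that API, a single-document remove could be expressed through the inner :limit option
      shown in the delete examples. This is only a sketch under assumptions: the :just_one option name and the
      delete_spec helper are hypothetical, and nothing here is a decision recorded in this ticket.

```ruby
# Hypothetical helper: builds one element of the deletes array.
# A :just_one option (assumed name) is translated to :limit => 1,
# mirroring the {:q => ..., :limit => 1} form in the examples.
def delete_spec(selector, opts = {})
  spec = {:q => selector}
  spec[:limit] = 1 if opts[:just_one]
  spec
end

one  = delete_spec({:n => 1}, :just_one => true)
many = delete_spec({:n => {"$gt" => 2}})
```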

        1. Integration

      As the new write operations are the future, the first approach would be to design with them as the core methods,
      with fall-back to the old insert/update/delete operations.

      The current low-level operations are:

      Mongo::Collection#insert_batch(message, documents, write_concern, continue_on_error, errors, collection_name=@name)
      Mongo::Collection#update(selector, document, opts={})
      Mongo::Collection#remove(selector={}, opts={})

      The new write operations are presented with a common core in the documentation.
      This directs us to refactor to a common method for write operations #send_write_operation.
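
      A rough sketch of what such a common entry point with fall-back might look like is given below. The signature
      is an assumption made for illustration (the real #send_write_operation may differ), the two callables stand in
      for the write-command path and the legacy opcode path, and the threshold of 2 is taken from the
      max_wire_version:2 label in the benchmarks.

```ruby
# Assumed threshold: servers reporting max_wire_version >= 2 accept write commands.
WRITE_COMMAND_WIRE_VERSION = 2

# Sketch of a common dispatch method. via_command and via_legacy are
# hypothetical callables standing in for the two implementations.
def send_write_operation(op_type, max_wire_version, via_command, via_legacy)
  if max_wire_version >= WRITE_COMMAND_WIRE_VERSION
    via_command.call(op_type)   # new write command path
  else
    via_legacy.call(op_type)    # fall back to old insert/update/delete messages
  end
end

via_command = lambda { |op| [:command, op] }
via_legacy  = lambda { |op| [:legacy, op] }

new_path = send_write_operation(:insert, 2, via_command, via_legacy)
old_path = send_write_operation(:update, 0, via_command, via_legacy)
```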

          1. Method #insert_documents

      Refactoring is complicated by #insert_documents which is used twice and has a collection argument.
      Simplify this by replacing the call in #generate_indexes with a call to a lower-level method.
      Methods #insert_batch and #insert_buffer are used only in #insert_documents.

      Insert calling sequence relevant to #insert_documents - there are no calls outside this chain after fixing #generate_indexes
      #save
      #insert
      #insert_documents
      #insert_buffer
      #insert_batch
      #send_insert_message # see also collection_test.rb:328

      After much research and experimentation, the adaptive strategy chosen for sizing write commands is MIMD
      (multiplicative increase, multiplicative decrease).
      The decrease factor is 1/2, i.e. 2**(-1), corresponding to binary reduction (halving) after each failed attempt.
      The increase factor is 2**(1/10), so that ten successful attempts correspond to doubling.
      The initial batch size is the number of documents in the first call.
      This is probably better than a fixed 100, which gives a 100x improvement over single writes but is reduced to 1 in about 7 attempts.

      multiplicative increase / multiplicative decrease
      -------------------------------------------------
      initialize
      x = documents.size
      multiplicative increase: x = x * 2**(1/10), approximated in integer arithmetic as x * 1097 / 1024
      x = [(x * 1097) >> 10, x + 1].max unless documents.empty?
      multiplicative decrease: x = x * 2**(-1)
      x = [x >> 1, 1].max if failure
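
      To make the sizing rules concrete, the following self-contained sketch drives them in a loop. It is an
      illustration only: write_in_batches is a hypothetical name, and the block stands in for one write command
      attempt, returning true when the batch is accepted and false when it is too large.

```ruby
# Sketch of MIMD batch sizing. Yields each candidate batch to the block
# and returns the sizes of the batches that were accepted.
def write_in_batches(documents)
  x = documents.size          # initial batch size: everything in one attempt
  offset = 0
  accepted_sizes = []
  while offset < documents.size
    batch = documents[offset, x]
    if yield(batch)
      accepted_sizes << batch.size
      offset += batch.size
      # multiplicative increase by about 2**(1/10); x + 1 guarantees growth for small x
      x = [(x * 1097) >> 10, x + 1].max
    else
      raise ArgumentError, 'a single document was rejected' if batch.size == 1
      # multiplicative decrease: halve, but never below 1
      x = [x >> 1, 1].max
    end
  end
  accepted_sizes
end

docs = (1..10).map { |n| {:n => n} }
# Pretend the server rejects any batch of more than 3 documents.
sizes = write_in_batches(docs) { |batch| batch.size <= 3 }
```

      With this stand-in limit the attempted sizes are 10, 5, 2, 3, 4, 2, 3, which shows the sizing probing upward after
      each success and halving after each failure.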

        1. Issues and remaining work items

      What happens with a replica set containing a mix of server versions, where some nodes support the new write command and others do not?
      We use the minimum of the maxWireVersion numbers, but what if not all of the nodes report in?
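
      One conservative answer to the open question, sketched here purely as an assumption, is to treat any node that
      has not reported a maxWireVersion as version 0, so that the driver falls back to the old operations until every
      node is known. The helper name and the use of nil for "not reported" are hypothetical.

```ruby
# Hypothetical helper: reported_versions holds one entry per node,
# with nil for a node that has not reported maxWireVersion.
def effective_max_wire_version(reported_versions)
  # Unknown nodes count as 0; an empty node list also yields 0.
  reported_versions.map { |version| version || 0 }.min || 0
end

all_new  = effective_max_wire_version([2, 2, 2])
mixed    = effective_max_wire_version([2, 0, 2])
silent   = effective_max_wire_version([2, nil, 2])
```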

      check_keys is all or nothing, so any operator will force check_keys to be turned off for the rest of the BSON message.
      Review check_keys.

      Underscore versus camelcase - document structure for batch requests is exposed, including some options that are camelcase.

      Options need to be processed (rationalized, screened, translated) as appropriate.

      Return values need to be examined carefully, along with the responses that are incorporated from the server.

      Documentation of new batch insert, remove, and update.

      Screening of proper usage for batch insert, remove, and update.

            Assignee:
            gjmurakami Gary Murakami
            Reporter:
            barrie Barrie Segal