Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17174

Updates can fail to find document during high concurrency

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.5, 2.6.7, 3.0.0-rc7
    • Component/s: Concurrency, Write Ops
    • None
    • Fully Compatible
    • ALL
    • Hide

      See the attached files:

      export MONGO=`which mongo`
      export MONGOD=`which mongod`
      
      ./reset.sh
      # it may be necessary to run the following command a few times.
      ./run-js.sh
      
      Show
      See the attached files: export MONGO=`which mongo` export MONGOD=`which mongod` ./reset.sh # it may be necessary to run the following command a few times. ./run-js.sh

      With a cold start start certain 'update' commands fail when they are expected to succeed.

      Scenario can be triggered by cold-starting mongod and having multiple threads/processes update the same document.

      Have reproduced this on both OSX and Ubuntu using the attached scripts:

      • reset.sh : starts an empty mongod and inserts 10 documents containing just an _id field
      • run-js.sh: cold starts mongod and sets log level 1 (via prep.sh script). Then it starts 10 subprocesses, each one updating the same document via $addToSet.

      The run-js.sh script may need to be run a few times. Each time it targets a single document and tries to blast it with multiple concurrent updates. It will cycle through all 10 documents at a rate of one a second. The fact that it typically doesn't occur during the first run might indicate the problem is exposed more easily when dealing with larger objects.

      Under certain circumstances some of these updates fail; the update however should have succeeded and can be manually applied later on.

      The common characteristic seems to be:

      • nMatched:0 and nModified:0
      • numYields:1

      The nscanned metrics in each case show the document was found. However, the operation yields (numYields:1) after which the result is nMatched:0 and nModified:0, ie the update has not been applied.

      These particular operations also seem to hold the 'w' lock for a lot less time than those which succeed. This is probably to be expected given the behavior but is another key indicator of the problem.

      I thought this might have been due to document moves, ie this document is found but the operation yields before the update is applied; during this time another update comes in and successfully updates the document, moving it in the process, such that when the yielded operation resumes it cannot find the document. This does not appear to be borne out by the available evidence, i.e., I have seen many circumstances where there are no document moves taking place and yet the problem persists.

      Note too: it is not sufficient for the update operation to yield - there are occasions where the operation does yield and the update still completes successfully.

      I have also tried to force the document into memory before the update by issuing a 'findOne()' and then the 'update()' but I still see the problem.

      FYI I also have a slightly more complex repro written in Java which can switch between the standard Update command, the Bulk Update API and 'findAndModify()' - both Update methods suffer this problem but 'findAndModify' does not appear to. Interestingly the 'findAndModify()' does not yield - so perhaps that's a clue. (I can upload this Java version too if required).

      Log file snippet showing problem:

      2015-02-04T12:01:22.971+0000 [conn9] update foo.bar query: { _id: 1, c.sb.name: { $ne: "Update::1::18::a158ac61-35d9-45c8-9412-f2733f41585c" } } update: { $addToSet: { c.sb: { name: "Update::1::18::a158ac61-35d9-45c8-9412-f2733f41585c" } } } nscanned:1 nscannedObjects:1 nMatched:0 nModified:0 keyUpdates:0 numYields:1 locks(micros) w:2099 14ms
      

      Counter example showing an update which yields and yet succeeds:

      2015-02-04T12:00:03.260+0000 [conn4] update foo.bar query: { _id: 6, c.sb.name: { $ne: "Update::6::221::d2b6ab90-cd2a-4dd5-b246-c06276d33768" } } update: { $addToSet: { c.sb: { name: "Update::6::221::d2b6ab90-cd2a-4dd5-b246-c06276d33768" } } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 numYields:1 locks(micros) w:4004 8ms
      

      I have tested this on MongoDB 2.6.5, 2.6.7 and 3.0.0-rc7 - all are affected.

      I have tested my java version on 2.4.12 and it does not appear to be affected, indicating this is a regression.

        1. prep.sh
          0.6 kB
        2. reset.sh
          0.5 kB
        3. repro.js
          1 kB
        4. run-js.sh
          0.3 kB
        5. mongod.snippet.log
          2.73 MB
        6. repro2.js
          1 kB
        7. run-js2.sh
          0.4 kB

            Assignee:
            Unassigned Unassigned
            Reporter:
            ronan.bohan@mongodb.com Ronan Bohan
            Votes:
            1 Vote for this issue
            Watchers:
            21 Start watching this issue

              Created:
              Updated:
              Resolved: