  1. Core Server
  2. SERVER-33639

Concurrent writes against non-existent database can fail due to distlock acquisition timeout at `createDatabase` time

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.6, 4.0.0-rc1, 4.1.1
    • Affects Version/s: 3.6.3, 3.7.2
    • Component/s: Sharding
    • None
    • Fully Compatible
    • v4.0
    • Sharding 2018-05-21, Sharding 2018-06-04
    • 0

      Starting with MongoDB 3.6.0, the creation of sharded databases was made explicit from the point of view of MongoS, and the creation logic was moved to the config server. Because the default distributed lock acquisition timeout is still 20 seconds, writes time out when a large number of threads suddenly try to write to a database that does not exist.

      What happens is a convoying effect on the -movePrimary distributed lock: acquisitions time out and writes fail even though the database has already been created. I can reproduce this problem 100% of the time using the load phase of the YCSB benchmark with 40 threads.
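
      To make the convoying effect concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes all writers queue for the lock at the same moment, are served in order, and each holds the lock for a fixed time. The function name and the 1-second hold time are hypothetical; only the 20-second timeout and the 40 threads come from this report.

```python
def convoy_outcome(num_threads, hold_ms, timeout_ms):
    """Return (acquired, timed_out) for waiters that all arrive at t=0.

    Simplified model: the k-th waiter to get the lock (0-indexed) gets it
    at k * hold_ms, and a waiter gives up once it has waited longer than
    timeout_ms. Waiters that give up never hold the lock.
    """
    if hold_ms <= 0:
        raise ValueError("hold_ms must be positive")
    # Waiters 0..(timeout_ms // hold_ms) get the lock within the timeout.
    acquired = min(num_threads, timeout_ms // hold_ms + 1)
    return acquired, num_threads - acquired


# Hypothetical 1-second hold per acquisition, 20-second timeout, 40 threads.
print(convoy_outcome(40, 1000, 20000))
```

      Under these assumed numbers, part of the queue times out even though the first holder has already created the database; with a much shorter hold time, every waiter gets through.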

      To mitigate the convoying effect, we should first take some form of lock manager X lock, as the other metadata commands do, and then check whether the database exists. The distributed lock should be taken only if the database does not exist yet.
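
      The proposed ordering can be sketched as follows. This is a minimal illustration in Python, not the server's implementation. `ConfigServerSketch`, `create_database`, and the database name `ycsb` are hypothetical names; a plain in-process `threading.Lock` stands in for both the lock manager X lock and the distributed lock, and a single process-wide local lock is a simplification.

```python
import threading


class ConfigServerSketch:
    """Hypothetical stand-in for the config server's createDatabase path."""

    def __init__(self):
        self._databases = set()
        # Stands in for a cheap, local lock manager X lock (assumption).
        self._local_lock = threading.Lock()
        # Stands in for the expensive distributed lock (assumption).
        self._dist_lock = threading.Lock()
        self.dist_lock_acquisitions = 0

    def create_database(self, name, dist_lock_timeout_s=20.0):
        """Return True if this call created the database, False if it existed."""
        with self._local_lock:
            # Check existence first, so later callers never queue on the
            # distributed lock once the database is there.
            if name in self._databases:
                return False
            if not self._dist_lock.acquire(timeout=dist_lock_timeout_s):
                raise TimeoutError("distributed lock acquisition timed out")
            try:
                self.dist_lock_acquisitions += 1
                self._databases.add(name)
                return True
            finally:
                self._dist_lock.release()


def run_concurrent_creates(num_threads=40, name="ycsb"):
    """Have num_threads callers try to create the same database at once."""
    server = ConfigServerSketch()
    results = []
    results_lock = threading.Lock()

    def worker():
        created = server.create_database(name)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server, results
```

      In this sketch only the first caller reaches the distributed lock; the rest return after the local existence check, so no convoy forms on the expensive lock.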

            Assignee: Janna Golden (janna.golden@mongodb.com)
            Reporter: Kaloian Manassiev (kaloian.manassiev@mongodb.com)
            Votes: 0
            Watchers: 6

              Created:
              Updated:
              Resolved: