Core Server / SERVER-7161

Sharding will fail with a non-obvious error when the locks collection is not consistent

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor - P4
    • None
    • Affects Version/s: 2.0.7
    • Component/s: Sharding
    • None
    • ALL

      If the locks collection is inconsistent across the three config servers, the shards will not be balanced. When one config server has the lock entry but one or both of the others do not, a message similar to the following appears in the logs:

      [Balancer] caught exception while doing balance: distributed lock balancer/ip-<ip>:<port>:1347910582:1804289383 had errors communicating with individual server <server>:<port> :: caused by :: field not found, expected type 7
      

      "expected type 7" refers to the ObjectId (BSON type 7) that is missing from the locks collection on the affected config server.
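
      The inconsistency described above can be checked by comparing the lock entry held by each config server. The following is a minimal sketch, assuming the documents from each server's locks collection have already been fetched into plain dictionaries. The function name, the server labels, and the assumption that the ObjectId is stored in a field named `ts` are illustrative and not taken from this report.

```python
def find_lock_inconsistencies(locks_by_server, lock_id="balancer", oid_field="ts"):
    """Return {server: problem} for servers whose lock entry is absent
    or lacks the ObjectId field (field name is an assumption)."""
    problems = {}
    for server, docs in locks_by_server.items():
        # Look for the lock document by its _id on this server.
        entry = next((d for d in docs if d.get("_id") == lock_id), None)
        if entry is None:
            problems[server] = "lock entry missing"
        elif oid_field not in entry:
            problems[server] = f"field '{oid_field}' missing"
    return problems


# Hypothetical data: one server has the entry, the other two do not.
example = {
    "config1": [{"_id": "balancer", "ts": "placeholder-object-id"}],
    "config2": [],
    "config3": [{"_id": "balancer"}],
}
print(find_lock_inconsistencies(example))
```

      An empty result means every server returned a lock entry containing the field. It does not verify that the values agree between servers.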

      The balancer lock will be forced when it times out, with the following messages:

      [Balancer] forcing lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' because elapsed time 900364 > takeover time 900000
      [Balancer] warning: lock forcing balancer/ip-<ip>:<port>:1347690492:1804289383 inconsistent
      [Balancer] lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' successfully forced
      

      This indicates that the lock is both successfully forced and inconsistent. However, no shard balancing takes place.
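
      Since the only visible sign of this state is the "inconsistent" warning between the two forcing messages, one way to detect it is to scan the log for that line. This is a rough sketch, assuming log lines shaped like the ones quoted above. The function name and the regular expression are illustrative, not part of any MongoDB tooling.

```python
import re

# Matches the warning quoted in this report:
#   "warning: lock forcing <lock name> inconsistent"
_INCONSISTENT = re.compile(r"warning: lock forcing (\S+) inconsistent")


def inconsistent_forced_locks(lines):
    """Return the distinct lock names reported as forced while inconsistent,
    in the order they first appear."""
    names = []
    for line in lines:
        match = _INCONSISTENT.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Usage with a hypothetical log file path:
# with open("mongos.log") as log:
#     print(inconsistent_forced_locks(log))
```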

      The message could be made more obvious (for example, "Lock not found"), or the lock should actually be forced as reported.

            Assignee:
            greg_10gen Greg Studer
            Reporter:
            andre.defrere@mongodb.com Andre de Frere
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: