Core Server / SERVER-56116

Balancing failed when moving big collection

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding

      I would like to archive my data as described in Tiered Hardware for Varying SLA or SLO.

      My sharded cluster looks like this:

      db.getSiblingDB("config").shards.find({}, { tags: 1 })
      { "_id" : "shard_01", "tags" : ["recent"] }
      { "_id" : "shard_02", "tags" : ["recent"] }
      { "_id" : "shard_03", "tags" : ["recent"] }
      { "_id" : "shard_04", "tags" : ["archive"] }
      
      db.getSiblingDB("config").collections.find({ _id: "data.sessions.20210412.zoned" }, { key: 1 })
      {
         "_id": "data.sessions.20210412.zoned",
         "key": { "tsi": 1.0, "si": 1.0 }
      }
      
      db.getSiblingDB("data").getCollection("sessions.20210412.zoned").getShardDistribution()
      
      Shard shard_03 at shard_03/d-mipmdb-sh1-03.swi.srse.net:27018,d-mipmdb-sh2-03.swi.srse.net:27018
       data : 63.18GiB docs : 16202743 chunks : 2701
       estimated data per chunk : 23.95MiB
       estimated docs per chunk : 5998
      
      Shard shard_02 at shard_02/d-mipmdb-sh1-02.swi.srse.net:27018,d-mipmdb-sh2-02.swi.srse.net:27018
       data : 55.6GiB docs : 14259066 chunks : 2367
       estimated data per chunk : 24.05MiB
       estimated docs per chunk : 6024
      
      Shard shard_01 at shard_01/d-mipmdb-sh1-01.swi.srse.net:27018,d-mipmdb-sh2-01.swi.srse.net:27018
       data : 68.92GiB docs : 23896624 chunks : 3034
       estimated data per chunk : 23.26MiB
       estimated docs per chunk : 7876
      
      Totals
       data : 187.72GiB docs : 54358433 chunks : 8102
       Shard shard_03 contains 33.66% data, 29.8% docs in cluster, avg obj size on shard : 4KiB
       Shard shard_02 contains 29.62% data, 26.23% docs in cluster, avg obj size on shard : 4KiB
       Shard shard_01 contains 36.71% data, 43.96% docs in cluster, avg obj size on shard : 3KiB
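
      For context, shard-to-zone assignments like the tags shown above are typically created with sh.addShardToZone. A minimal sketch using the shard names from the output above (not a transcript of the exact commands run):

      // Sketch: assign the "recent" and "archive" zones to the shards
      // listed in the config.shards output above.
      sh.addShardToZone("shard_01", "recent")
      sh.addShardToZone("shard_02", "recent")
      sh.addShardToZone("shard_03", "recent")
      sh.addShardToZone("shard_04", "archive")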
      
      

      In order to trigger the migration I use:

      sh.disableBalancing('data.sessions.20210412.zoned')
      // Only change the zone range if no migration is currently recorded for this namespace
      if (db.getSiblingDB("config").migrations.findOne({ ns: 'data.sessions.20210412.zoned' }) == null) {
         // Passing null as the zone removes any existing assignment for the full key range
         sh.updateZoneKeyRange('data.sessions.20210412.zoned', { "tsi": MinKey, "si": MinKey }, { "tsi": MaxKey, "si": MaxKey }, null)
         // Then assign the full key range to the 'archive' zone
         sh.updateZoneKeyRange('data.sessions.20210412.zoned', { "tsi": MinKey, "si": MinKey }, { "tsi": MaxKey, "si": MaxKey }, 'archive')
      }
      sh.enableBalancing('data.sessions.20210412.zoned')
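
      To double-check that the zone range was actually recorded, the config.tags collection can be queried (sketch; config.tags stores one document per zone key range):

      // Sketch: confirm the zone key range for the collection after the update.
      // config.tags documents have the fields ns, min, max and tag.
      db.getSiblingDB("config").tags.find({ ns: "data.sessions.20210412.zoned" }).pretty()
      // Expected: a single range from { tsi: MinKey, si: MinKey } to
      // { tsi: MaxKey, si: MaxKey } with tag: "archive".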
      
      

      I don't get any error and the migration starts. However, in my logs (on the config server) I get thousands or even millions of these warnings:

      {
        "t": {
          "$date": "2021-04-15T14:56:28.984+02:00"
        },
        "s": "W",
        "c": "SHARDING",
        "id": 21892,
        "ctx": "Balancer",
        "msg": "Chunk violates zone, but no appropriate recipient found",
        "attr": {
          "chunk": "{ ns: \"data.sessions.20210412.zoned\", min: { tsi: \"194.230.147.157\", si: \"10.38.15.1\" }, max: { tsi: \"194.230.147.157\", si: \"10.40.230.198\" }, shard: \"shard_03\", lastmod: Timestamp(189, 28), lastmodEpoch: ObjectId('60780e581ad069faafa363ba'), jumbo: false }",
          "zone": "archive"
        }
      }
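
      The warning indicates that the balancer could not find a shard in the 'archive' zone to receive the chunk. A sketch of how zone membership can be checked directly in the config database (the draining field is only present while a shard is being removed):

      // Sketch: list the shards assigned to the "archive" zone and check
      // whether any of them is draining; the warning above is logged when
      // no shard in the target zone can accept the chunk.
      db.getSiblingDB("config").shards.find({ tags: "archive" }, { _id: 1, tags: 1, draining: 1 })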
      

      The file system reached 100% and MongoDB stopped working.

      How can this be? `MinKey` / `MaxKey` should cover all values.
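
      A sketch of how the remaining zone violations could be counted, assuming shard_04 is the only shard in the 'archive' zone and the chunks metadata is keyed by ns (as in the log entry above):

      // Sketch: count chunks of the collection that are not yet on the
      // "archive" shard (assumes shard_04 is the only shard in that zone).
      db.getSiblingDB("config").chunks.countDocuments({
         ns: "data.sessions.20210412.zoned",
         shard: { $ne: "shard_04" }
      })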

            Assignee: Eric Sedor (eric.sedor@mongodb.com)
            Reporter: Wernfried Domscheit (wernfried.domscheit@sunrise.net)
            Votes: 0
            Watchers: 5