  1. Core Server
  2. SERVER-35009

Sharded cluster with small chunk size set makes bulk insert jobs fail to return

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.6.4
    • Component/s: Sharding
    • ALL

      Create and initialize the config server as a single-member replica set

      mongod --configsvr --replSet CSRS --bind_ip 127.0.0.1 --port 59130 --dbpath ./data/cfg0 --logpath ./logs/cfg0.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/cfg0.pid
      

      Init replica set

      mongo --host 127.0.0.1 --port 59130 --eval 'rs.initiate({_id:"CSRS",configsvr: true, members: [{_id: 0,host: "127.0.0.1:59130"}]});'
      

      Start mongos

      mongos --bind_ip 127.0.0.1 --port 27017 --configdb CSRS/127.0.0.1:59130 --fork --logpath ./logs/mongos.log --pidfilepath ./run/mongos.pid
      

      Start two shard servers

      mongod --shardsvr --bind_ip 127.0.0.1 --port 59131 --dbpath ./data/d0  --logpath ./logs/do0.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/d0.pid
      mongod --shardsvr --bind_ip 127.0.0.1 --port 59132 --dbpath ./data/d1  --logpath ./logs/do1.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/d1.pid
      

      Set chunksize

      mongo --host 127.0.0.1 --port 27017 --eval 'cfg = db.getSiblingDB("config"); cfg.settings.save( { _id:"chunksize", value: 1 } );'
      

      Init sharding

      mongo --host 127.0.0.1 --port 27017 --eval 'sh.addShard("127.0.0.1:59131"); sh.addShard("127.0.0.1:59132");'
      

      We now have a running mongo sharded cluster

      --- Sharding Status ---
        sharding version: {
              "_id" : 1,
              "minCompatibleVersion" : 5,
              "currentVersion" : 6,
              "clusterId" : ObjectId("5afb4fe9d00caef6cbc972d1")
        }
        shards:
              {  "_id" : "shard0000",  "host" : "127.0.0.1:59131",  "state" : 1 }
              {  "_id" : "shard0001",  "host" : "127.0.0.1:59132",  "state" : 1 }
        active mongoses:
              "3.6.0" : 1
        autosplit:
              Currently enabled: yes
        balancer:
              Currently enabled:  yes
              Currently running:  no
              Failed balancer rounds in last 5 attempts:  0
              Migration Results for the last 24 hours:
                      No recent migrations
        databases:
              {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                      config.system.sessions
                              shard key: { "_id" : 1 }
                              unique: false
                              balancing: true
                              chunks:
                                      shard0000       1
                              { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
      

      If we now run the following script, it never returns, or at least takes a very long time. I waited 1 hour before killing it.

      In file load_shard.js

      db = db.getSiblingDB('mydb');
      sh.enableSharding("mydb");
      db.user.ensureIndex({"user_id":1});
      sh.shardCollection("mydb.user",{"user_id":1});
      
      var bulk = db.user.initializeUnorderedBulkOp();
      people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
      for(var i=0; i<200000; i++){
         user_id = i;
         name = people[Math.floor(Math.random()*people.length)];
         number = Math.floor(Math.random()*10001);
         bulk.insert( { "user_id":user_id, "name":name, "number":number });
      }
      bulk.execute();
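
To put the data volume in perspective, here is a rough estimate of how much BSON the script above generates compared with the 1 MB chunk size. It is a sketch, not a measurement. It assumes the shell stores plain JavaScript numbers as 8-byte doubles and adds a 12-byte ObjectId `_id` to each document. The helper names `elementSize` and `estimateDocSize` are made up for this example.

```javascript
// Size of one BSON element: 1 type byte + key as a NUL-terminated string + value.
function elementSize(key, valueBytes) {
  return 1 + key.length + 1 + valueBytes;
}

// Estimated BSON size of one document inserted by load_shard.js.
// Assumes ASCII names, numbers stored as doubles, and an auto-generated ObjectId _id.
function estimateDocSize(name) {
  var size = 4 + 1;                                  // int32 length prefix + trailing NUL
  size += elementSize("_id", 12);                    // ObjectId
  size += elementSize("user_id", 8);                 // double
  size += elementSize("name", 4 + name.length + 1);  // int32 length + bytes + NUL
  size += elementSize("number", 8);                  // double
  return size;
}

var people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
var totalDocs = 200000;

// Names are picked uniformly at random, so average over the list.
var avgDocSize = people.reduce(function (sum, n) {
  return sum + estimateDocSize(n);
}, 0) / people.length;

var totalBytes = avgDocSize * totalDocs;
var chunkSizeBytes = 1 * 1024 * 1024;                // chunksize value 1 means 1 MB
var minChunks = Math.ceil(totalBytes / chunkSizeBytes);

// Works in the mongo shell (print) and in Node.js (console.log).
var log = (typeof print === "function") ? print : console.log;
log("avg doc bytes: " + avgDocSize.toFixed(1) + ", total MB: " +
    (totalBytes / chunkSizeBytes).toFixed(1) + ", min chunks: " + minChunks);
```

Under these assumptions a document is about 71 bytes, so 200000 documents come to roughly 13.5 MB, and the single bulk operation has to spread over at least 14 chunks. This is only a lower bound on the data size; index overhead and the exact points at which splits are triggered are not modelled.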
      

      Run script

      mongo --host 127.0.0.1 --port 27017 < load_shard.js
      

      The script never returns. If you open another connection to mongos and run sh.status(), it looks like the data has been written.


      There is a problem running MongoDB 3.6.x in a test sharded cluster where the chunk size is set to a small value, such as 1 or 2 MB.
      A bulk insert into a sharding-enabled database, run from a script piped into the mongo shell, gets stuck and never returns.
      Increasing the chunk size or reducing the amount of data fixes it.
      This used to work under MongoDB 2.6.7.
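
Since reducing the amount of data avoids the hang, one way to narrow down the threshold is to split the load into several smaller bulk operations and see which batch size still returns. This is a sketch for experimenting, not a confirmed workaround. `insertInBatches`, `batchSize` and `makeDoc` are hypothetical names; `collection` is expected to be a mongo shell collection such as `db.user`.

```javascript
// Run `total` inserts as several smaller unordered bulk operations
// instead of one large one. Returns the number of bulk operations executed.
function insertInBatches(collection, total, batchSize, makeDoc) {
  var executed = 0;
  for (var start = 0; start < total; start += batchSize) {
    var end = Math.min(start + batchSize, total);
    var bulk = collection.initializeUnorderedBulkOp();
    for (var i = start; i < end; i++) {
      bulk.insert(makeDoc(i));
    }
    bulk.execute();  // blocks until the server responds for this batch
    executed++;
  }
  return executed;
}

// Example use in the mongo shell (not executed here):
// insertInBatches(db.user, 200000, 10000, function (i) {
//   return { user_id: i, number: Math.floor(Math.random() * 10001) };
// });
```

If a given batch size completes while the single 200000-document operation does not, that suggests the hang depends on how much data one `execute()` call sends through the splits.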

      The problem can be reproduced with the simplest setup: one config server, one mongos, and two shard servers. It still exists, though,
      even in a much larger cluster with multiple replica sets and multiple shard servers.

      I have tried this on versions 3.6.0 and 3.6.4, with the same results.

            Assignee:
            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            Reporter:
            royce55 Royce Brown
            Votes:
            0
            Watchers:
            8
