Core Server / SERVER-33257

Sync doesn't start when adding a new node

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.2.18, 3.2.19
    • Component/s: Replication

      Two nodes, one on 3.2.18 and one on 3.2.19, both with the same config and replica set name rs0.

      When adding a node to the replica set:

      rs0:PRIMARY> rs.add({ host: "mongo2.nyc3.wegohealth.com:27017", priority: 0.5 })
      { "ok" : 1 }
      rs0:PRIMARY> cfg = rs.conf()
      {
      	"_id" : "rs0",
      	"version" : 690581,
      	"members" : [
      		{
      			"_id" : 6,
      			"host" : "mongo1.nyc3.wegohealth.com:27017",
      			"arbiterOnly" : false,
      			"buildIndexes" : true,
      			"hidden" : false,
      			"priority" : 1,
      			"tags" : {
      
      			},
      			"slaveDelay" : NumberLong(0),
      			"votes" : 1
      		},
      		{
      			"_id" : 7,
      			"host" : "mongo2.nyc3.wegohealth.com:27017",
      			"arbiterOnly" : false,
      			"buildIndexes" : true,
      			"hidden" : false,
      			"priority" : 0.5,
      			"tags" : {
      
      			},
      			"slaveDelay" : NumberLong(0),
      			"votes" : 1
      		}
      	],
      	"settings" : {
      		"chainingAllowed" : true,
      		"heartbeatIntervalMillis" : 2000,
      		"heartbeatTimeoutSecs" : 10,
      		"electionTimeoutMillis" : 10000,
      		"getLastErrorModes" : {
      
      		},
      		"getLastErrorDefaults" : {
      			"w" : 1,
      			"wtimeout" : 0
      		}
      	}
      }
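
      The config document above looks structurally sound. As an illustrative cross-check (a hypothetical Python sketch, not MongoDB's actual validation code), it passes the basic member constraints a reconfig would check:

```python
# Minimal sanity checks on the replica set config shown above.
# Illustrative sketch only; field subset taken from the rs.conf() output.
cfg = {
    "_id": "rs0",
    "version": 690581,
    "members": [
        {"_id": 6, "host": "mongo1.nyc3.wegohealth.com:27017", "priority": 1, "votes": 1},
        {"_id": 7, "host": "mongo2.nyc3.wegohealth.com:27017", "priority": 0.5, "votes": 1},
    ],
}

def check_config(cfg):
    ids = [m["_id"] for m in cfg["members"]]
    hosts = [m["host"] for m in cfg["members"]]
    assert len(set(ids)) == len(ids), "member _ids must be unique"
    assert len(set(hosts)) == len(hosts), "hosts must be unique"
    assert sum(m["votes"] for m in cfg["members"]) >= 1, "need a voting member"
    assert all(m["priority"] >= 0 for m in cfg["members"]), "priority must be >= 0"
    return True

print(check_config(cfg))  # True: nothing obviously wrong with the config itself
```

      So the problem is not a malformed config on the primary's side.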
      

      In the second node's log file I see the following:

      2018-02-10T06:21:22.318-0500 I CONTROL  [main] ***** SERVER RESTARTED *****
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] MongoDB starting : pid=7328 port=27017 dbpath=/srv/data/mongo/mongodb 64-bit host=mongo2
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] db version v3.2.19
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] git version: a9f574de6a566a58b24d126b44a56718d181e989
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] allocator: tcmalloc
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] modules: none
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] build environment:
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten]     distmod: ubuntu1604
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten]     distarch: x86_64
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten]     target_arch: x86_64
      2018-02-10T06:21:22.323-0500 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017 }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 300 }, processManagement: { fork: false }, replication: { replSetName: "rs0" }, storage: { dbPath: "/srv/data/mongo/mongodb", engine: "wiredTiger", journal: { enabled: true }, mmapv1: { smallFiles: true } }, systemLog: { destination: "file", logAppend: true, path: "/srv/data/mongo/log/mongod.log", quiet: true } }
      2018-02-10T06:21:22.339-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
      2018-02-10T06:21:22.429-0500 I STORAGE  [initandlisten] WiredTiger [1518261682:429075][7328:0x7fa66cebdc80], txn-recover: Main recovery loop: starting at 2/6656
      2018-02-10T06:21:22.479-0500 I STORAGE  [initandlisten] WiredTiger [1518261682:479659][7328:0x7fa66cebdc80], txn-recover: Recovering log 2 through 3
      2018-02-10T06:21:22.480-0500 I STORAGE  [initandlisten] WiredTiger [1518261682:480392][7328:0x7fa66cebdc80], txn-recover: Recovering log 3 through 3
      2018-02-10T06:21:22.558-0500 W STORAGE  [initandlisten] Detected configuration for non-active storage engine mmapv1 when current storage engine is wiredTiger
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten]
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten]
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-02-10T06:21:22.558-0500 I CONTROL  [initandlisten]
      2018-02-10T06:21:22.559-0500 I REPL     [initandlisten] Did not find local voted for document at startup.
      2018-02-10T06:21:22.559-0500 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
      2018-02-10T06:21:22.559-0500 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
      2018-02-10T06:21:22.559-0500 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/srv/data/mongo/mongodb/diagnostic.data'
      2018-02-10T06:21:22.560-0500 I NETWORK  [initandlisten] waiting for connections on port 27017
      2018-02-10T06:21:50.487-0500 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo1.nyc3.wegohealth.com:27017
      2018-02-10T06:21:50.488-0500 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo1.nyc3.wegohealth.com:27017, took 1ms (1 connections now open to mongo1.nyc3.wegohealth.com:27017)
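
      And nothing after the connection line: a healthy new member would go on to log that it installed the config ("New replica set config in use: ..."). An illustrative way to spot the missing message (hypothetical Python sketch over the log text above, not a MongoDB tool):

```python
# The secondary connects to the primary but never logs that it applied a
# config. Scan the log text for the message a healthy member would emit.
secondary_log = """\
2018-02-10T06:21:22.559-0500 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2018-02-10T06:21:50.488-0500 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo1.nyc3.wegohealth.com:27017, took 1ms (1 connections now open to mongo1.nyc3.wegohealth.com:27017)
"""

def config_installed(log_text):
    # A member that received the config logs "New replica set config in use".
    return any("New replica set config in use" in line
               for line in log_text.splitlines())

print(config_installed(secondary_log))  # False: the config never arrived
```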
      

      And the synchronization doesn't happen; there is no data:

      > rs.slaveOk()
      > show databases
      local  0.000GB
      

      Replication config is missing:

      > db.isMaster()
      {
      	"ismaster" : false,
      	"secondary" : false,
      	"info" : "Does not have a valid replica set config",
      	"isreplicaset" : true,
      	"maxBsonObjectSize" : 16777216,
      	"maxMessageSizeBytes" : 48000000,
      	"maxWriteBatchSize" : 1000,
      	"localTime" : ISODate("2018-02-10T11:24:17.697Z"),
      	"maxWireVersion" : 4,
      	"minWireVersion" : 0,
      	"ok" : 1
      }
      > rs.conf()
      2018-02-10T06:24:27.130-0500 E QUERY    [thread1] Error: Could not retrieve replica set config: {
      	"info" : "run rs.initiate(...) if not yet done for the set",
      	"ok" : 0,
      	"errmsg" : "no replset config has been received",
      	"code" : 94
      } :
      rs.conf@src/mongo/shell/utils.js:1217:11
      @(shell):1:1
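
      Error code 94 is NotYetInitialized: the node was started with a replSetName (hence "isreplicaset" : true) but has never received a config document. An illustrative classification of the isMaster output above (hypothetical helper, not MongoDB code):

```python
# Classify a node's replication state from its isMaster response.
# Illustrative helper; fields taken from the db.isMaster() output above.
is_master = {
    "ismaster": False,
    "secondary": False,
    "info": "Does not have a valid replica set config",
    "isreplicaset": True,
    "ok": 1,
}

def repl_state(doc):
    if doc.get("ismaster"):
        return "PRIMARY"
    if doc.get("secondary"):
        return "SECONDARY"
    if doc.get("isreplicaset"):
        # Started with --replSet but holds no config document yet
        return "WAITING_FOR_CONFIG"
    return "STANDALONE"

print(repl_state(is_master))  # WAITING_FOR_CONFIG
```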
      

      Even though the primary's log says the node was added and it exists in the replica set config:

      2018-02-10T06:21:50.486-0500 I REPL     [conn2562] replSetReconfig admin command received from client
      2018-02-10T06:21:50.487-0500 I REPL     [conn2562] replSetReconfig config object with 2 members parses ok
      2018-02-10T06:21:50.487-0500 I ASIO     [ReplicationExecutor] dropping unhealthy pooled connection to mongo2.nyc3.wegohealth.com:27017
      2018-02-10T06:21:50.487-0500 I ASIO     [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
      2018-02-10T06:21:50.487-0500 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2.nyc3.wegohealth.com:27017
      2018-02-10T06:21:50.488-0500 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo2.nyc3.wegohealth.com:27017, took 1ms (1 connections now open to mongo2.nyc3.wegohealth.com:27017)
      2018-02-10T06:21:50.488-0500 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "rs0", version: 690581, members: [ { _id: 6, host: "mongo1.nyc3.wegohealth.com:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 7, host: "mongo2.nyc3.wegohealth.com:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.5, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
      2018-02-10T06:21:50.488-0500 I REPL     [ReplicationExecutor] This node is mongo1.nyc3.wegohealth.com:27017 in the config
      2018-02-10T06:21:50.489-0500 I REPL     [ReplicationExecutor] Member mongo2.nyc3.wegohealth.com:27017 is now in state STARTUP
      2018-02-10T06:22:22.179-0500 I STORAGE  [conn1467] passes = 5000 in CappedRecordStoreV1::allocRecord: ns: local.oplog.rs, lenToAlloc: 2191240, maxCappedDocs: 9223372036854775807, nrecords: 29226101, datasize: 4444541292. Continuing to delete old records to make room.
      2018-02-10T06:22:22.184-0500 I STORAGE  [conn1467] passes = 10000 in CappedRecordStoreV1::allocRecord: ns: local.oplog.rs, lenToAlloc: 2191240, maxCappedDocs: 9223372036854775807, nrecords: 29221101, datasize: 4443817604. Continuing to delete old records to make room.
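
      Those allocRecord passes suggest the primary's oplog is an MMAPv1 capped collection (CappedRecordStoreV1) busy reclaiming space. Back-of-the-envelope numbers from the figures logged above (illustrative Python, assumptions: the logged datasize/nrecords describe the whole oplog):

```python
# Figures copied from the primary's allocRecord log lines above.
datasize_bytes = 4444541292
nrecords = 29226101

gb = datasize_bytes / 1024**3          # oplog payload in GiB
avg_doc = datasize_bytes / nrecords    # average oplog entry size in bytes
print(f"oplog data: {gb:.2f} GiB, avg entry: {avg_doc:.0f} bytes")
# oplog data: 4.14 GiB, avg entry: 152 bytes
```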
      

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Alexander S. (alexs)
            Votes: 0
            Watchers: 4
