Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-10839

Race condition in rsStart on getting new version of config

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.5.2
    • Component/s: Replication
    • ALL

      I removed a node from replica set, restarted mongod on a different port and added it back in with new host:port.

      When coming back into the set, I noticed in the logs:

      $ mongod --port 37020 --dbpath /data/rs3 --replSet asyaS17 --oplogSize 100
      2013-09-21T11:21:30.780-0700 [initandlisten] MongoDB starting : pid=44500 port=37020 dbpath=/data/rs3 64-bit host=asyasmacbook.local
      2013-09-21T11:21:30.780-0700 [initandlisten]
      2013-09-21T11:21:30.780-0700 [initandlisten] ** NOTE: This is a development version (2.5.2) of MongoDB.
      2013-09-21T11:21:30.780-0700 [initandlisten] **       Not recommended for production.
      2013-09-21T11:21:30.780-0700 [initandlisten]
      2013-09-21T11:21:30.780-0700 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
      2013-09-21T11:21:30.780-0700 [initandlisten]
      2013-09-21T11:21:30.780-0700 [initandlisten] db version v2.5.2
      2013-09-21T11:21:30.780-0700 [initandlisten] git version: b283169ae51328f3fb5e3c8a3c9017a2e42c148f
      2013-09-21T11:21:30.780-0700 [initandlisten] build info: Darwin bs-osx-106-x86-64-2.10gen.cc 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
      2013-09-21T11:21:30.780-0700 [initandlisten] allocator: system
      2013-09-21T11:21:30.780-0700 [initandlisten] options: { dbpath: "/data/rs3", oplogSize: 100, port: 37020, replSet: "asyaS17" }
      2013-09-21T11:21:30.781-0700 [initandlisten] journal dir=/data/rs3/journal
      2013-09-21T11:21:30.783-0700 [initandlisten] recover : no journal files present, no recovery needed
      2013-09-21T11:21:30.824-0700 [initandlisten] waiting for connections on port 37020
      2013-09-21T11:21:30.826-0700 [IndexRebuilder] assertion 13435 not master and slaveOk=false ns:cs.system.namespaces query:{}
      2013-09-21T11:21:30.826-0700 [IndexRebuilder] warning: index rebuilding did not complete
      2013-09-21T11:21:30.826-0700 [rsStart] replSet REMOVED
      2013-09-21T11:21:30.826-0700 [rsStart] replSet info self not present in the repl set configuration:
      2013-09-21T11:21:30.826-0700 [rsStart] { _id: "asyaS17", version: 7, members: [ { _id: 0, host: "asyasmacbook.local:37017", priority: 2.0 }, { _id: 1, host: "asyasmacbook.local:37018" } ] }
      2013-09-21T11:21:30.826-0700 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
      2013-09-21T11:21:50.827-0700 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
      2013-09-21T11:21:50.899-0700 [initandlisten] connection accepted from 127.0.0.1:59868 #1 (1 connection now open)
      2013-09-21T11:21:51.960-0700 [initandlisten] connection accepted from 127.0.0.1:59870 #2 (2 connections now open)
      2013-09-21T11:22:09.971-0700 [conn2] end connection 127.0.0.1:59870 (1 connection now open)
      2013-09-21T11:22:09.971-0700 [initandlisten] connection accepted from 127.0.0.1:59876 #3 (2 connections now open)
      2013-09-21T11:22:10.827-0700 [rsStart] trying to contact asyasmacbook.local:37017
      2013-09-21T11:22:10.829-0700 [rsStart] trying to contact asyasmacbook.local:37018
      2013-09-21T11:22:10.830-0700 [rsStart] replSet I am asyasmacbook.local:37020
      2013-09-21T11:22:10.830-0700 [rsStart] replSet got config version 8 from a remote, saving locally
      2013-09-21T11:22:10.830-0700 [rsStart] replSet info saving a newer config version to local.system.replset
      2013-09-21T11:22:10.836-0700 [rsStart] replSet saveConfigLocally done
      2013-09-21T11:22:10.836-0700 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
      2013-09-21T11:22:10.918-0700 [conn1] end connection 127.0.0.1:59868 (1 connection now open)
      2013-09-21T11:22:10.918-0700 [initandlisten] connection accepted from 127.0.0.1:59879 #4 (2 connections now open)
      2013-09-21T11:22:30.833-0700 [clientcursormon] mem (MB) res:36 virt:3088
      2013-09-21T11:22:30.833-0700 [clientcursormon]  mapped (incl journal view):576
      2013-09-21T11:22:30.833-0700 [clientcursormon]  connections:2
      2013-09-21T11:22:30.837-0700 [rsStart] trying to contact asyasmacbook.local:37017
      2013-09-21T11:22:30.838-0700 [rsStart] trying to contact asyasmacbook.local:37018
      2013-09-21T11:22:30.838-0700 [rsStart] replSet I am asyasmacbook.local:37020
      2013-09-21T11:22:30.839-0700 [rsStart] replSet STARTUP2
      2013-09-21T11:22:32.840-0700 [rsHealthPoll] replSet member asyasmacbook.local:37017 is up
      2013-09-21T11:22:32.840-0700 [rsHealthPoll] replSet member asyasmacbook.local:37017 is now in state PRIMARY
      2013-09-21T11:22:32.841-0700 [rsHealthPoll] replSet member asyasmacbook.local:37018 is up
      2013-09-21T11:22:32.841-0700 [rsHealthPoll] replSet member asyasmacbook.local:37018 is now in state SECONDARY
      

      It seemed like it got the config and applied it successfully and yet logged that it couldn't and slept for another 20 seconds.

      Looking at the code (I looked for 2.5.2 to make sure it matched my executables) I see this:

      https://github.com/mongodb/mongo/blob/r2.5.2/src/mongo/db/repl/rs.cpp#L793

      so false was returned from _loadConfigFinish and the only place I see that is https://github.com/mongodb/mongo/blob/r2.5.2/src/mongo/db/repl/rs.cpp#L701 and the messages at 11:22:10 clearly indicate that we're in saveConfigLocally https://github.com/mongodb/mongo/blob/r2.5.2/src/mongo/db/repl/rs.cpp#L705 after which only true is returned.

      Not sure if I'm misreading something, but it seems strange that the logs indicate first successful save of new version of config and then a warning that we "Couldn't load config yet".

            Assignee:
            spencer@mongodb.com Spencer Brody (Inactive)
            Reporter:
            asya.kamsky@mongodb.com Asya Kamsky
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: