Core Server: SERVER-91739

Mongo test driver connection is not resilient to primary failover

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • ALL
    • Steps To Reproduce:

      Run the attached repro test in the "concurrency_sharded_kill_primary_with_balancer" suite.

      The internal Mongo driver used by our JS tests is not resilient to primary failover.

      In particular,

      new Mongo(replicaset_url)

      throws if the primary of the replica set is stepping down or shutting down.

      Since we pass a replica set URL, the expectation is that the driver automatically switches to the new primary during the initial connection.

      Error example:
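      Until the driver handles this itself, a test could wrap the connection in a retry loop. The sketch below is a hypothetical workaround, not an existing shell helper: connectWithRetry, isLikelyFailoverError and the error-message patterns are assumptions chosen to match the error shown in this report. It relies only on the shell's global sleep(ms) when present and is otherwise plain JavaScript.

```javascript
// Hypothetical test-side workaround: retry `new Mongo(url)` while the replica
// set elects a new primary. The helper names and the error patterns are
// assumptions, not part of the mongo shell.
const FAILOVER_ERROR_PATTERN =
    /network error|Connection reset by peer|can't connect to new replica set primary/;

function isLikelyFailoverError(err) {
    // Match on the message text; assumed to be good enough for a test helper.
    const text = err && err.message ? err.message : String(err);
    return FAILOVER_ERROR_PATTERN.test(text);
}

function connectWithRetry(connectFn, {attempts = 5, delayMs = 500, sleepFn} = {}) {
    // In the mongo shell `sleep(ms)` is a global; elsewhere fall back to a no-op.
    const doSleep = sleepFn || (typeof sleep === "function" ? sleep : () => {});
    let lastErr;
    for (let i = 1; i <= attempts; i++) {
        try {
            return connectFn();
        } catch (err) {
            if (!isLikelyFailoverError(err)) {
                throw err;  // Not a failover symptom: surface it immediately.
            }
            lastErr = err;
            if (i < attempts) {
                doSleep(delayMs * i);  // Linear backoff while the election settles.
            }
        }
    }
    throw lastErr;  // Retry budget exhausted: report the last failover error.
}

// Usage in a jstest (replicaset_url as in the report):
// const conn = connectWithRetry(() => new Mongo(replicaset_url));
```

      Only errors that look like failover symptoms are retried, so genuine misconfigurations (bad host, auth failure) still fail fast.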

      Error: can't connect to new replica set primary [localhost:22005], err: Connection handshake failed. Is your mongod/mongos 3.4 or older? :: caused by :: network error
      while attempting to run command 'hello' on host 'localhost:22005'  :: caused by :: dbclient error communicating with server localhost:22005 :: caused by :: futurize :: caused by :: Connection reset by peer
      "mongo::getStackTrace()",
      "mongo::DBClientReplicaSet::checkPrimary()",
      "mongo::DBClientReplicaSet::runCommandWithTarget(mongo::OpMsgRequest)",
      "mongo::DBClientBase::runCommandWithTarget(mongo::DatabaseName const&, mongo::BSONObj, mongo::BSONObj&, int)",
      "mongo::DBClientBase::runCommand(mongo::DatabaseName const&, mongo::BSONObj, mongo::BSONObj&, int)",
      "mongo::shell_utils::ConnectionRegistry::registerConnection(mongo::DBClientBase&, mongo::StringData)",
      "mongo::mozjs::MongoExternalInfo::construct(JSContext*, JS::CallArgs)",
      "bool mongo::mozjs::smUtils::construct<mongo::mozjs::MongoExternalInfo>(JSContext*, unsigned int, JS::Value*)",
      "InternalConstruct(JSContext*, js::AnyConstructArgs const&, js::CallReason)",
      "js::Interpret(JSContext*, js::RunState&)",
      "js::RunScript(JSContext*, js::RunState&)",
      "js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)",
      "js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)",
      "js::fun_call(JSContext*, unsigned int, JS::Value*)",
      "js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)",
      "js::Interpret(JSContext*, js::RunState&)",
      "js::RunScript(JSContext*, js::RunState&)",
      "js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)",
      "js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)",
      "js::CallSelfHostedFunction(JSContext*, JS::Handle<js::PropertyName*>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>)",
      "AsyncFunctionResume(JSContext*, JS::Handle<js::AsyncFunctionGeneratorObject*>, ResumeKind, JS::Handle<JS::Value>)",
      "PromiseReactionJob(JSContext*, unsigned int, JS::Value*)",
      "js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)",
      "js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)",
      "JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)",
      "js::InternalJobQueue::runJobs(JSContext*)",
      "js::RunJobs(JSContext*)",
      "mongo::mozjs::MozJSImplScope::callThreadArgs(mongo::BSONObj const&)",
      "mongo::mozjs::JSThreadConfig::JSThread::run(void*)",
      "mongo::stdx::thread::thread<void (*)(void*), mongo::mozjs::JSThreadConfig::JSThread*, 0>(void (*)(void*), mongo::mozjs::JSThreadConfig::JSThread*&&)::...",

      From the stack trace, the problem appears to be that, as part of initial connection establishment (ConnectionRegistry::registerConnection), we run a "whatsmyuri" admin command, and DBClientReplicaSet::runCommandWithTarget does not retry this command when the primary it targets goes away.
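      The retry behaviour being asked for can be modelled in a few lines. This is a JavaScript model of the idea only, not the C++ in DBClientReplicaSet: the client object, findPrimary, runOn and the one-retry budget are hypothetical stand-ins. On a network error from the cached primary, the model forgets that primary, rediscovers the current one and retries the command once.

```javascript
// Hypothetical model of the requested retry in runCommandWithTarget.
// `client.findPrimary()` and `client.runOn(host, cmd)` are illustrative
// stand-ins for primary discovery and command dispatch, not real APIs.
function runCommandWithPrimaryRetry(client, cmd, isNetworkErr, maxRetries = 1) {
    for (let attempt = 0; ; attempt++) {
        if (client.primary === null) {
            client.primary = client.findPrimary();  // Rediscover the current primary.
        }
        try {
            return client.runOn(client.primary, cmd);
        } catch (err) {
            if (!isNetworkErr(err) || attempt >= maxRetries) {
                throw err;  // Non-network error, or retry budget spent.
            }
            client.primary = null;  // Cached primary is stale: force rediscovery.
        }
    }
}
```

      Bounding the retries keeps a genuinely unreachable replica set from looping forever, while a single retry is enough to ride over one step-down between connecting and running "whatsmyuri".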

            Assignee: Unassigned
            Reporter: Tommaso Tocci (tommaso.tocci@mongodb.com)
            Votes: 0
            Watchers: 4
