SERVER-19692 (Core Server)

mongod hangs and fails to accept connections when running WiredTiger with LSM

    • Storage Execution
    • ALL

      The powercycle test was run against mongod using WiredTiger with LSM. After several start/crash/start cycles, mongod stopped accepting connections, although the mongod process was still running.
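      The start/crash/start loop described above can be sketched roughly as follows. This is not the attached powertest.sh; `run_cycles`, `wait_until`, and `port_is_open` are hypothetical names, and how the server is started and crashed is left to caller-supplied callbacks. The sketch only shows the shape of the check: after each start, wait for the server to become reachable, and report the cycle at which it never does.

```python
import socket
import time
from typing import Callable, Optional


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(probe: Callable[[], bool], deadline_secs: float,
               poll_secs: float = 0.5) -> bool:
    """Poll probe() until it returns True or deadline_secs has elapsed."""
    end = time.monotonic() + deadline_secs
    while time.monotonic() < end:
        if probe():
            return True
        time.sleep(poll_secs)
    # One last check after the deadline.
    return probe()


def run_cycles(start: Callable[[], None], crash: Callable[[], None],
               probe: Callable[[], bool], cycles: int,
               deadline_secs: float = 60.0,
               poll_secs: float = 0.5) -> Optional[int]:
    """Run start/crash cycles.

    Returns the 1-based number of the first cycle in which the server did
    not become reachable before the deadline, or None if every cycle passed.
    start, crash, and probe are hypothetical hooks supplied by the caller.
    """
    for cycle in range(1, cycles + 1):
        start()
        if not wait_until(probe, deadline_secs, poll_secs):
            return cycle  # server is up as a process but never reachable
        crash()
    return None
```

      In a real run, `start` would launch mongod, `crash` would kill it abruptly or power-cycle the host, and `probe` could be `lambda: port_is_open("localhost", 27017)`, assuming the default mongod port. A non-None return value corresponds to the state reported here: the process is alive but no connection can be made.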

      The attached gdb session shows the following backtrace for all threads:

      (gdb) thread apply all bt
      
      Thread 24 (Thread 0x7f2b4c35b700 (LWP 4575)):
      #0  0x00007f2b4c9fa0d1 in do_sigwait (sig=0x7f2b4c35a8fc, set=<optimized out>)
          at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60
      #1  __sigwait (set=0x207ef20 <mongo::(anonymous namespace)::asyncSignals>, sig=0x7f2b4c35a8fc)
          at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
      #2  0x00000000011844e6 in mongo::(anonymous namespace)::signalProcessingThread() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4c35b700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 23 (Thread 0x7f2b4bb5a700 (LWP 4576)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001801ec6 in __evict_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4bb5a700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 22 (Thread 0x7f2b4b359700 (LWP 4577)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e5eb4 in __sweep_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4b359700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 21 (Thread 0x7f2b4ab58700 (LWP 4578)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e24e9 in __log_file_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4ab58700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 20 (Thread 0x7f2b4a357700 (LWP 4579)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e3204 in __log_wrlsn_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4a357700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 19 (Thread 0x7f2b49b56700 (LWP 4580)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e29b0 in __log_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b49b56700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 18 (Thread 0x7f2b49355700 (LWP 4581)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e0476 in __ckpt_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b49355700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 17 (Thread 0x7f2b48b54700 (LWP 4582)):
      #0  0x00007f2b4c716da3 in select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000181e932 in __wt_sleep ()
      #2  0x0000000001810451 in __lsm_worker_manager ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b48b54700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 16 (Thread 0x7f2b48353700 (LWP 4583)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b48353700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 15 (Thread 0x7f2b47b52700 (LWP 4584)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b47b52700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 14 (Thread 0x7f2b47351700 (LWP 4585)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b47351700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 13 (Thread 0x7f2b46b50700 (LWP 4586)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000ab6dd9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x0000000000aba7cf in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x0000000000ab0d44 in mongo::Lock::GlobalLock::_lock(mongo::LockMode, unsigned int) ()
      #4  0x0000000000ab0d88 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
      #5  0x0000000000ab0e06 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
      #6  0x0000000000ac5e40 in mongo::AutoGetDb::AutoGetDb(mongo::OperationContext*, mongo::StringData, mongo::LockMode) ()
      #7  0x0000000000f3e958 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::run() ()
      #8  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #9  0x00000000018be510 in execute_native_thread_routine ()
      #10 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b46b50700) at pthread_create.c:312
      #11 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 12 (Thread 0x7f2b30c04700 (LWP 4587)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x00000000010f8218 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b30c04700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 11 (Thread 0x7f2b30403700 (LWP 4588)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000d194b3 in mongo::RangeDeleter::doWork() ()
      #2  0x00000000018be510 in execute_native_thread_routine ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b30403700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 10 (Thread 0x7f2b2fc02700 (LWP 4589)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x0000000001139d0b in mongo::Listener::waitUntilListening() const ()
      #3  0x0000000000d570e8 in mongo::repl::isSelf(mongo::HostAndPort const&) ()
      #4  0x0000000000da8946 in mongo::repl::(anonymous namespace)::findSelfInConfig(mongo::repl::ReplicationCoordinatorExternalState*, mongo::repl::ReplicaSetConfig const&) ()
      #5  0x0000000000da965e in mongo::repl::validateConfigForStartUp(mongo::repl::ReplicationCoordinatorExternalState*, mongo::repl::ReplicaSetConfig const&, mongo::repl::ReplicaSetConfig const&) ()
      #6  0x0000000000dc6b08 in mongo::repl::ReplicationCoordinatorImpl::_finishLoadLocalConfig(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::repl::ReplicaSetConfig const&, mongo::StatusWith<mongo::repl::OpTime> const&) ()
      #7  0x0000000000dd80b9 in mongo::repl::(anonymous namespace)::callNoExcept(std::function<void ()> const&) ()
      #8  0x0000000000ddd230 in mongo::repl::ReplicationExecutor::run() ()
      #9  0x00000000018be510 in execute_native_thread_routine ()
      #10 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2fc02700) at pthread_create.c:312
      #11 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 9 (Thread 0x7f2b2f401700 (LWP 4590)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2f401700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 8 (Thread 0x7f2b2ec00700 (LWP 4591)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x0000000000f693d3 in mongo::executor::NetworkInterfaceImpl::_processAlarms() ()
      #3  0x000000000112af90 in mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*) ()
      #4  0x000000000112bb79 in mongo::ThreadPool::_consumeTasks() ()
      #5  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #6  0x00000000018be510 in execute_native_thread_routine ()
      #7  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2ec00700) at pthread_create.c:312
      #8  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 7 (Thread 0x7f2b2e3ff700 (LWP 4592)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2e3ff700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 6 (Thread 0x7f2b2dbfe700 (LWP 4593)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2dbfe700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 5 (Thread 0x7f2b2d3fd700 (LWP 4594)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2d3fd700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 4 (Thread 0x7f2b2cbfc700 (LWP 4595)):
      #0  0x00007f2b4c9f9b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000118ff15 in mongo::sleepsecs(int) ()
      #2  0x0000000000f52a5b in mongo::TTLMonitor::run() ()
      #3  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2cbfc700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 3 (Thread 0x7f2b2c3fb700 (LWP 4596)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000ab6dd9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x0000000000aba7cf in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x0000000000ab0d44 in mongo::Lock::GlobalLock::_lock(mongo::LockMode, unsigned int) ()
      #4  0x0000000000ab0d88 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
      #5  0x0000000000ab0e06 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
      #6  0x0000000000ac5e40 in mongo::AutoGetDb::AutoGetDb(mongo::OperationContext*, mongo::StringData, mongo::LockMode) ()
      #7  0x0000000000ac619e in mongo::AutoGetCollectionForRead::AutoGetCollectionForRead(mongo::OperationContext*, std::string const&) ()
      #8  0x00000000009f23d8 in mongo::GlobalCursorIdCache::timeoutCursors(mongo::OperationContext*, int) ()
      #9  0x0000000000a1072e in mongo::ClientCursorMonitor::run() ()
      #10 0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #11 0x00000000018be510 in execute_native_thread_routine ()
      #12 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2c3fb700) at pthread_create.c:312
      #13 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 2 (Thread 0x7f2b2bbfa700 (LWP 4597)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x00000000011256fe in mongo::(anonymous namespace)::PeriodicTaskRunner::run() ()
      #2  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2bbfa700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 1 (Thread 0x7f2b4da3ecc0 (LWP 4574)):
      #0  0x00007f2b4c716da3 in select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000181e932 in __wt_sleep ()
      #2  0x000000000180b262 in __wt_clsm_await_switch ()
      #3  0x000000000180b760 in __clsm_enter ()
      #4  0x000000000180d09a in __clsm_insert ()
      #5  0x0000000000f2d70c in mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::RecordId const&, bool) ()
      #6  0x0000000000f2de14 in mongo::WiredTigerIndex::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&, bool) ()
      #7  0x0000000000b94924 in mongo::IndexAccessMethod::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&, mongo::InsertDeleteOptions const&, long*) ()
      #8  0x00000000009ff74d in mongo::IndexCatalog::_indexRecord(mongo::OperationContext*, mongo::IndexCatalogEntry*, mongo::BSONObj const&, mongo::RecordId const&) ()
      #9  0x00000000009ffb46 in mongo::IndexCatalog::indexRecord(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&) ()
      #10 0x00000000009e38ff in mongo::Collection::_insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool) ()
      #11 0x00000000009e53ab in mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool) ()
      #12 0x00000000008b8b5e in mongo::logStartup() ()
      #13 0x00000000008baa56 in mongo::initAndListen(int) ()
      #14 0x00000000008be0f4 in main ()
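
      With this many threads, it can help to group them by the first frame that is not a generic wait primitive. The following is a minimal sketch for doing that on `thread apply all bt` output. The function names and the `WAIT_PREFIXES` list are assumptions chosen for this backtrace, not part of gdb or MongoDB, and the name extraction is deliberately crude (it cuts at the first opening parenthesis).

```python
import re
from collections import defaultdict

# "Thread 13 (Thread 0x... (LWP 4586)):"
THREAD_RE = re.compile(r"^\s*Thread (\d+) \(")
# "#1  0x000000000181d3e6 in __wt_cond_wait ()" or "#0  pthread_cond_wait@@GLIBC_2.3.2 ()"
FRAME_RE = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")

# Assumed list of frames that only say "this thread is sleeping or waiting".
WAIT_PREFIXES = (
    "pthread_cond_",
    "select",
    "nanosleep",
    "do_sigwait",
    "__sigwait",
    "__wt_cond_wait",
    "__wt_sleep",
    "std::condition_variable::wait",
    "mongo::sleepsecs",
)


def function_name(frame_text: str) -> str:
    """Return the function name of a frame, without its argument list."""
    text = frame_text.replace("(anonymous namespace)", "{anon}")
    return text.split("(", 1)[0].strip()


def summarize(backtrace: str) -> dict:
    """Map each first non-wait frame to the list of thread numbers stuck in it."""
    first_frame = {}  # thread number -> first frame that is not a wait primitive
    current = None
    for line in backtrace.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in first_frame:
            name = function_name(frame.group(1))
            if name and not name.startswith(WAIT_PREFIXES):
                first_frame[current] = name
    groups = defaultdict(list)
    for thread_id, name in first_frame.items():
        groups[name].append(thread_id)
    return dict(groups)
```

      Continuation lines such as `at ../nptl/...` and gdb pager prompts do not match either pattern and are skipped. The grouping is only a reading aid; it does not by itself establish which thread is responsible for the hang.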
      

      Attachments:
        1. powertest.sh (35 kB)
        2. mongod-wiredTiger-recovery.log (2.39 MB)
        3. mongod-wiredTiger.log (7.35 MB)
        4. pttest.log (20 kB)
        5. wiredTiger.tar.1 (50.00 MB)
        6. wiredTiger.tar.2 (50.00 MB)
        7. wiredTiger.tar.3 (50.00 MB)
        8. wiredTiger.tar.4 (50.00 MB)
        9. wiredTiger.tar.5 (39.50 MB)

            Assignee: [DO NOT USE] Backlog - Storage Execution Team (backlog-server-execution)
            Reporter: Jonathan Abrahams (jonathan.abrahams)
            Votes: 0
            Watchers: 8
