-
Type: Bug
-
Resolution: Community Answered
-
Priority: Major - P3
-
Affects Version/s: 4.0.9, 4.0.19
-
Component/s: None
-
ALL
Recently, we encountered a strange phenomenon on some MongoDB 4.0 sharded clusters: replication on a secondary hung, so the lag between the primary and the secondary kept growing.
I collected pstack output from the mongod process.
It shows that all 16 replWriterThread threads are waiting for tasks, meaning they are idle.
```
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x5580fc7dd458) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x5580fc7dd400, cond=0x5580fc7dd430) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x5580fc7dd430, mutex=0x5580fc7dd400) at pthread_cond_wait.c:655
#3 0x00005580f5f7ceec in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#4 0x00005580f5632750 in mongo::ThreadPool::_consumeTasks() ()
#5 0x00005580f5632e86 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#6 0x00005580f56331be in std::thread::_Impl<std::_Bind_simple<mongo::stdx::thread::thread<mongo::ThreadPool::_startWorkerThread_inlock()::{lambda()#1}, , 0>(mongo::ThreadPool::_startWorkerThread_inlock()::{lambda()#1})::{lambda()#1} ()> >::_M_run() ()
#7 0x00005580f5f7ff60 in execute_native_thread_routine ()
#8 0x00007fd5151a2fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9 0x00007fd5150d14cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
However, the batcher thread is blocked in waitForIdle(), waiting on the replication writer thread pool.
```
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x5580fc7dd48c) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x5580fc7dd400, cond=0x5580fc7dd460) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x5580fc7dd460, mutex=0x5580fc7dd400) at pthread_cond_wait.c:655
#3 0x00005580f5f7ceec in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#4 0x00005580f56311bb in mongo::ThreadPool::waitForIdle() ()
#5 0x00005580f4816d91 in mongo::repl::SyncTail::multiApply(mongo::OperationContext*, std::vector<mongo::repl::OplogEntry, std::allocator<mongo::repl::OplogEntry> >) ()
#6 0x00005580f48186e3 in mongo::repl::SyncTail::_oplogApplication(mongo::repl::OplogBuffer*, mongo::repl::ReplicationCoordinator*, mongo::repl::SyncTail::OpQueueBatcher*) ()
#7 0x00005580f48198c3 in mongo::repl::SyncTail::oplogApplication(mongo::repl::OplogBuffer*, mongo::repl::ReplicationCoordinator*) ()
```
So I suspect there is a bug here, but I have not been able to find its root cause.
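For readers unfamiliar with this pattern: both traces wait on the same mutex address (0x5580fc7dd400) but on different condition variables (0x5580fc7dd430 for the workers, 0x5580fc7dd460 for the waitForIdle() caller), which is consistent with a single pool object that uses one condition variable for "work available" and another for "pool is idle". The sketch below illustrates that general design only. It is not the mongo::ThreadPool implementation; `SketchPool`, `runBatch`, and all member names are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified pool. NOT mongo::ThreadPool.
class SketchPool {
public:
    explicit SketchPool(std::size_t numThreads) {
        for (std::size_t i = 0; i < numThreads; ++i)
            _threads.emplace_back([this] { consumeTasks(); });
    }

    ~SketchPool() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _workAvailable.notify_all();
        for (auto& t : _threads) t.join();
    }

    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _pending.push_back(std::move(task));
        }
        _workAvailable.notify_one();
    }

    // Blocks until no task is queued and no worker is running one.
    void waitForIdle() {
        std::unique_lock<std::mutex> lk(_mutex);
        _poolIsIdle.wait(lk, [this] { return _pending.empty() && _active == 0; });
    }

private:
    void consumeTasks() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            // Idle workers park here (compare frame #4 of the first trace).
            _workAvailable.wait(lk, [this] { return _shutdown || !_pending.empty(); });
            if (_pending.empty()) return;  // shutdown requested, nothing left to run
            auto task = std::move(_pending.front());
            _pending.pop_front();
            ++_active;
            lk.unlock();
            task();
            lk.lock();
            --_active;
            // Wake any waitForIdle() caller once the pool has drained.
            if (_pending.empty() && _active == 0) _poolIsIdle.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _workAvailable;  // signalled when a task is queued
    std::condition_variable _poolIsIdle;     // signalled when the pool drains
    std::deque<std::function<void()>> _pending;
    std::size_t _active = 0;
    bool _shutdown = false;
    std::vector<std::thread> _threads;
};

// Schedules numTasks increments, waits for the pool to drain, returns the count.
int runBatch(int numTasks) {
    std::atomic<int> applied{0};
    SketchPool pool(16);  // 16 mirrors the replWriterThread count reported above
    for (int i = 0; i < numTasks; ++i)
        pool.schedule([&applied] { ++applied; });
    pool.waitForIdle();
    return applied.load();
}
```

In a design like this, waitForIdle() returns as soon as the queue is empty and no worker is active. A caller stuck in waitForIdle() while every worker is parked waiting for work would therefore point to either a missed wakeup or pool bookkeeping that still counts the pool as busy. Whether either applies to the real implementation is not established by the traces alone.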
- related to: SERVER-56054 Change minThreads value for replication writer thread pool to 0 (Closed)