- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.8.0-rc4
- Component/s: Replication
- None
- Fully Compatible
- ALL
During shutdown, the replication consumer threads can stop pulling items out of the BGSync::_buffer queue while the producer thread (oplog tailer/bgsync thread) is blocked trying to insert an item into the same fixed-size queue.
For example, in 2.8.0-rc5-pre-, we can see the following two stacks in a hung system. Thread 3 is stuck because nobody is draining the BGSync::_buffer, and thread 2 is stuck because thread 3 never makes progress and so never checks for shutdown.
Thread 3 (Thread 0x7ed14c6f9700 (LWP 17201)):
#0 0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000bf4690 in void boost::condition_variable_any::wait<boost::unique_lock<boost::timed_mutex> >(boost::unique_lock<boost::timed_mutex>&) () at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:137
#2 0x0000000000bf82d3 in mongo::repl::BackgroundSync::produce(mongo::OperationContext*) () at src/mongo/util/queue.h:76
#3 0x0000000000bf981e in mongo::repl::BackgroundSync::_producerThread() () at src/mongo/db/repl/bgsync.cpp:193
...
Thread 2 (Thread 0x7ed12c747700 (LWP 17397)):
#0 0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000f9a8db in boost::thread::join() () at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:56
#2 0x0000000000c564a5 in mongo::repl::ReplicationCoordinatorExternalStateImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:107
#3 0x0000000000c5b1f3 in mongo::repl::ReplicationCoordinatorImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_impl.cpp:371
#4 0x0000000000aa429a in mongo::exitCleanly(mongo::ExitCode) () at src/mongo/db/instance.cpp:1101
#5 0x00000000009cf75a in mongo::CmdShutdown::shutdownHelper() () at src/mongo/db/dbcommands_generic.cpp:325
...
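To illustrate the hang pattern in the two stacks above, here is a minimal sketch of a fixed-size queue whose push wakes periodically to check a shutdown flag instead of waiting indefinitely for a consumer. This is not the change made for this ticket and not the code in src/mongo/util/queue.h. The names BoundedQueue, pushUnlessShutdown, tryPop and demoShutdownUnblocksProducer, as well as the 10 ms poll interval, are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a fixed-size buffer; not MongoDB's queue.h.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : _capacity(capacity) {}

    // Waits while the queue is full, but wakes every 10 ms (an assumed
    // interval) to check the shutdown flag. Returns false if the queue is
    // still full when shutdown has been requested.
    bool pushUnlessShutdown(T item, const std::atomic<bool>& shuttingDown) {
        std::unique_lock<std::mutex> lk(_mutex);
        while (_items.size() >= _capacity) {
            if (shuttingDown.load()) {
                return false;
            }
            _notFull.wait_for(lk, std::chrono::milliseconds(10));
        }
        _items.push(std::move(item));
        return true;
    }

    // Non-blocking pop used by consumers; signals a waiting producer.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_items.empty()) {
            return false;
        }
        out = std::move(_items.front());
        _items.pop();
        _notFull.notify_one();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _items.size();
    }

private:
    const std::size_t _capacity;
    mutable std::mutex _mutex;
    std::condition_variable _notFull;
    std::queue<T> _items;
};

// Reproduces the scenario from the ticket: the queue is full, no consumer
// drains it, and shutdown is requested while the producer is waiting.
// Returns true if the producer gave up cleanly and join() returned.
bool demoShutdownUnblocksProducer() {
    BoundedQueue<int> buffer(1);
    std::atomic<bool> shuttingDown{false};
    std::atomic<bool> secondPushSucceeded{true};

    const bool firstPushSucceeded = buffer.pushUnlessShutdown(1, shuttingDown);
    assert(firstPushSucceeded);  // The queue had room for one item.
    (void)firstPushSucceeded;

    std::thread producer([&] {
        // The queue is full and nothing pops, so this waits until shutdown.
        secondPushSucceeded = buffer.pushUnlessShutdown(2, shuttingDown);
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    shuttingDown = true;
    producer.join();  // Would hang if the push waited without a timeout.

    return !secondPushSucceeded.load() && buffer.size() == 1;
}
```

With an untimed wait in place of wait_for, the join() call in the demo would block forever in the same way Thread 2 does above, because the only wake-up signal comes from a consumer that has already stopped.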
- is duplicated by: SERVER-16396 Replication stall, then one secondary would not shut down (mmapv1) (Closed)