Type: Bug
Resolution: Duplicate
Priority: Minor - P4
Affects Version/s: 2.2.0
Component/s: Concurrency
Environment: CentOS release 5.8 (Final), kernel 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
ALL
I installed the 2.2.0 RPM package from the 10gen repository. 'service mongod start' leaves three mongod processes behind (the listing below was captured during 'service mongod restart'):
root 12603 9583 0 12:59 pts/0 00:00:00 /bin/sh /sbin/service mongod restart
root 12608 12603 0 12:59 pts/0 00:00:00 /bin/bash /etc/init.d/mongod restart
root 12622 12608 0 12:59 pts/0 00:00:00 runuser -s /bin/bash - mongod -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.
mongod 12623 12622 0 12:59 ? 00:00:00 -bash -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
mongod 12645 12623 1 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12647 12645 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12648 12647 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
strace of PID 12648, the third and apparently hung mongod process, shows:
...
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, ) = -1 ETIMEDOUT (Connection timed out)
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, ) = -1 ETIMEDOUT (Connection timed out)
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, ) = -1 ETIMEDOUT (Connection timed out)
...
A gdb backtrace of the same process shows both threads spinning in tcmalloc's SpinLock::SlowLock():
Thread 2 (Thread 0x40a87940 (LWP 12649)):
#0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x000000000086210c in SpinLock::SlowLock() ()
#2 0x0000000000866056 in tcmalloc::ThreadCache::CreateCacheIfNecessary() ()
#3 0x00000000009b0857 in ?? ()
#4 0x0000000000c22872 in tc_malloc ()
#5 0x00000000009e63aa in boost::detail::get_once_per_thread_epoch() ()
#6 0x00000000007c4ff8 in void boost::call_once<void ()>(boost::once_flag&, void ()) ()
#7 0x00000000007c1e57 in boost::detail::set_current_thread_data(boost::detail::thread_data_base*) ()
#8 0x00000000007c3646 in ?? ()
#9 0x0000003124e0677d in start_thread () from /lib64/libpthread.so.0
#10 0x00000031246d3c1d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b5613d478c0 (LWP 12648)):
#0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x000000000086210c in SpinLock::SlowLock() ()
#2 0x00000000008b92db in tcmalloc::CentralFreeList::Populate() ()
#3 0x00000000008b9498 in tcmalloc::CentralFreeList::FetchFromSpansSafe() ()
#4 0x00000000008b9534 in tcmalloc::CentralFreeList::RemoveRange(void*, void*, int) ()
#5 0x0000000000865c0d in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
#6 0x00000000009b0f2f in ?? ()
#7 0x0000000000c21c95 in tc_new ()
#8 0x0000000000599e14 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >* boost::detail::heap_new_impl<boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&>(boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&) ()
#9 0x00000000005954fa in mongo::BackgroundJob::go() ()
#10 0x00000000005630cc in ?? ()
#11 0x0000000000565399 in main ()
The hang is intermittent: sometimes startup succeeds.
Notes: I rebuilt mongod from source at r2.2.0 and stripped the binary manually; to my surprise, this binary does not show the behaviour. However, a binary installed with 'scons install' always hangs.
- duplicates: SERVER-7434 Startup race with --fork (Closed)
- is duplicated by: SERVER-8252 Startup hangs infinitely, DataFileSync background job cannot create new thread (Closed)