-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Diagnostics
-
None
-
Fully Compatible
-
ALL
-
QuInt C (11/23/15)
There is a race on ClientBasic::_messagingPort that can be reproduced by running the currentOp command with the "$all: true" option at the same time that a client connection is closed.
- When a client connection is closed, the associated server thread deletes the MessagingPort object that it owns, in handleIncomingMsg() (message_server_port.cpp). This deletion is performed outside of the lock on the Client object.
- When the currentOp command is run, it calls Client::getRemote() on each client in the global clients list. Client::getRemote() delegates its implementation to AbstractMessagingPort::remote(), on the AbstractMessagingPort object that the Client keeps a reference to.
- Client::getRemote() will cause a use-after-free if the MessagingPort associated with the client has already been deleted.
This is a regression introduced in 3.1.4 by SERVER-18482 (d6e68685). It affects development releases 3.1.4 through 3.2.0-rc2, and does not affect any production releases.
Reproduce with the following script:
startParallelShell("while (true) { db.currentOp(true); }"); while (true) { new Mongo(); gc(); }
See the following gdb session:
2015-11-10T17:05:08.579-0500 I NETWORK [conn16] end connection 127.0.0.1:35065 (2 connections now open) 2015-11-10T17:05:08.580-0500 I NETWORK [initandlisten] connection accepted from 127.0.0.1:35066 #17 (3 connections now open) Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffb955d700 (LWP 18093)] 0x00007ffff4c792a5 in mongo::ClientBasic::getRemote (this=0x7fffb97c9730) at src/mongo/db/client_basic.h:65 65 return _messagingPort->remote(); (gdb) bt #0 0x00007ffff4c792a5 in mongo::ClientBasic::getRemote (this=0x7fffb97c9730) at src/mongo/db/client_basic.h:65 #1 0x00007fffedb0c79e in mongo::Client::reportState (this=0x7fffb97c9730, builder=...) at src/mongo/db/client.cpp:120 #2 0x00007fffee008d6e in mongo::CurrentOpCommand::run ( this=0x7fffee482a98 <mongo::currentOpCommand>, txn=0x7fffcdcd7600, db=..., cmdObj=..., options=0, errmsg=..., result=...) at src/mongo/db/commands/current_op.cpp:127 #3 0x00007fffee0a688a in mongo::Command::run (this=0x7fffee482a98 <mongo::currentOpCommand>, txn=0x7fffcdcd7600, request=..., replyBuilder=0x7fffb955b4e0) at src/mongo/db/dbcommands.cpp:1408 #4 0x00007fffee0a58ba in mongo::Command::execCommand (txn=0x7fffcdcd7600, command=0x7fffee482a98 <mongo::currentOpCommand>, request=..., replyBuilder=0x7fffb955b4e0) at src/mongo/db/dbcommands.cpp:1313 #5 0x00007ffff0ace770 in mongo::runCommands (txn=0x7fffcdcd7600, request=..., replyBuilder=0x7fffb955b4e0) at src/mongo/db/commands.cpp:498 #6 0x00007fffee0e8997 in mongo::(anonymous namespace)::receivedRpc (txn=0x7fffcdcd7600, client=..., dbResponse=..., message=...) at src/mongo/db/instance.cpp:293 #7 0x00007fffee0e5dfd in mongo::assembleResponse (txn=0x7fffcdcd7600, m=..., dbresponse=..., remote=...) at src/mongo/db/instance.cpp:522 #8 0x0000000000434013 in mongo::MyMessageHandler::process (this=0x7fffcdcdd2e0, m=..., port= 0x7fffcd3e5ea0) at src/mongo/db/db.cpp:170 #9 0x00007fffd668f6f5 in mongo::PortMessageServer::handleIncomingMsg (arg=0x7fffcd3e5ea0) at src/mongo/util/net/message_server_port.cpp:229 #10 0x00007fffcec39182 in start_thread (arg=0x7fffb955d700) at pthread_create.c:312 #11 0x00007fffce96630d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) info threads [New Thread 0x7fffb94dc700 (LWP 18099)] Id Target Id Frame 30 Thread 0x7fffb94dc700 (LWP 18099) "mongod" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81 29 Thread 0x7fffb95de700 (LWP 18097) "mongod" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 * 27 Thread 0x7fffb955d700 (LWP 18093) "mongod" 0x00007ffff4c792a5 in mongo::ClientBasic::getRemote (this=0x7fffb97c9730) at src/mongo/db/client_basic.h:65 14 Thread 0x7fffb965f700 (LWP 18027) "mongod" 0x00007fffcec407eb in __libc_recv (fd=13, buf=0x7fffb965e588, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33 13 Thread 0x7fffbc270700 (LWP 17986) "mongod" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 12 Thread 0x7fffbca71700 (LWP 17985) "mongod" 0x00007fffcec40b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 11 Thread 0x7fffbd272700 (LWP 17984) "mongod" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 10 Thread 0x7fffbda73700 (LWP 17983) "mongod" 0x00007fffcec40b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 9 Thread 0x7fffbe274700 (LWP 17982) "mongod" 0x00007fffcec40b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 8 Thread 0x7fffbea75700 (LWP 17981) "mongod" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 7 Thread 0x7fffc9276700 (LWP 17980) "mongod" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 6 Thread 0x7fffc9a77700 (LWP 17979) "mongod" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 5 Thread 0x7fffca67d700 (LWP 17978) "mongod" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 4 Thread 0x7fffcae7e700 (LWP 17976) "mongod" 0x00007fffcec40b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x7fffcb67f700 (LWP 17975) "mongod" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 2 Thread 0x7fffcd287700 (LWP 17974) "mongod" 0x00007fffcec410d1 in do_sigwait ( sig=0x7fffcd2868cc, set=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60 1 Thread 0x7ffff7f17d00 (LWP 17967) "mongod" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81 (gdb) thread 29 [Switching to thread 29 (Thread 0x7fffb95de700 (LWP 18097))] #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory. (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fffcec3b657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007fffcec3b480 in __GI___pthread_mutex_lock (mutex=0x7ffff7e6a538) at ../nptl/pthread_mutex_lock.c:79 #3 0x00007ffff72a7085 in __gthread_mutex_lock (__mutex=0x7ffff7e6a538) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8/bits/gthr-default.h:748 #4 0x00007ffff72a86c5 in std::mutex::lock (this=0x7ffff7e6a538) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/mutex:134 #5 0x00007ffff72a7623 in std::lock_guard<std::mutex>::lock_guard (this=0x7fffb95dd810, __m=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/mutex:414 #6 0x00007fffedb11109 in mongo::ServiceContext::ClientDeleter::operator() (this=0x7fffcdcdb3e0, client=0x7fffb97c9730) at src/mongo/db/service_context.cpp:165 #7 0x00007fffedb0d016 in std::unique_ptr<mongo::Client, mongo::ServiceContext::ClientDeleter>::~unique_ptr (this=0x7fffcdcdb3e0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/bits/unique_ptr.h:184 #8 0x00007fffedb0f9ee in boost::thread_specific_ptr<std::unique_ptr<mongo::Client, mongo::ServiceContext::ClientDeleter> >::delete_data::operator() (this=0x7ffff7e50d60, data=0x7fffcdcdb3e0) at src/third_party/boost-1.56.0/boost/thread/tss.hpp:42 #9 0x00007fffd4b70cd7 in boost::detail::(anonymous namespace)::tls_destructor (data=0x7fffcd3e03e0) at src/third_party/boost-1.56.0/libs/thread/src/pthread/thread.cpp:110 #10 0x00007fffcec38f82 in __nptl_deallocate_tsd () at pthread_create.c:158 #11 0x00007fffcec39195 in start_thread (arg=0x7fffb95de700) at pthread_create.c:325 #12 0x00007fffce96630d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb)
And, see relevant output from valgrind --tool=helgrind:
==20184== Lock at 0x2E1CF148 was first observed ==20184== at 0x4C32145: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==20184== by 0x574F044: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748) ==20184== by 0x5750684: std::mutex::lock() (mutex:134) ==20184== by 0x574F5E2: std::lock_guard<std::mutex>::lock_guard(std::mutex&) (mutex:414) ==20184== by 0xEBC30A5: mongo::ServiceContextMongoD::registerKillOpListener(mongo::KillOpListenerInterface*) (service_context_d.cpp:287) ==20184== by 0x42464E: _mongoInitializerFunction_CreateReplicationManager(mongo::InitializerContext*) (db.cpp:797) ==20184== by 0x429783: std::_Function_handler<mongo::Status (mongo::InitializerContext*), mongo::Status (*)(mongo::InitializerContext*)>::_M_invoke(std::_Any_data const&, mongo::InitializerContext*) (functional:2056) ==20184== by 0x1D6E6789: std::function<mongo::Status (mongo::InitializerContext*)>::operator()(mongo::InitializerContext*) const (functional:2464) ==20184== by 0x1D6E5D6F: mongo::Initializer::execute(std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&) const (initializer.cpp:57) ==20184== by 0x1D6E5FA3: mongo::runGlobalInitializers(std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&) (initializer.cpp:70) ==20184== by 0x1D6E6345: mongo::runGlobalInitializers(int, char const* const*, char const* const*) (initializer.cpp:89) ==20184== by 0x424197: mongoDbMain(int, char**, char**) (db.cpp:850) ==20184== ==20184== Lock at 0x35545A88 was first observed ==20184== at 0x4C2F795: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==20184== by 0x238619D9: mongo::SpinLock::SpinLock() (spin_lock.cpp:56) ==20184== by 0xEF1544D: mongo::Client::Client(std::string, mongo::ServiceContext*, mongo::AbstractMessagingPort*) (client.cpp:105) ==20184== by 0xEF19C2F: mongo::ServiceContext::makeClient(std::string, mongo::AbstractMessagingPort*) (service_context.cpp:122) ==20184== by 0xEF152FD: mongo::Client::initThread(char const*, mongo::ServiceContext*, mongo::AbstractMessagingPort*) (client.cpp:89) ==20184== by 0xEF150BC: mongo::Client::initThread(char const*, mongo::AbstractMessagingPort*) (client.cpp:69) ==20184== by 0x433E12: mongo::MyMessageHandler::connected(mongo::AbstractMessagingPort*) (db.cpp:157) ==20184== by 0x2616EFD6: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:212) ==20184== by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==20184== by 0x2D939181: start_thread (pthread_create.c:312) ==20184== by 0x2DC4A30C: clone (clone.S:111) ==20184== ==20184== Possible data race during write of size 8 at 0x31C9BA40 by thread #99 ==20184== Locks held: none ==20184== at 0x265C6F3A: mongo::MessagingPort::~MessagingPort() (message_port.cpp:129) ==20184== by 0x2616E0C4: mongo::(anonymous namespace)::MessagingPortWithHandler::~MessagingPortWithHandler() (message_server_port.cpp:73) ==20184== by 0x2616E0E8: mongo::(anonymous namespace)::MessagingPortWithHandler::~MessagingPortWithHandler() (message_server_port.cpp:73) ==20184== by 0x2616DCF1: std::default_delete<mongo::(anonymous namespace)::MessagingPortWithHandler>::operator()(mongo::(anonymous namespace)::MessagingPortWithHandler*) const (unique_ptr.h:67) ==20184== by 0x2616DC55: std::unique_ptr<mongo::(anonymous namespace)::MessagingPortWithHandler, std::default_delete<mongo::(anonymous namespace)::MessagingPortWithHandler> >::~unique_ptr() (unique_ptr.h:184) ==20184== by 0x2616F39F: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:261) ==20184== by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==20184== by 0x2D939181: start_thread (pthread_create.c:312) ==20184== by 0x2DC4A30C: clone (clone.S:111) ==20184== ==20184== This conflicts with a previous read of size 8 by thread #97 ==20184== Locks held: 2, at addresses 0x2E1CF148 0x35545A88 ==20184== at 0x7DA025B: mongo::ClientBasic::getRemote() const (client_basic.h:65) ==20184== by 0xEF1575D: mongo::Client::reportState(mongo::BSONObjBuilder&) (client.cpp:120) ==20184== by 0xEA6CD2D: mongo::CurrentOpCommand::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&) (current_op.cpp:127) ==20184== by 0xEB0A849: mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) (dbcommands.cpp:1408) ==20184== by 0xEB09879: mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) (dbcommands.cpp:1313) ==20184== by 0xBF5072F: mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) (commands.cpp:498) ==20184== by 0xEB4C956: mongo::(anonymous namespace)::receivedRpc(mongo::OperationContext*, mongo::Client&, mongo::DbResponse&, mongo::Message&) (instance.cpp:293) ==20184== by 0xEB49DBC: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:522) ==20184== ==20184== Address 0x31C9BA40 is 0 bytes inside a block of size 72 alloc'd ==20184== at 0x4C2C460: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==20184== by 0x2616E529: mongo::PortMessageServer::accepted(std::shared_ptr<mongo::Socket>, long long) (message_server_port.cpp:109) ==20184== by 0x2616EBEE: non-virtual thunk to mongo::PortMessageServer::accepted(std::shared_ptr<mongo::Socket>, long long) (message_server_port.cpp:165) ==20184== by 0x265BFCFA: mongo::Listener::initAndListen() (listen.cpp:352) ==20184== by 0x2616E44B: mongo::PortMessageServer::run() (message_server_port.cpp:176) ==20184== by 0x423EC2: mongo::_initAndListen(int) (db.cpp:635) ==20184== by 0x421B02: mongo::initAndListen(int) (db.cpp:640) ==20184== by 0x424385: mongoDbMain(int, char**, char**) (db.cpp:874) ==20184== by 0x424029: main (db.cpp:685)
- related to
-
SERVER-18482 db.currentOp(True) does not return client information on idle connections
- Closed