ISSUE SUMMARY
When mapReduce is run repeatedly on the same client connection, the mongod keeps tracking the temporary collections used by each run. The names of these temporary collections are not removed when the collections are dropped, so they slowly accumulate in a cache on the mongod and appear as a slow memory leak.
USER IMPACT
This impacts users of mapReduce on sharded collections and manifests as a slow increase in non-mapped virtual memory on mongod. It is present in versions of MongoDB prior to and including v2.4.6.
SOLUTION
After each mapReduce completes and the temporary collections are dropped, also explicitly remove the collection name from the cache used for keeping track of namespace versioning.
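The effect of this change can be illustrated with a small model. This is only a sketch, not the mongod code: the class `NamespaceVersionCache`, the functions `run_map_reduce` and `cache_size_after`, and the temporary namespace pattern are hypothetical names chosen for illustration. The model assumes the cache behaves like a set of namespace strings, which is consistent with the `std::set` insert under `mongo::ClientConnections::_check` in the second trace below.

```python
class NamespaceVersionCache:
    """Toy stand-in for the per-connection namespace cache (hypothetical)."""

    def __init__(self):
        self._seen = set()

    def check(self, ns):
        # Record the namespace the first time it is used on this connection.
        self._seen.add(ns)

    def forget(self, ns):
        # The fix: remove the entry once the temporary collection is dropped.
        self._seen.discard(ns)

    def __len__(self):
        return len(self._seen)


def run_map_reduce(cache, job_id, cleanup):
    """Simulate one mapReduce that uses a uniquely named temp collection."""
    tmp_ns = f"test.tmp.mr.coll_{job_id}"  # illustrative name only
    cache.check(tmp_ns)  # the temp collection is touched during the job
    # ... the temp collection itself is dropped here when the job finishes ...
    if cleanup:
        cache.forget(tmp_ns)


def cache_size_after(jobs, cleanup):
    """Run `jobs` simulated mapReduce calls on one connection's cache."""
    cache = NamespaceVersionCache()
    for job_id in range(jobs):
        run_map_reduce(cache, job_id, cleanup)
    return len(cache)


print(cache_size_after(1000, cleanup=False))  # 1000: one stale entry per job
print(cache_size_after(1000, cleanup=True))   # 0: the cache stays empty
```

Without the cleanup step the model keeps one string per completed job for as long as the connection lives, which matches the slow, steady growth described in this ticket.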
WORKAROUNDS
This issue can be worked around by stepping down the primary mongod of the shard.
PATCHES
Production release v2.4.7 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.
Original Description
When running map-reduce continually with the test set provided, we see two code paths with continual heap growth. The following is Valgrind Massif output for both. Note that in my test, case 1 grew from 2 MB to 6 MB and case 2 grew from 1 MB to 3 MB over the course of a few hours.
Case 1:
->08.50% (6,239,857B) 0x50E9A87: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| ->08.45% (6,208,357B) 0x50EA7F9: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | ->08.45% (6,208,323B) 0x50EA8DE: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | | ->08.44% (6,200,449B) 0x50EAE0B: std::string::append(std::string const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | | | ->08.44% (6,199,973B) 0x66A933: mongo::mr::MapReduceFinishCommand::run(std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (basic_string.h:2310)
| | | | | ->08.44% (6,199,973B) 0x67DBFD: mongo::_execCommand(mongo::Command*, std::string const&, mongo::BSONObj&, int, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1859)
| | | | | ->08.44% (6,199,973B) 0x67E684: mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1985)
| | | | | ->08.44% (6,199,973B) 0x67F58B: mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:2069)
| | | | | ->08.44% (6,199,973B) 0x74CFC8: mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (query.cpp:43)
| | | | | ->08.44% (6,199,973B) 0x750AAD: mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (query.cpp:920)
| | | | | ->08.44% (6,199,973B) 0x6FB166: mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:244)
| | | | | ->08.44% (6,199,973B) 0x592777: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:193)
| | | | | ->08.44% (6,199,973B) 0x90DDC9: mongo::pms::threadRun(mongo::MessagingPort*) (message_server_port.cpp:85)
| | | | | ->08.44% (6,199,973B) 0x4E35E98: start_thread (pthread_create.c:308)
| | | | | ->08.44% (6,199,973B) 0x5950CCB: clone (clone.S:112)
Case 2 (looks like stats logging for temporary MR namespaces?):
->04.73% (3,470,360B) 0x5A8F4E: std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_insert_(std::_Rb_tree_node_base const*, std::_Rb_tree_node_base const*, std::string const&) (new_allocator.h:92)
| ->04.73% (3,470,120B) 0x5BAA50: std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_insert_unique(std::string const&) (stl_tree.h:1291)
| | ->04.73% (3,470,080B) 0x8B5BA7: mongo::ClientConnections::_check(std::string const&) (stl_set.h:410)
| | | ->04.73% (3,470,080B) 0x8B5C28: mongo::ClientConnections::get(std::string const&, std::string const&) (shardconnection.cpp:149)
| | | ->04.73% (3,470,080B) 0x8B4142: mongo::ShardConnection::_init() (shardconnection.cpp:323)
| | | ->04.73% (3,470,080B) 0x8B420C: mongo::ShardConnection::ShardConnection(std::string const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>) (shardconnection.cpp:316)
| | | ->04.73% (3,470,080B) 0x5F8E86: mongo::ParallelSortClusteredCursor::_oldInit() (parallel.cpp:1377)
| | | ->04.73% (3,470,080B) 0x669753: mongo::mr::MapReduceFinishCommand::run(std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (mr.cpp:1308)
| | | ->04.73% (3,470,080B) 0x67DBFD: mongo::_execCommand(mongo::Command*, std::string const&, mongo::BSONObj&, int, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1859)
| | | ->04.73% (3,470,080B) 0x67E684: mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1985)
| | | ->04.73% (3,470,080B) 0x67F58B: mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:2069)
| | | ->04.73% (3,470,080B) 0x74CFC8: mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (query.cpp:43)
| | | ->04.73% (3,470,080B) 0x750AAD: mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (query.cpp:920)
| | | ->04.73% (3,470,080B) 0x6FB166: mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:244)
| | | ->04.73% (3,470,080B) 0x592777: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:193)
| | | ->04.73% (3,470,080B) 0x90DDC9: mongo::pms::threadRun(mongo::MessagingPort*) (message_server_port.cpp:85)
| | | ->04.73% (3,470,080B) 0x4E35E98: start_thread (pthread_create.c:308)
| | | ->04.73% (3,470,080B) 0x5950CCB: clone (clone.S:112)
Related to: SERVER-8442 Map-reduce memory leak (Closed)