-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Fully Compatible
-
ALL
-
v4.4
-
Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10
:tldr
The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid; while also being depended upon by any code with a mutex under contention via the DiagnosticInfo code.
There's a rare (so far) BF dependent on a SERVER ticket linked a couple hops from this one.
--------------------------------------------------------------------------
I tried to setup the JournalFlusher thread in our unit test test fixtures, specifically the ServiceContexMongoDTest test fixture. However, I ran into problems like so,
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.003+0000 | 2020-06-05T20:47:57.003Z I STORAGE 22320 [main] "Shutting down journal flusher thread" [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 src/mongo/executor/network_interface_mock.h:567:22: runtime error: member call on address 0x61600038ee80 which does not point to an object of type 'mongo::executor::NetworkInterfaceMock' [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 0x61600038ee80: note: object has a possibly invalid vptr: abs(offset to top) too big [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 a4 00 00 48 8b 00 00 49 4d 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 ^~~~~~~~~~~~~~~~~~~~~~~ [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 possibly invalid vptr [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.642+0000 #0 0x564d4e99e90a in mongo::executor::NetworkInterfaceMockClockSource::now() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/executor/network_interface_mock.h:567:22 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000 #1 0x564d512227b1 in mongo::DiagnosticInfo::capture(mongo::StringData const&, mongo::DiagnosticInfo::Options) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:285:73 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000 #2 0x564d512250a9 in mongo::(anonymous namespace)::_mongoInitializerFunction_DiagnosticInfo(mongo::InitializerContext*)::DiagnosticListener::onContendedLock(mongo::latch_detail::Identity const&) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:192:43 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000 #3 0x564d532adfe6 in mongo::latch_detail::Mutex::_onContendedLock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:89:19 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000 #4 0x564d532ad29e in mongo::latch_detail::Mutex::lock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:55:5 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000 #5 0x564d4e36c544 in std::unique_lock<mongo::latch_detail::Latch>::lock() /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:267:17 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000 #6 0x564d4e36beba in std::unique_lock<mongo::latch_detail::Latch>::unique_lock(mongo::latch_detail::Latch&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:197:2 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.682+0000 #7 0x564d4fe2d1de in mongo::JournalFlusher::run() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/db/storage/control/journal_flusher.cpp:127:34 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #8 0x564d52b09f7d in mongo::BackgroundJob::jobBody() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/background.cpp:160:5 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #9 0x564d52b12375 in std::__invoke_result<mongo::BackgroundJob::go()::$_0>::type std::__invoke<mongo::BackgroundJob::go()::$_0>(mongo::BackgroundJob::go()::$_0&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:95:14 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #10 0x564d52b12375 in decltype(auto) std::__apply_impl<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&, std::integer_sequence<unsigned long>) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1678 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #11 0x564d52b12375 in decltype(auto) std::apply<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1687 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #12 0x564d52b12375 in mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()::operator()() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/stdx/thread.h:186 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000 #13 0x564d52b12375 in mongo::BackgroundJob::go()::$_0 std::__invoke_impl<void, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()>(std::__invoke_other, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:60 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000 #14 0x564d535a52ee in execute_native_thread_routine /data/mci/ff40e18d703ff1f6e8f8ecf75ad7202f/toolchain-builder/tmp/build-gcc-v3.sh-NJq/build/x86_64-mongodb-linux/libstdc++-v3/src/c++11/../../../../../src/combined/libstdc++-v3/src/c++11/thread.cc:80:18 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000 #15 0x7f8e0e26b6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 #16 0x7f8e0dd7c88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/mongo/executor/network_interface_mock.h:567:22 in
It turns out that a contend mutex acquisition goes into the DiagnosticInfo code, which in turn call into the NetworkInterfaceMockClockSource and tries to use the NetworkInterfaceMock. However, the JournalFlusher (which has a mutex) found the NetworkInterfaceMock to be invalid memory: the NetworkInterfaceMock was already destroyed.
The test fixture of the failing test is ReplCoordTest. The repl test fixture creates a NetworkInterfaceMock instances, passes ownership of the NetworkInterface into a ThreadPoolTaskExecutor instance, whose ownership is then passed into a ReplicationCoordinatorImpl instance. Meanwhile, NetworkInterfaceMockClockSource instances are created, with pointers to the same NetworkInterfaceMock owned by the ReplicationCoordinatorImpl, and set on the ServiceContext. So things like the JournalFlusher that use Mutexes are now dependent on the the lifetime of the ReplicationCoordinatorImpl.
The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid; while also being depended upon by any code with a mutex under contention.
- causes
-
SERVER-63103 ClockSourceMock::get has no implementation
- Closed
- is depended on by
-
SERVER-46826 Instantiate the JournalFlusher thread for ephemeral engines and when non-durable (nojournal=true)
- Closed
- is related to
-
SERVER-49863 ServiceContext time sources should either be all mocked or all unmocked
- Backlog
-
SERVER-49864 Tests with ServiceContext use should be ServiceContextTests
- Closed
- related to
-
SERVER-47892 DiagnosticInfo for latches doesn't integrate well with unit-tests which create and destroy ServiceContexts
- Closed
-
SERVER-49950 Address issues with ClockSource::setAlarm() and ClockSourceMock::_processAlarms()
- Closed