SERVER-34798 requires all Clients to be destroyed before the ServiceContext is destroyed. However, WiredTigerCheckpointThread destroys its Client asynchronously, which can race with the main thread because of this code in background.cpp:
    {
        // It is illegal to access any state owned by this BackgroundJob after leaving this
        // scope, with the exception of the call to 'delete this' below.
        stdx::unique_lock<stdx::mutex> l(_status->mutex);
        _status->state = Done;
        _status->done.notify_all();
    }

    if (selfDelete)
        delete this;
}
The state is set to Done while the thread is still running, i.e. before its thread_local Client has been destroyed. Setting the state to Done and notifying unblocks the main thread, which can then proceed all the way to the ServiceContext destructor. As a result, the WTCheckpointThread's Client can be destroyed by its own thread after the main thread has already destroyed the ServiceContext.
BF-10032 can be reproduced by adding a long sleep here.
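The window described above can be modeled outside MongoDB with a minimal sketch. All names here (FakeServiceContext, FakeClient, tlClient, RaceObservation, observeRaceWindow) are hypothetical stand-ins, not MongoDB types. Instead of a sleep, a condition-variable gate keeps the worker thread alive after it has signalled Done, so the observation is deterministic; the final join() exists only so the sketch terminates cleanly.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-ins, NOT MongoDB's ServiceContext/Client:
// the "service context" only counts how many clients are alive.
struct FakeServiceContext {
    std::mutex m;
    int liveClients = 0;
};
FakeServiceContext g_svc;

struct FakeClient {
    FakeClient() { std::lock_guard<std::mutex> lk(g_svc.m); ++g_svc.liveClients; }
    ~FakeClient() { std::lock_guard<std::mutex> lk(g_svc.m); --g_svc.liveClients; }
};

// Owned by a thread_local, so it is only freed when the owning thread exits.
thread_local std::unique_ptr<FakeClient> tlClient;

struct RaceObservation {
    int clientsWhenUnblocked;  // live clients at the moment the waiter sees Done
    int clientsAfterJoin;      // live clients once the worker thread has exited
};

RaceObservation observeRaceWindow() {
    std::mutex mutex;
    std::condition_variable cv;
    bool isDone = false;
    bool release = false;  // gate playing the role of the "long sleep"

    std::thread worker([&] {
        tlClient = std::make_unique<FakeClient>();  // run() creates a client
        std::unique_lock<std::mutex> l(mutex);
        isDone = true;  // analogous to _status->state = Done
        cv.notify_all();
        // Keep this thread, and therefore its thread_local client, alive longer.
        cv.wait(l, [&] { return release; });
    });  // tlClient is destroyed only when this thread exits

    RaceObservation obs{};
    {
        std::unique_lock<std::mutex> l(mutex);
        cv.wait(l, [&] { return isDone; });  // the waiting thread is unblocked here
        {
            std::lock_guard<std::mutex> lk(g_svc.m);
            obs.clientsWhenUnblocked = g_svc.liveClients;
        }
        release = true;
    }
    cv.notify_all();
    worker.join();
    std::lock_guard<std::mutex> lk(g_svc.m);
    obs.clientsAfterJoin = g_svc.liveClients;
    return obs;
}
```

Calling observeRaceWindow() should report one live client when the waiter is unblocked and zero only after the worker has exited: a real main thread that tore down the ServiceContext at the first point would hit exactly the violation described above.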
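The fix should be similar to SERVER-35985: add an ON_BLOCK_EXIT in the run() function of WTCheckpointThread so that the thread's Client is destroyed before run() returns, and therefore before the job is marked Done.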
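A minimal sketch of that ordering, assuming (as the background.cpp excerpt suggests) that run() returns before the block that sets Done. MongoDB's ON_BLOCK_EXIT macro is replaced here by a hypothetical ScopeGuard class, and FakeServiceContext, FakeClient, tlClient, run and clientsVisibleWhenDone are likewise illustrative stand-ins rather than real MongoDB code.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-ins, NOT MongoDB's real types.
struct FakeServiceContext {
    std::mutex m;
    int liveClients = 0;
};
FakeServiceContext g_svc;

struct FakeClient {
    FakeClient() { std::lock_guard<std::mutex> lk(g_svc.m); ++g_svc.liveClients; }
    ~FakeClient() { std::lock_guard<std::mutex> lk(g_svc.m); --g_svc.liveClients; }
};
thread_local std::unique_ptr<FakeClient> tlClient;

// Minimal scope guard standing in for MongoDB's ON_BLOCK_EXIT.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    ~ScopeGuard() { f_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    F f_;
};

// Shape of a checkpoint thread's run(): the guard destroys the client on
// every exit path, so it is gone before run() returns.
void run() {
    tlClient = std::make_unique<FakeClient>();
    auto destroy = [] { tlClient.reset(); };
    ScopeGuard<decltype(destroy)> destroyClient(destroy);
    // ... periodic checkpoint work would go here ...
}

// Returns the number of live clients at the moment a waiter observes Done.
int clientsVisibleWhenDone() {
    std::mutex mutex;
    std::condition_variable cv;
    bool isDone = false;

    std::thread worker([&] {
        run();  // the client has already been destroyed when this returns
        std::lock_guard<std::mutex> l(mutex);
        isDone = true;  // analogous to _status->state = Done
        cv.notify_all();
    });

    int liveAtDone = -1;
    {
        std::unique_lock<std::mutex> l(mutex);
        cv.wait(l, [&] { return isDone; });
        std::lock_guard<std::mutex> lk(g_svc.m);
        liveAtDone = g_svc.liveClients;
    }
    worker.join();
    return liveAtDone;
}
```

With the guard in place, clientsVisibleWhenDone() should return 0: by the time any waiter is unblocked, nothing owned by the worker still refers to the service context, regardless of how late the thread itself exits.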
We should also audit other BackgroundJobs that create Clients in their run() functions.
is related to:
- SERVER-35985 sessions_test and sharding_catalog_manager_test don't destroy all Clients before destroying the ServiceContext (Closed)
- SERVER-34798 Replace subclasses of ServiceContext with decorations and flexible initialization code (Closed)
- SERVER-36473 Make a dedicated RAII class to manage Client lifetime (Closed)