The WiredTigerKVEngine maintains a counter of how many WiredTigerRecordStores are "oplog-like" (namespace of the form local.oplog.*). When a record store with an oplog-like namespace is created, the counter is incremented; when such a record store is destroyed, it is decremented. Only when the count hits zero do we join with the oplog visibility background thread.
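A minimal sketch of that counting scheme, with hypothetical class and method names (the real WiredTigerKVEngine tracks this state internally; nothing below is its actual API):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical model of the reference-counted visibility thread described
// above; names do not match the real WiredTigerKVEngine implementation.
class OplogManagerHandle {
public:
    // Called when a record store with an oplog-like namespace is created.
    void start() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_refCount++ == 0) {
            _running.store(true);
            _thread = std::thread([this] {
                while (_running.load()) {
                    // Advance the oplog visibility timestamp (elided).
                    std::this_thread::sleep_for(std::chrono::milliseconds(1));
                }
            });
        }
    }

    // Called from the record store's destructor. Only the call that drops
    // the count to zero actually joins the background thread.
    void halt() {
        std::unique_lock<std::mutex> lk(_mutex);
        assert(_refCount > 0);
        if (--_refCount > 0) {
            return;  // other oplog-like stores remain; thread keeps running
        }
        _running.store(false);
        lk.unlock();
        _thread.join();
    }

private:
    std::mutex _mutex;
    int _refCount = 0;
    std::atomic<bool> _running{false};
    std::thread _thread;
};
```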
~WiredTigerRecordStore()'s call to WiredTigerKVEngine::haltOplogManager() does not guarantee that the background oplog visibility thread has actually stopped: if other RecordStores with oplog-like namespaces still exist, the thread keeps running. Unfortunately, that thread may hold a pointer to the WiredTigerRecordStore being destroyed, which means that in rare circumstances it can read from freed memory.
If I am right, these events should trigger the bug:
1) Create collection local.oplog.a
2) <Oplog visibility thread now holds a pointer to the WiredTigerRecordStore for local.oplog.a>
3) Create collection local.oplog.b
4) Destroy WiredTigerRecordStore for local.oplog.a
5) Oplog visibility thread continues running, and dereferences its pointer to the now-destroyed WiredTigerRecordStore
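The sequence above can be modeled in a few lines. This is a hedged illustration using hypothetical free functions and globals (the real code keeps this state inside WiredTigerKVEngine); it demonstrates only the window itself: after halting for store A, the thread is still running and still holds its pointer to A, so freeing A at that point would leave the thread dangling.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for WiredTigerRecordStore.
struct FakeRecordStore { const char* ns; };

std::atomic<FakeRecordStore*> cachedStore{nullptr};  // held by the thread
std::atomic<bool> running{false};
std::thread visibilityThread;
int oplogStoreCount = 0;

void startOplogManager(FakeRecordStore* rs) {
    if (oplogStoreCount++ == 0) {
        cachedStore.store(rs);  // the thread reads through this pointer
        running.store(true);
        visibilityThread = std::thread([] {
            while (running.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }
}

// Returns true only if the background thread was actually stopped.
bool haltOplogManager() {
    if (--oplogStoreCount > 0) {
        return false;  // thread keeps running; cachedStore is NOT cleared
    }
    running.store(false);
    visibilityThread.join();
    return true;
}
```

The test deliberately does not free store A, so there is no undefined behavior; it only checks that the dangerous window exists.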
As far as I can tell, the order in which we destroy in-memory state about collections on shutdown is not specified/guaranteed (based on my reading of this, which is just a loop over an unordered_map). My guess is that if there are multiple oplog-like collections, and the first one we destroy happens to be the one registered with the oplog visibility thread, there is a brief window during which the visibility thread can read unowned memory.
- related to SERVER-47885 Have lookupCollectionBy{Namespace,UUID} return a shared_ptr (Closed)