ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() holds a collection lock on the oplog while doing a blocking wait. This can cause the hang described below:
1. First, perform an insert into a replicated collection using insertDocuments(). An optime is generated, but not committed. If another write occurs after this at a later optime, a "hole" is created by the timestamped write that is not yet committed.
2. A reader using readConcern "atClusterTime" or "afterClusterTime" begins a read. This uses ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.
- This waits for the uncommitted insert in step 1 to be committed while holding a DBLock("local", MODE_IS).
3. A dropCollection command is received on the "local" database, and enqueues a DBLock("local", MODE_X).
4. The first insert completes the insert in the storage engine and attempts to write the oplog entry at the generated optime. It attempts to acquire a DBLock("local", MODE_IX).
- The previously enqueued dropCollection operation prevents the insert from acquiring the "local" database lock.
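- waitForAllEarlierOplogWritesToBeVisible() holds its collection lock while waiting for the insert to become visible, and the insert waits behind the dropCollection operation. The dropCollection operation in turn cannot acquire MODE_X until the reader's MODE_IS lock is released, so none of the three operations can make progress.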
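The lock ordering in steps 2 to 4 can be replayed with a toy model. This is only a sketch, not the server's lock manager: `ToyLock`, `Outcome`, and `replayScenario` are hypothetical names, and the model assumes that a new request is queued whenever it conflicts with either a granted request or a request that is already waiting.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified lock modes, named after MODE_IS / MODE_IX / MODE_X.
enum class Mode { IS, IX, X };

// Intent modes are compatible with each other; X conflicts with everything.
bool compatible(Mode a, Mode b) {
    return a != Mode::X && b != Mode::X;
}

struct Request {
    std::string owner;
    Mode mode;
};

// Toy lock for a single resource (here: the "local" database).
// Assumption: a request may not overtake a conflicting waiter.
class ToyLock {
public:
    // Returns true if granted immediately, false if the request is queued.
    bool acquire(const std::string& owner, Mode mode) {
        bool ok = true;
        for (const Request& g : granted_) ok = ok && compatible(g.mode, mode);
        for (const Request& p : pending_) ok = ok && compatible(p.mode, mode);
        if (ok) {
            granted_.push_back({owner, mode});
        } else {
            pending_.push_back({owner, mode});
        }
        return ok;
    }

private:
    std::vector<Request> granted_;
    std::vector<Request> pending_;
};

struct Outcome {
    bool readerGranted;
    bool dropGranted;
    bool insertGranted;
};

// Replays steps 2-4 of the scenario in order.
Outcome replayScenario() {
    ToyLock local;
    Outcome o;
    o.readerGranted = local.acquire("reader", Mode::IS);       // step 2
    o.dropGranted = local.acquire("dropCollection", Mode::X);  // step 3
    o.insertGranted = local.acquire("insert", Mode::IX);       // step 4
    // The reader holds IS, so the X request must wait.
    assert(o.readerGranted && !o.dropGranted);
    return o;
}
```

In this model the reader is granted MODE_IS, while dropCollection and the insert are both queued. Because the reader will not release its lock until the insert becomes visible, the three requests form a wait cycle.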
This method should be redesigned so that a collection lock is not required to be held while waiting for the last oplog entry to become visible.
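One possible shape for such a redesign is sketched below. It is an illustration under stated assumptions, not the server's implementation: `OplogVisibility`, `markVisible`, and `waitForEarlierWritesVisible` are hypothetical names, a `std::mutex` stands in for the oplog collection lock, and visibility is tracked as a plain integer timestamp. The point is only that the lock is held long enough to read the target and is released before the blocking wait begins.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the state the waiter needs.
struct OplogVisibility {
    std::mutex collectionLock;            // stand-in for the oplog collection lock
    std::uint64_t lastReserved = 0;       // guarded by collectionLock

    std::mutex visMutex;                  // guards allVisibleThrough only
    std::condition_variable visChanged;
    std::uint64_t allVisibleThrough = 0;  // guarded by visMutex
};

// Called by a writer once everything up to `ts` is committed and visible.
void markVisible(OplogVisibility& o, std::uint64_t ts) {
    {
        std::lock_guard<std::mutex> lk(o.visMutex);
        o.allVisibleThrough = ts;
    }
    o.visChanged.notify_all();
}

// Reads the target under the collection lock, then waits without it.
std::uint64_t waitForEarlierWritesVisible(OplogVisibility& o) {
    std::uint64_t target;
    {
        std::lock_guard<std::mutex> lk(o.collectionLock);
        target = o.lastReserved;
    }  // collection lock released here, before any blocking wait

    std::unique_lock<std::mutex> lk(o.visMutex);
    o.visChanged.wait(lk, [&] { return o.allVisibleThrough >= target; });
    return target;
}
```

With this structure, a writer that needs the stand-in collection lock to finish its oplog write is never blocked by a reader that is waiting for visibility.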
- causes
  - SERVER-37048 Hold global intent lock whenever accessing the oplog collection pointer (Closed)
- depends on
  - SERVER-36508 _getNextSessionMods command should not hold locks on migration collection while querying the oplog (Closed)
- is related to
  - SERVER-35365 MapReduce temporary inc collections should be written to the local database (Closed)
  - SERVER-36514 Hold lock on oplog as soon as optime is reserved (Closed)
  - SERVER-36534 Don't acquire locks on oplog when writing oplog entries (Closed)
- related to
  - SERVER-40498 Writing transaction oplog entries must not take locks while holding an oplog slot (Closed)