- Type: Task
- Resolution: Won't Fix
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication, Sharding
- Sharding 2018-08-13, Sharding 2018-08-27
In order to fix SERVER-35367, we need queries on the oplog to yield their locks while calling waitForAllEarlierOplogWritesToBeVisible(). Yielding locks doesn't work if there are nested acquisitions of the Global lock. _getNextSessionMods takes a lock on the collection being migrated and holds that lock while doing a query against the oplog. The locks taken for the oplog query are therefore a nested acquisition of the Global lock, which prevents the lock yielding and means waitForAllEarlierOplogWritesToBeVisible() is still called while locks are held.
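To illustrate the nesting problem, here is a minimal toy model in C++. It is not the server's locking code: ToyLocker, ToyCollectionLock and both helper functions are hypothetical names, and the only rule modelled is the one stated above, that an operation cannot yield while it holds the Global lock more than once.

```cpp
#include <cassert>

// Toy model only (hypothetical, not the server's locker): counts how many
// times the current operation has acquired the Global lock.
class ToyLocker {
public:
    void lockGlobal() { ++_globalDepth; }
    void unlockGlobal() {
        assert(_globalDepth > 0);
        --_globalDepth;
    }
    // Assumed rule: yielding is refused when the Global lock is held recursively.
    bool canYield() const { return _globalDepth == 1; }

private:
    int _globalDepth = 0;
};

// RAII guard standing in for any collection-level lock. In this model every
// collection lock acquires the Global lock first.
class ToyCollectionLock {
public:
    explicit ToyCollectionLock(ToyLocker& locker) : _locker(locker) { _locker.lockGlobal(); }
    ~ToyCollectionLock() { _locker.unlockGlobal(); }
    ToyCollectionLock(const ToyCollectionLock&) = delete;
    ToyCollectionLock& operator=(const ToyCollectionLock&) = delete;

private:
    ToyLocker& _locker;
};

// Oplog query issued while the migrated collection's lock is still held:
// the Global lock depth is 2, so the query cannot yield.
bool canYieldWithNestedGlobalLock() {
    ToyLocker locker;
    ToyCollectionLock migratedCollection(locker);
    ToyCollectionLock oplog(locker);
    return locker.canYield();
}

// Oplog query issued on its own: the Global lock depth is 1, so it can yield.
bool canYieldWithSingleGlobalLock() {
    ToyLocker locker;
    ToyCollectionLock oplog(locker);
    return locker.canYield();
}
```

In this model canYieldWithNestedGlobalLock() returns false and canYieldWithSingleGlobalLock() returns true, which mirrors the situation _getNextSessionMods is in.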
The lock on the collection being migrated is only required to determine the starting oplog entry for walking back the oplog chain for the transaction. Once that has been decided, there's no intrinsic reason to keep holding the collection lock while querying the oplog. The problem is that the logic for walking back the oplog chain lives in the MigrationChunkClonerSource, whose lifetime is guarded by the collection lock on the collection being migrated, which makes this difficult to work around without substantial code changes.
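The shape of the change described above could look roughly like the following C++ sketch. Everything in it is hypothetical (ToyOplogEntry, ToyMigrationState, walkChainWithoutCollectionLock, a std::mutex standing in for the collection lock, integer opTimes), and it deliberately ignores the lifetime issue with MigrationChunkClonerSource by copying the starting point out of the guarded state before the lock is released.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical oplog entry: only the link to the previous entry in the
// transaction's chain is modelled.
struct ToyOplogEntry {
    std::optional<int> prevOpTime;
};

// Hypothetical state guarded by the lock on the collection being migrated.
struct ToyMigrationState {
    std::mutex collectionLock;
    int startingOpTime = 0;
};

// Reads the starting entry under the collection lock, releases the lock,
// and only then walks the chain backwards through the oplog.
std::vector<int> walkChainWithoutCollectionLock(ToyMigrationState& state,
                                                const std::map<int, ToyOplogEntry>& oplog) {
    std::optional<int> next;
    {
        // The collection lock is needed only to decide where to start.
        std::lock_guard<std::mutex> guard(state.collectionLock);
        next = state.startingOpTime;
    }

    // From here on no collection lock is held, so in the real system the
    // oplog query would be free to yield its own locks.
    std::vector<int> visited;
    while (next) {
        auto it = oplog.find(*next);
        if (it == oplog.end()) {
            break;  // chain points at an entry that is not in this toy oplog
        }
        visited.push_back(it->first);
        next = it->second.prevOpTime;
    }
    return visited;
}
```

The hard part in the real code is exactly what this sketch skips: the chain-walking logic belongs to an object that may go away once the collection lock is dropped.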
- is depended on by: SERVER-35367 Hold locks in fewer callers of waitForAllEarlierOplogWritesToBeVisible() (Closed)