- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- ALL
- Execution Team 2024-12-23
Risk assessment
- No data loss
- Potential for expired data to be kept around:
  - In production, _minBytesPerMarker is large, and the race only occurs when multiple updates try to create a new marker at the same time.
  - In practice, it is unlikely that the last inserts into CollectionTruncateMarkersWithPartialExpiration coincide with marker creation and are followed by no additional inserts.
  - Without additional inserts, the pre-images will become expirable after restart.
Summary
CollectionTruncateMarkersWithPartialExpiration record expiry relies on:
- Accurate _highestRecordId (*WithPartialExpiration subclass specific)
- Accurate _highestWallTime (*WithPartialExpiration subclass specific)
- Non-zero _currentRecords (private CollectionTruncateMarkers member)
- Non-zero _currentBytes (private CollectionTruncateMarkers member)
- Note: CollectionTruncateMarkersWithPartialExpiration is a friend of CollectionTruncateMarkers, so has access to its privates.
The _highestRecordId and _highestWallTime are protected by a _highestRecordMutex, while the _currentRecords and _currentBytes are atomic. Splitting the concurrency control for these fields between the parent and the subclass is not only difficult to reason about, but can also prevent expiration of the record associated with the _highestRecordId and _highestWallTime until the next insert.
Given:
- 2 Threads, T1, T2
- We have records recA and recB
- Let recA < recB with respect to RecordId and WallTime.
- For simplicity of the example, assume each record is 1 byte.
- _currentRecords and _currentBytes are 1:1, and referred to as _current*
- _highestRecordId and _highestWallTime reflect a single record at a given time, and are referred to as _highest*
- _minBytesPerMarker = 1
Suppose
- T1 and T2 issue CollectionTruncateMarkersWithPartialExpiration::updateCurrentMarker() at approximately the same time.
- T1 tracks recA, T2 tracks recB
- T2 calls _updateHighestSeenRecordIdAndWallTime() and updates _current* for recB
  - _highest* = recB, _current* = 1, newCurrentBytes = 1
- T1 calls _updateHighestSeenRecordIdAndWallTime() and updates _current* for recA
  - _highest* is still recB, _current* = 2, newCurrentBytes = 2
- Both see newCurrentBytes >= _minBytesPerMarker and call CollectionTruncateMarkers::createNewMarkerIfNeeded(), passing the 'highestRecordId' and 'highestWallTime' parameters they received in updateCurrentMarker(), not the _highest* member variables.
- T1 wins the _markersMutex in CollectionTruncateMarkers::createNewMarkerIfNeeded(), and is tracking 'lastRecord' and 'wallTime' for recA
- T2 gives up trying to create a new marker; it sees another thread has the _markersMutex.
- T1 sees _current* as non-zero, and empty truncate markers, so it issues CollectionTruncateMarkers::createNewMarker(recA)
- CollectionTruncateMarkers::createNewMarker() swaps _current* to 0, and creates a new whole marker with lastRecord and wallTime for recA
- The resulting state:
  - A whole marker with an upper bound of recA
  - _highest* is recB
  - _current* is 0, so recB, tracked by _highest*, cannot be upgraded to a whole, expirable marker. recB cannot be expired until a new update increases _current*.
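The interleaving above can be replayed deterministically with a toy model. This is only a sketch under simplifying assumptions, not the server's code: `ToyMarkers`, `Rec`, `track()`, `highestIsExpirable()`, and `replayRace()` are hypothetical names, RecordId and wall time are plain integers, every record is 1 byte, and the two threads are replayed sequentially in the order described. T2's marker attempt is omitted because, per the steps above, it gives up without changing any state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a record: RecordId and wall time as integers.
struct Rec {
    int64_t id;
    int64_t wallTime;
};

// Toy model of the split state. NOT the real CollectionTruncateMarkers classes.
struct ToyMarkers {
    // "Parent" state: atomics, reset when a whole marker is created.
    std::atomic<int64_t> currentRecords{0};
    std::atomic<int64_t> currentBytes{0};
    std::mutex markersMutex;
    std::vector<Rec> wholeMarkers;  // guarded by markersMutex

    // "Subclass" state: guarded by its own, separate mutex.
    std::mutex highestMutex;
    Rec highest{0, 0};

    int64_t minBytesPerMarker = 1;

    // First half of the update: bump _highest* and _current*.
    // Returns the byte count observed by this caller (newCurrentBytes).
    int64_t track(const Rec& rec) {
        {
            std::lock_guard<std::mutex> lk(highestMutex);
            if (rec.id > highest.id) highest = rec;
        }
        currentRecords.fetch_add(1);
        return currentBytes.fetch_add(1) + 1;  // each record is 1 byte
    }

    // Second half: create a whole marker bounded by the CALLER's record,
    // not by _highest*. Gives up if another thread holds the mutex.
    bool createNewMarkerIfNeeded(const Rec& callerRec) {
        std::unique_lock<std::mutex> lk(markersMutex, std::try_to_lock);
        if (!lk.owns_lock()) return false;
        if (currentBytes.load() < minBytesPerMarker) return false;
        currentRecords.store(0);  // the swap-to-zero step
        currentBytes.store(0);
        wholeMarkers.push_back(callerRec);
        return true;
    }

    // Modeling assumption: the partial marker tracked by _highest* can only
    // expire while _current* is non-zero.
    bool highestIsExpirable() const {
        return currentRecords.load() > 0 && currentBytes.load() > 0;
    }
};

struct Outcome {
    int64_t wholeUpperBound;
    int64_t highestId;
    bool highestExpirable;
};

// Replays the interleaving from the ticket in a fixed order.
Outcome replayRace() {
    ToyMarkers m;
    const Rec recA{1, 1}, recB{2, 2};  // recA < recB

    const int64_t seenByT2 = m.track(recB);  // _highest* = recB, _current* = 1
    const int64_t seenByT1 = m.track(recA);  // _highest* still recB, _current* = 2
    assert(seenByT2 >= m.minBytesPerMarker && seenByT1 >= m.minBytesPerMarker);

    m.createNewMarkerIfNeeded(recA);  // T1 wins; whole marker bounded by recA

    return {m.wholeMarkers.back().id, m.highest.id, m.highestIsExpirable()};
}
```

Calling `replayRace()` in this model yields a whole marker bounded by recA, `_highest*` pointing at recB, and a non-expirable partial marker, which matches the state described above. A further `track()` call would make `_current*` non-zero again, mirroring the observation that recB only becomes expirable after the next insert.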
- related to SERVER-90705 Preimage truncate marker refresh does not drop concurrently inserted docs (Closed)