- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Execution
- ALL
- Execution Team 2024-08-19
If a given TTL index has more than TTLIndexDeleteTargetDocs expired orphan (unowned) documents, the TTLMonitor cannot remove more recently expired documents through that index.
Details
The TTLMonitor uses batched deletes by default. The batched delete stage first 'stages' documents in a buffer until _batchTargetMet() returns true.
The batch target is met once either 'targetBatchDocs' documents or more than 'targetStagedDocBytes' bytes are stored in the buffer. However, documents in the buffer can be orphans.
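The batch target check described above can be sketched as a small predicate. This is a simplified Python model, not the server's C++ code: `StagedDoc` and `batch_target_met` are hypothetical names, and the exact comparison operators are assumptions taken from the wording of this ticket.

```python
from dataclasses import dataclass


@dataclass
class StagedDoc:
    size_bytes: int
    is_orphan: bool  # ownership is not consulted while staging


def batch_target_met(buffer, target_batch_docs, target_staged_doc_bytes):
    """Return True once the buffer holds enough documents or enough bytes."""
    staged_bytes = sum(doc.size_bytes for doc in buffer)
    return len(buffer) >= target_batch_docs or staged_bytes > target_staged_doc_bytes


# A buffer made up entirely of orphans still satisfies the batch target.
orphans_only = [StagedDoc(size_bytes=100, is_orphan=True) for _ in range(10)]
orphan_batch_ready = batch_target_met(
    orphans_only, target_batch_docs=10, target_staged_doc_bytes=1_000_000
)
```

The point of the sketch is that the predicate never looks at `is_orphan`, so orphans count toward the target just like owned documents.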
Once the batch target is met, we try to commit the batch. Since the TTLMonitor doesn't remove orphans, orphan documents are 'skipped' and not issued a delete. If all staged deletes were 'successful' (or skipped), the buffer is cleared.
If the buffer is empty and _passStagingComplete is true, isEOF() is true and the BatchedDeleteStage returns EOF. _passStagingComplete is true whenever _passTargetMet() is true. _passTargetMet() is true if the total number of documents staged across batches (this can include orphans) exceeds '_batchedDeleteParams->targetPassDocs'. The TTLMonitor sets 'targetPassDocs' to TTLIndexDeleteTargetDocs.
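The pass-level bookkeeping above can be modelled roughly as follows. This is a toy Python sketch under stated assumptions, not the BatchedDeleteStage implementation: the class and method names are hypothetical, only the pass-target route to "staging complete" is modelled, and whether the boundary is `>` or `>=` is an assumption.

```python
class BatchedDeletePass:
    """Toy model of one delete pass; not the server implementation."""

    def __init__(self, target_pass_docs):
        self.target_pass_docs = target_pass_docs
        self.docs_staged_in_pass = 0  # includes orphans that are later skipped
        self.buffer = []

    def stage(self, doc_id, is_orphan):
        self.buffer.append((doc_id, is_orphan))
        self.docs_staged_in_pass += 1

    def commit_batch(self):
        """Delete owned documents, skip orphans, then clear the buffer."""
        deleted = [doc_id for doc_id, is_orphan in self.buffer if not is_orphan]
        self.buffer.clear()
        return deleted

    def pass_target_met(self):
        # The ticket says "exceeds"; the exact boundary is an assumption here.
        return self.docs_staged_in_pass >= self.target_pass_docs

    def is_eof(self):
        # Empty buffer plus a met pass target ends the pass.
        return not self.buffer and self.pass_target_met()


delete_pass = BatchedDeletePass(target_pass_docs=3)
for i in range(3):
    delete_pass.stage(f"orphan-{i}", is_orphan=True)
deleted_ids = delete_pass.commit_batch()
```

In this model the pass reaches EOF after staging three orphans even though `deleted_ids` is empty, because the pass counter tracks staged documents rather than deleted ones.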
If there are more than TTLIndexDeleteTargetDocs documents that are (1) orphans and (2) expired, the TTLMonitor will repeatedly stage the same orphans, reach the pass target, and return EOF with no delete progress. The TTLMonitor can't recover until the orphan documents are cleaned up.
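The stall can be illustrated with a small simulation of repeated passes. This is a sketch, not the TTLMonitor's code: `run_ttl_pass` and `simulate` are hypothetical helpers, and the model assumes the expired orphans sort ahead of the owned documents in the TTL index, so every pass encounters them first.

```python
def run_ttl_pass(index_entries, target_pass_docs, batch_docs):
    """Model one pass over expired index entries in index order.

    index_entries is a list of (doc_id, is_orphan) tuples.
    Returns the ids deleted during this pass.
    """
    deleted = []
    buffer = []
    staged_in_pass = 0
    for doc_id, is_orphan in index_entries:
        if staged_in_pass >= target_pass_docs:
            break  # pass target met: stop staging
        buffer.append((doc_id, is_orphan))
        staged_in_pass += 1  # orphans count toward the pass target
        if len(buffer) >= batch_docs:
            deleted += [d for d, orphan in buffer if not orphan]
            buffer.clear()
    # Commit whatever is still staged; orphans are skipped.
    deleted += [d for d, orphan in buffer if not orphan]
    return deleted


def simulate(num_orphans, num_owned, target_pass_docs, batch_docs, passes):
    """Run several passes and return the total number of documents deleted."""
    entries = [(f"orphan-{i}", True) for i in range(num_orphans)]
    entries += [(f"owned-{i}", False) for i in range(num_owned)]
    total_deleted = 0
    for _ in range(passes):
        deleted = set(run_ttl_pass(entries, target_pass_docs, batch_docs))
        entries = [e for e in entries if e[0] not in deleted]
        total_deleted += len(deleted)
    return total_deleted


# More orphans than the pass target: no pass ever reaches an owned document.
stalled = simulate(num_orphans=12, num_owned=5, target_pass_docs=10, batch_docs=4, passes=3)
# Fewer orphans than the pass target: owned documents are eventually removed.
progressing = simulate(num_orphans=8, num_owned=5, target_pass_docs=10, batch_docs=4, passes=3)
```

With 12 orphans and a pass target of 10, `stalled` stays at 0 no matter how many passes run, while the 8-orphan case removes all 5 owned documents within three passes.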
The issue isn't specific to orphans on a donor shard. Expired orphan documents on a recipient shard, which belong to a received chunk that has yet to be committed to the shard, can also block TTL delete progress.