-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Cluster Scalability
-
ALL
-
Cluster Scalability Priorities
-
As of SERVER-65947, the recipient side of a chunk migration uses a singleton class that reuses one CancellationSource across chunk receipts and uses it to schedule tasks that run if the migration is interrupted. The same source backs the CancelableOperationContexts created for cloning data, waiting for write concern, and, most notably, copying session history, which creates one CancelableOperationContext per oplog entry received from the donor. Each CancelableOperationContext creates a future that stays alive until the CancellationSource behind its token is destructed. The migration recipient code only resets its CancellationSource when the node steps up as primary, so cancellation futures created during chunk receipts stay in memory until the node steps down and back up, or restarts. If many session entries are migrated, or many migrations run, this can use significant memory.
Instead, we should at least reset the CancellationSource at the end of each migration to avoid indefinite memory growth. We could also create a fresh sub-CancellationSource for just the duration of each processed batch of oplog entries (i.e., a new one for each iteration of this loop), so that copying a large number of sessions during one migration does not accumulate too much memory.
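The difference between the two lifetimes can be illustrated with a toy model. This is only a sketch: ToyCancellationSource, receiveWithSharedSource, and receiveWithPerBatchSource are hypothetical names, not the server's CancellationSource API, and the model captures just one property described above, namely that whatever is registered against a source stays alive until that source is destroyed. A real sub-source would also be linked to the parent token, which is omitted here.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a cancellation source (hypothetical, not the server class).
// It only models that registered callbacks live as long as the source does.
class ToyCancellationSource {
public:
    void onCancel(std::function<void()> callback) {
        _callbacks.push_back(std::move(callback));
    }
    std::size_t pendingCallbacks() const {
        return _callbacks.size();
    }

private:
    std::vector<std::function<void()>> _callbacks;
};

// Pattern described in the ticket: every oplog entry registers against one
// long-lived source, so registrations pile up across batches and migrations.
std::size_t receiveWithSharedSource(ToyCancellationSource& shared,
                                    std::size_t batches,
                                    std::size_t entriesPerBatch) {
    for (std::size_t b = 0; b < batches; ++b) {
        for (std::size_t e = 0; e < entriesPerBatch; ++e) {
            shared.onCancel([] {});
        }
    }
    return shared.pendingCallbacks();
}

// Proposed pattern: a fresh source per batch, destroyed when the batch ends.
// Returns the largest number of registrations alive at any one time.
std::size_t receiveWithPerBatchSource(std::size_t batches,
                                      std::size_t entriesPerBatch) {
    std::size_t peak = 0;
    for (std::size_t b = 0; b < batches; ++b) {
        ToyCancellationSource batchSource;  // scoped to this iteration
        for (std::size_t e = 0; e < entriesPerBatch; ++e) {
            batchSource.onCancel([] {});
        }
        peak = std::max(peak, batchSource.pendingCallbacks());
    }
    return peak;
}
```

In this model, the shared source keeps growing with every migration that reuses it, while the per-batch variant never holds more than one batch's worth of registrations at a time.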
- is related to
-
SERVER-92333 Audit use of long lived CancellationSources
- Backlog
- related to
-
SERVER-65947 MigrationDestinationManager must recover if an error occurs during release of the critical section
- Closed