moveCollection and unshardCollection configure the resharding machinery with numInitialChunks == 1. Even for a single chunk, this goes through SamplingBasedSplitPolicy, which runs a {$sample: {size: 10}} aggregation on the source collection. This is mostly harmless, but on capped collections it can fail with CappedPositionLost because the sampling runs concurrently with capped truncation. We should consider having numInitialChunks == 1 skip SamplingBasedSplitPolicy and instead initialize the single [MinKey(), MaxKey()] chunk range directly.
[j0:s1:prim] | 2024-05-29T10:20:14.412+01:00 E ASSERT 4457000 [S] [Balancer] "Tripwire assertion","attr":{"error":{"code":8959500,"codeName":"Location8959500","errmsg":"An unexpected error occured while moving a random unsharded collection, from: shard-rs0, to: config, nss: test15_fsmdb0.create_capped_collection_maxdocs2_1, error: CappedPositionLost: Command request failed on source shard. :: caused by :: PlanExecutor error during aggregation :: caused by :: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(788)"},"location":"{fileName:\"src\\mongo\\db\\s\\balancer\\move_unsharded_policy.cpp\", line:292, functionName:\"applyActionResult\"}"}
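A minimal sketch of the proposed short-circuit, for discussion only. It does not use the server's real types: `ChunkRange`, `Sampler`, and `createInitialChunkRanges` are hypothetical stand-ins, and the string keys `"MinKey"`/`"MaxKey"` stand in for the real BSON bounds. The point is only that the numInitialChunks == 1 path never invokes the sampler, so it cannot hit CappedPositionLost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a chunk range; not the server's ChunkRange class.
struct ChunkRange {
    std::string min;
    std::string max;
};

// Hypothetical callback modelling the {$sample: ...} aggregation.
// Given the number of split points wanted, it returns them in sorted order.
using Sampler = std::function<std::vector<std::string>(std::size_t)>;

// Builds the initial chunk ranges. With a single chunk there is nothing to
// split, so the sampler is skipped entirely.
std::vector<ChunkRange> createInitialChunkRanges(std::size_t numInitialChunks,
                                                 const Sampler& sampleSplitPoints) {
    assert(numInitialChunks >= 1);
    const std::string kMinKey = "MinKey";  // placeholder for MinKey()
    const std::string kMaxKey = "MaxKey";  // placeholder for MaxKey()

    if (numInitialChunks == 1) {
        // Proposed fast path: no aggregation on the source collection.
        return {ChunkRange{kMinKey, kMaxKey}};
    }

    // Existing behaviour (simplified): sample split points, then stitch
    // consecutive points into contiguous ranges.
    std::vector<std::string> splits = sampleSplitPoints(numInitialChunks - 1);
    std::vector<ChunkRange> ranges;
    std::string lower = kMinKey;
    for (const std::string& split : splits) {
        ranges.push_back(ChunkRange{lower, split});
        lower = split;
    }
    ranges.push_back(ChunkRange{lower, kMaxKey});
    return ranges;
}
```

In the real code the check would presumably live wherever resharding selects its split policy, so that moveCollection and unshardCollection never construct a SamplingBasedSplitPolicy at all. That placement is an assumption, not something confirmed in this ticket.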
is related to: SERVER-92240 Make balancer use explicit shard distribution when moving unsharded collections (Closed)