- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Aggregation Framework
- Query Optimization
- Query 2021-01-25, Query Optimization 2021-05-03
When creating a change stream on an unsharded collection, we currently open cursors on all shards rather than only on the primary shard. We do so because, at the point where we choose a routing strategy, we cannot tell whether the operation is creating a new stream or resuming from a point in the past; to cover all possible scenarios, we must therefore assume the latter. That in turn means we may be resuming from a time when the namespace existed in an earlier incarnation as a sharded collection which has since been dropped, and because we do not have access to a historical view of the routing table, we must assume that this is the case as well. This defensive chain of assumptions lets us open streams that work across all potential use cases, at the cost of significant inefficiency and some unfortunate edge cases (e.g. SERVER-42232) for unsharded collections.
By far the more likely scenario, however, is that we are starting or resuming a stream on an unsharded collection which currently exists, in which case opening cursors on every shard is wasteful and invites complications such as SERVER-42232. We already resolve the UUID of unsharded collections as part of the aggregation processing code, so if the routing logic is given a means to examine the stream's resume information, it should be possible to target only the primary shard when we are opening a new stream or resuming a stream on an existing unsharded collection.
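To make the proposed decision concrete, here is a minimal sketch of the targeting rule in Python. It is not server code: `ResumeInfo`, `CollectionRouting`, and `shards_to_target` are hypothetical names, and the sketch assumes that the resume information exposes the UUID of the collection the token was generated against, so that it can be compared with the UUID we already resolve for the unsharded collection. It also conservatively narrows the target only when the collection currently exists, and falls back to today's broadcast behaviour in every other case.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ResumeInfo:
    """Hypothetical decoded view of a stream's resume information."""
    # Assumption: the resume point records the UUID of the collection it came from.
    collection_uuid: Optional[UUID]


@dataclass(frozen=True)
class CollectionRouting:
    """Hypothetical snapshot of what the router currently knows about the namespace."""
    is_sharded: bool
    current_uuid: Optional[UUID]  # None if the collection does not currently exist
    primary_shard: str
    all_shards: List[str]


def shards_to_target(routing: CollectionRouting,
                     resume: Optional[ResumeInfo]) -> List[str]:
    """Return the shards on which to open change stream cursors."""
    # Sharded collections are out of scope here: keep the existing behaviour.
    if routing.is_sharded:
        return list(routing.all_shards)

    # The collection does not exist right now, so we cannot rule out an
    # earlier sharded incarnation: broadcast, as we do today.
    if routing.current_uuid is None:
        return list(routing.all_shards)

    # New stream on an existing unsharded collection: primary shard only.
    if resume is None:
        return [routing.primary_shard]

    # Resuming against the same incarnation of the collection: primary shard only.
    if resume.collection_uuid == routing.current_uuid:
        return [routing.primary_shard]

    # UUID missing or different: the token may predate a drop of a sharded
    # collection with the same name, so fall back to all shards.
    return list(routing.all_shards)
```

For example, with `routing = CollectionRouting(False, some_uuid, "shard0", ["shard0", "shard1"])`, calling `shards_to_target(routing, None)` or `shards_to_target(routing, ResumeInfo(some_uuid))` yields only the primary shard, while a resume point carrying a different UUID still broadcasts to both shards.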
- is related to
  - SERVER-78321 MongoDB 6.0: Adding a new shard renders all preceding resume tokens invalid (Closed)
  - SERVER-30784 Allow sharded change streams to target just the shards that have chunks (Backlog)
  - SERVER-80427 Avoid change streams latency caused by lack of writes on a shard (Backlog)