Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Catalog and Routing
Problem description
We currently allow writes on orphaned buckets collections (SERVER-73901). In other words, for legacy timeseries collections, when the system.buckets collection exists but the associated view does not, we still allow writes targeting the main timeseries namespace.
In the write path, if we find the system.buckets collection without the associated view, we still translate the namespace from the view to system.buckets and write to it.
In sharded clusters this behavior is risky: due to other potential bugs in the system, a shard may receive a non-versioned request on the main timeseries namespace (the view). In that case we may end up writing to the underlying system.buckets collection, completely bypassing the shard version check even if the collection is actually tracked.
Context
Replicaset
On a replica set the timeseries view should always be present, so we should never translate to the system.buckets collection when the view is not present.
One special case to take into account here is $out, which initially creates only the system.buckets collection and creates the view namespace only at the end. For this case I believe $out directly targets the system.buckets namespace, so there is no need to translate the namespace.
So again, I think we should never jump to system.buckets if the view does not exist.
Sharded clusters
For sharded clusters, there are 3 cases we should consider:
Untracked collection
For untracked collections we should apply the same reasoning as in the replica set case above. The request should always target the DB primary shard, which has the view definition, and thus we should never translate the namespace if the view does not exist.
Tracked collections
If the collection is tracked it could be on any shard. If it happens to be on a shard that is not the DB primary, the view won't be present.
When the collection is tracked in the global catalog, the router must always perform the translation and send the request to the data shards with the following:
- The target namespace should be system.buckets (already translated)
- The isTimeseriesNamespace flag is set to true (to indicate to the shard that the translation was already performed)
- The ShardVersion for the system.buckets namespace must be attached to the request.
Also, in this case, we should never translate the namespace to system.buckets on the shard when the view definition does not exist.
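The three requirements above can be sketched as follows. This is only an illustration, not the server's actual routing code: the helper names (`to_buckets_namespace`, `build_shard_write_request`) and the dictionary shape of the request are hypothetical. Only the `system.buckets` naming and the `isTimeseriesNamespace` flag come from this ticket.

```python
from typing import Any

BUCKETS_PREFIX = "system.buckets."


def to_buckets_namespace(ns: str) -> str:
    """Translate 'db.coll' into 'db.system.buckets.coll' (idempotent)."""
    db, _, coll = ns.partition(".")
    if coll.startswith(BUCKETS_PREFIX):
        return ns  # already translated
    return f"{db}.{BUCKETS_PREFIX}{coll}"


def build_shard_write_request(
    view_ns: str, shard_version: Any, documents: list
) -> dict:
    """Hypothetical router-side helper for a tracked timeseries collection."""
    if shard_version is None:
        # A tracked collection must never be targeted without a ShardVersion.
        raise ValueError("ShardVersion for the system.buckets namespace is required")
    return {
        "ns": to_buckets_namespace(view_ns),  # already translated by the router
        "isTimeseriesNamespace": True,        # tells the shard not to translate again
        "shardVersion": shard_version,        # version for the system.buckets namespace
        "documents": documents,
    }
```

With this shape the shard never needs the view definition: it receives the buckets namespace directly and can run the shard version check against it.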
Direct shard connection
If the DB admin wants to connect directly to a non-DB-primary shard and read/write logical timeseries data, we should translate the namespace to system.buckets automatically even if the view does not exist.
Conclusions
In general we should never perform namespace translation for orphaned legacy timeseries collections. This means that if only the system.buckets collection exists but the associated view does not, we should simply reject the writes or throw an error.
There should be only one special use-case in which we want to perform the namespace translation for legacy timeseries even if the view does not exist. This is to support the "Direct shard connection" case. Apart from this "special" case we should never do it.
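The proposed rule can be summarized as a small decision function. This is a minimal sketch of the policy described in this ticket, not existing server code: `resolve_write_namespace`, `OrphanedBucketsError`, and the boolean `direct_shard_connection` parameter are all hypothetical, and how a direct shard connection would actually be detected is left open here.

```python
class OrphanedBucketsError(Exception):
    """Only system.buckets exists and translation is not allowed."""


def resolve_write_namespace(
    ns: str,
    *,
    view_exists: bool,
    buckets_exists: bool,
    direct_shard_connection: bool = False,
) -> str:
    """Return the namespace a write on the main timeseries namespace should target."""
    db, _, coll = ns.partition(".")
    buckets_ns = f"{db}.system.buckets.{coll}"

    if view_exists:
        # Normal legacy timeseries collection: translate view -> system.buckets.
        return buckets_ns

    if buckets_exists:
        # Orphaned buckets collection: the view is missing.
        if direct_shard_connection:
            # The single special case: direct connection to a shard
            # that is not the DB primary.
            return buckets_ns
        raise OrphanedBucketsError(
            f"{buckets_ns} exists but the view {ns} does not; rejecting write"
        )

    # Neither the view nor system.buckets exists: not a timeseries namespace.
    return ns
```

For example, `resolve_write_namespace("test.weather", view_exists=False, buckets_exists=True)` raises `OrphanedBucketsError`, while the same call with `direct_shard_connection=True` returns `test.system.buckets.weather`.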
- is related to SERVER-73901 Time-series inserts succeed even if time-series view does not exist (Closed)