  1. Core Server
  2. SERVER-102891

Review legacy timeseries namespace translation in write path

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing

      Problem description

      We currently allow writes on orphaned buckets collections (SERVER-73901). In other words, for legacy timeseries collections, when the system.buckets collection exists but the associated view does not, we still allow writes targeting the main timeseries namespace.

      In fact, in the write path, if we find the system.buckets collection without the associated view, we still translate the namespace from the view to system.buckets and write to it.
      In sharded clusters this behavior is risky: due to other potential bugs in the system, a shard may receive a non-versioned request on the main timeseries namespace (view). In this case we may end up writing to the underlying system.buckets collection, completely bypassing the shard version check even if the collection is actually tracked.
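
      The risk can be illustrated with a small model. This is only a sketch, not the server code: WriteRequest, current_write_path and the integer shard version are hypothetical names and simplifications, and it assumes the version check is skipped whenever the request carries no version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class WriteRequest:
    ns: str                       # namespace the request targets, e.g. "db.weather"
    shard_version: Optional[int]  # None models a non-versioned request


class StaleConfig(Exception):
    """Raised when the attached version does not match the shard's version."""


def to_buckets_ns(ns: str) -> str:
    # "db.weather" -> "db.system.buckets.weather"
    db, coll = ns.split(".", 1)
    return f"{db}.system.buckets.{coll}"


def current_write_path(req: WriteRequest,
                       local_collections: Set[str],
                       local_views: Set[str],
                       tracked_versions: Dict[str, int]) -> str:
    """Model of today's behavior; returns the namespace that gets written."""
    target = req.ns
    # Orphaned buckets: no view, but the buckets collection exists.
    # The namespace is translated anyway.
    if req.ns not in local_views and to_buckets_ns(req.ns) in local_collections:
        target = to_buckets_ns(req.ns)
    # Assumption: only versioned requests are checked.
    if req.shard_version is not None:
        if tracked_versions.get(target) != req.shard_version:
            raise StaleConfig(target)
    return target


# A non-versioned write on the view namespace reaches a tracked buckets
# collection without any shard version check.
req = WriteRequest(ns="db.weather", shard_version=None)
written = current_write_path(
    req,
    local_collections={"db.system.buckets.weather"},
    local_views=set(),
    tracked_versions={"db.system.buckets.weather": 7},
)
print(written)
```

      In this model the write lands on db.system.buckets.weather even though that collection is tracked and no version was ever compared.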

      Context

      Replicaset

      On a replica set the view for a timeseries collection should always be present, so we should never translate to the system.buckets collection when the view is not present.
      One special case to take into account here is $out, which initially creates only the system.buckets collection and creates the view namespace only at the end. For this case I believe $out targets the system.buckets namespace directly, so there is no need to translate the namespace.
      So again, I think we should never jump to system.buckets if the view does not exist.

      Sharded clusters

      For sharded clusters, there are 3 cases we should consider:

      Untracked collection

      For untracked collections we should apply the same reasoning as in the replica set case above. The request should always target the DB primary shard, which has the view definition, and thus we should never translate the namespace if the view does not exist.

      Tracked collections

      If the collection is tracked it could be on any shard; if it happens to be on a shard that is not the DB primary, the view won't be present.
      When the collection is tracked in the global catalog, the router must always perform the translation and send the request to the data shards with the following:

      • The target namespace should be system.buckets (already translated).
      • The isTimeseriesNamespace flag is set to true (to indicate to the shard that the translation was already performed).
      • The ShardVersion for the system.buckets namespace must be attached to the request.

      Also, in this case, we should never translate the namespace to system.buckets on the shard when the view definition does not exist.
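
      The three requirements on the router-side request can be sketched as follows. This is a minimal illustration, not the router implementation: ShardRequest, build_tracked_timeseries_request and the routing_table dict are hypothetical, and the shard version is reduced to an integer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ShardRequest:
    ns: str                        # already translated to system.buckets
    is_timeseries_namespace: bool  # models the isTimeseriesNamespace flag
    shard_version: int             # version of the system.buckets namespace


def to_buckets_ns(ns: str) -> str:
    # "db.weather" -> "db.system.buckets.weather"
    db, coll = ns.split(".", 1)
    return f"{db}.system.buckets.{coll}"


def build_tracked_timeseries_request(view_ns: str,
                                     routing_table: Dict[str, int]) -> ShardRequest:
    """Translate on the router and attach the buckets namespace version."""
    target = to_buckets_ns(view_ns)
    if target not in routing_table:
        # Hypothetical handling: this sketch only covers tracked collections.
        raise KeyError(f"{target} is not tracked")
    return ShardRequest(
        ns=target,
        is_timeseries_namespace=True,
        shard_version=routing_table[target],
    )


request = build_tracked_timeseries_request(
    "db.weather", {"db.system.buckets.weather": 7}
)
print(request)
```

      With a request shaped like this, the shard has no reason to translate anything itself: the namespace is already system.buckets and the version check always has a version to compare.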

      Direct shard connection

      In case the DB admin wants to connect to a non-DB-primary shard and read/write logical timeseries data, we should translate the namespace to system.buckets automatically even if the view does not exist.

      Conclusions

      In general we should never perform namespace translation for orphaned legacy timeseries collections. This means that if only the system.buckets collection exists but the associated view does not, we should simply reject the writes or throw an error.
      There should be only one special use case in which we want to perform the namespace translation for legacy timeseries even if the view does not exist: supporting the "Direct shard connection" case. Apart from this special case we should never do it.
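
      The proposed rule can be summarized as a small decision function. This is a sketch under assumptions, not a proposed patch: the Action enum, the decide function and its boolean inputs are hypothetical, and how a shard would detect a direct connection is left out.

```python
from enum import Enum


class Action(Enum):
    WRITE_AS_IS = "write to the requested namespace"
    TRANSLATE = "translate to system.buckets"
    REJECT = "reject the write"


def decide(view_exists: bool,
           buckets_exists: bool,
           already_translated: bool,
           direct_shard_connection: bool) -> Action:
    """Proposed shard-side handling of a write on a timeseries namespace."""
    if already_translated:
        # The router (or $out) targeted system.buckets directly.
        return Action.WRITE_AS_IS
    if not buckets_exists:
        # Not a legacy timeseries collection: nothing to translate.
        return Action.WRITE_AS_IS
    if view_exists:
        # Regular legacy timeseries collection.
        return Action.TRANSLATE
    if direct_shard_connection:
        # The only case where translation without a view is allowed.
        return Action.TRANSLATE
    # Orphaned buckets collection: never translate.
    return Action.REJECT


orphaned = decide(view_exists=False, buckets_exists=True,
                  already_translated=False, direct_shard_connection=False)
direct = decide(view_exists=False, buckets_exists=True,
                already_translated=False, direct_shard_connection=True)
print(orphaned, direct)
```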

            Assignee:
            Unassigned
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Votes:
            0
            Watchers:
            5
