  1. Core Server
  2. SERVER-93867

Replica set can lack replicaSetId

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • ALL

      Today, the settings.replicaSetId field is supposed to always be present in a replica set, per our documentation, but it is defined in ReplSetConfigBase as a boost::optional<OID>. This is because users do not specify this field themselves; instead, we generate it when replSetInitiate is called and persist it in the replica set config document at that time. From that point on, we never allow the replicaSetId to be changed via the replSetReconfig command.
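
      To make the lifecycle above concrete, here is a minimal standalone sketch. It is not the server code: Oid, Settings, Config, generateOid, initiate and reconfig are all hypothetical stand-ins, std::optional replaces boost::optional, and the strict equality check in reconfig is an assumption about how the immutability rule behaves.

```cpp
#include <cassert>
#include <optional>
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the server's OID type.
using Oid = std::string;

// Hypothetical, simplified model of the replica set config.
struct Settings {
    std::optional<Oid> replicaSetId;  // optional because users never supply it
};

struct Config {
    int version = 1;
    Settings settings;
};

// Produces a random 24-hex-digit string (illustrative only, not the real OID algorithm).
Oid generateOid() {
    static std::mt19937_64 rng{std::random_device{}()};
    static const char* kHex = "0123456789abcdef";
    Oid id;
    for (int i = 0; i < 24; ++i) {
        id.push_back(kHex[rng() % 16]);
    }
    return id;
}

// Models replSetInitiate: the ID is generated here and stored in the config.
Config initiate(Config userConfig) {
    assert(!userConfig.settings.replicaSetId && "users do not specify replicaSetId");
    userConfig.settings.replicaSetId = generateOid();
    return userConfig;
}

// Models replSetReconfig under the assumption of a strict equality check:
// the proposed ID must equal the current one, so an absent ID stays absent.
Config reconfig(const Config& current, Config proposed) {
    if (proposed.settings.replicaSetId != current.settings.replicaSetId) {
        throw std::invalid_argument("replicaSetId cannot be changed by reconfig");
    }
    proposed.version = current.version + 1;
    return proposed;
}
```

      Under this model, a config that reaches a node without the field (for example after a restore) can never regain it through reconfig, which matches the problem described in this ticket.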

      It turns out that a replica set can lack this field after certain procedures, e.g. backup + PIT restore.

      Initially, my thinking was that backup/PIT restore should be amended to preserve the replicaSetId. However, matthew.russotto@mongodb.com pointed out that conceptually it doesn't necessarily make sense to preserve the replicaSetId across a backup/restore, since the restored cluster is a new cluster and the ID is supposed to be unique per cluster. That said, reusing the replicaSetId in this case may not be too problematic if we have to, since we don't think the case that replicaSetId is supposed to protect against (SERVER-22287) is likely to occur in Atlas.

      In doing this ticket we need to answer the following questions:

      • What is the best way to prevent this issue going forward? Is it a server change to repopulate the replicaSetId after a PIT restore has happened? An MMS change to preserve the existing replicaSetId or generate a new one when doing PIT restore? Something else?
      • What is the best way to resolve this issue in existing affected clusters? For example, could we check at runtime, any time a reconfig is performed, whether a cluster is lacking replicaSetId and add it back then? Note that to allow a graceful transition period, this might require e.g. relaxing the checks we have today that require both nodes to have the same replicaSetId (or both nodes to lack a replicaSetId) for messages between them to be accepted.
      • Should we add any new validation around the presence of replicaSetId?
      • Can we add warning logs about this situation?
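
      As one possible shape for the second and fourth questions, the sketch below contrasts the current "same ID or both missing" rule with a relaxed rule for a transition period, and adds a repair step that fills in a missing ID and emits a warning. All names (Oid, strictIdsCompatible, relaxedIdsCompatible, repairMissingId) are hypothetical and std::cerr stands in for the server's logging; this illustrates the idea and is not a proposal for the actual implementation.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical stand-in for the server's OID type.
using Oid = std::string;

// Current rule as described in this ticket: both nodes have the same
// replicaSetId, or both lack one.
bool strictIdsCompatible(const std::optional<Oid>& local,
                         const std::optional<Oid>& remote) {
    return local == remote;
}

// Possible transition rule: tolerate a missing ID on either side,
// but still reject two IDs that are present and different.
bool relaxedIdsCompatible(const std::optional<Oid>& local,
                          const std::optional<Oid>& remote) {
    if (!local || !remote) {
        return true;
    }
    return *local == *remote;
}

// Possible reconfig-time repair: if the ID is absent, assign freshId and warn.
// Returns true when a repair happened; an existing ID is never overwritten.
bool repairMissingId(std::optional<Oid>& replicaSetId, const Oid& freshId) {
    assert(!freshId.empty());
    if (replicaSetId) {
        return false;
    }
    std::cerr << "WARNING: replica set config lacks replicaSetId; assigning "
              << freshId << '\n';
    replicaSetId = freshId;
    return true;
}
```

      During a rollout, a node whose ID has already been repaired would be rejected by the strict rule when talking to a node that still lacks one, which is why the relaxed rule (or something like it) may be needed for the transition.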

            Assignee: Unassigned
            Reporter: Kaitlin Mahar (kaitlin.mahar@mongodb.com)
            Votes: 0
            Watchers: 10

              Created:
              Updated: