Core Server / SERVER-29604

Error when resuming change notifications through mongos if there's no exact match on the ResumeToken from any shard

    • Type: Task
    • Resolution: Duplicate
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: Replication, Sharding
    • Replication

      When you resume change notifications in a sharded system, re-establishing the change notification cursor on each shard will error if the first entry with a resumeToken higher than the one given is the first entry in the oplog. This allows each shard to detect whether the resume point has rolled off the back of its oplog. In the normal case this is sufficient to ensure that the resumed notification stream will not miss any entries.
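
      The per-shard check described above can be sketched roughly as follows. This is an illustrative Python model, not server code: OplogEntry, ResumePointLostError and resume_on_shard are hypothetical names, the oplog is modelled as a sorted in-memory list, and resumeTokens are assumed to be plain integers that sort in oplog order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OplogEntry:
    token: int  # hypothetical stand-in for a resumeToken; assumed to sort in oplog order
    op: str


class ResumePointLostError(Exception):
    """Raised when the resume point may have rolled off the back of the oplog."""


def resume_on_shard(oplog: List[OplogEntry], resume_token: int) -> List[OplogEntry]:
    """Return the entries a shard would replay when resuming after resume_token."""
    for index, entry in enumerate(oplog):
        if entry.token > resume_token:
            if index == 0:
                # Nothing at or before the token remains in the oplog, so this
                # shard cannot prove that no entries were lost.
                raise ResumePointLostError("resume point is no longer in the oplog")
            # Note: no exact match is required here, since in a sharded system
            # the token may belong to an entry on a different shard.
            return oplog[index:]
    return []  # no entries newer than the token yet
```

      Under these assumptions, a shard holding tokens 3, 7 and 9 resumes from 7 when given token 5, even though no entry with token 5 exists on that shard.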

      If, however, there was some more serious inconsistency in the system (for instance if a user had somehow manually deleted an oplog entry), or the driver had provided an invalid resumeToken that doesn't correspond to any valid oplog entry on any shard, we won't notice this and will resume notifications from whatever op has the closest resumeToken greater than the token provided to the command.

      In order to detect these cases, we could have the shard that finds the exact match on the resumeToken add an extra field to its first notification entry. The merge node could then expect exactly one of the shards to include that extra field and error if that is not the case. The merge node would also need to strip out that extra field so it isn't returned to the end user.
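
      A minimal sketch of the proposed merge-side check, in Python for illustration only. The field name _exactResumeMatch, the dict-shaped notifications, and the names InvalidResumeTokenError and merge_first_entries are all assumptions made for this example; the ticket does not specify any of them. It also assumes the merge node has the first notification from every shard in hand.

```python
from typing import Any, Dict, List

# Hypothetical name for the extra field added by the shard that found the exact match.
MATCH_FIELD = "_exactResumeMatch"


class InvalidResumeTokenError(Exception):
    """Raised when the resumeToken was not matched by exactly one shard."""


def merge_first_entries(first_entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Validate and clean the first notification received from each shard."""
    matches = sum(1 for entry in first_entries if entry.get(MATCH_FIELD))
    if matches != 1:
        raise InvalidResumeTokenError(
            f"expected exactly one shard to match the resumeToken, got {matches}"
        )
    # Strip the marker so it is never returned to the end user.
    return [
        {key: value for key, value in entry.items() if key != MATCH_FIELD}
        for entry in first_entries
    ]
```

      In this model, a deleted oplog entry or an invalid token shows up as zero matches and is reported as an error instead of silently resuming from the next-highest token.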

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            spencer@mongodb.com Spencer Brody (Inactive)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: