Core Server / SERVER-82617

Router's fsyncLock command must be resilient to elections

    • Cluster Scalability
    • ALL

      When fsyncLock is invoked on a router, it contacts the primary of every shard and makes sure there are no ongoing DDL operations, so that backups do not capture an inconsistent state. This protocol is currently not resilient to elections.

      Example of a breaking scenario

      Consider a shard with 3 nodes: n0, n1 and n2. The primary was n0, but an election has just moved it to n1.

      1. The router still believes n0 is the primary and asks it to acquire the fsync lock
      2. Since the command is allowed on secondaries, n0 acquires the lock and returns successfully
      3. A DDL starts on n1, since the coordinator document can be majority committed by replicating to n2 alone
      4. A backup starts from n1 or n2 while the DDL is still in progress
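
The scenario above can be sketched as a toy model. This is only an illustration under stated assumptions, not server code: the `Node` class, `run_scenario`, and the `require_primary` flag are hypothetical names introduced here, and the "reject the lock on a non-primary" check is one possible mitigation to experiment with, not a fix proposed in this ticket.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy replica set member (hypothetical model, not MongoDB code)."""
    name: str
    is_primary: bool = False
    fsync_locked: bool = False

    def fsync_lock(self, require_primary: bool = False) -> bool:
        # The ticket states the command is allowed on secondaries, so with
        # require_primary=False a stepped-down node still takes the lock.
        # require_primary=True is an assumed mitigation, not the actual fix.
        if require_primary and not self.is_primary:
            return False
        self.fsync_locked = True
        return True

    def can_start_ddl(self) -> bool:
        # In this model only an unlocked primary can start a DDL.
        return self.is_primary and not self.fsync_locked


def run_scenario(require_primary: bool) -> dict:
    n0 = Node("n0", is_primary=True)
    n1 = Node("n1")
    n2 = Node("n2")  # only needed so n1 can majority-commit; never locked here

    router_cached_primary = n0  # the router's (soon stale) view of the shard

    # Election: the primary moves from n0 to n1 before the router notices.
    n0.is_primary, n1.is_primary = False, True

    # Steps 1-2: the router locks the node it believes is primary.
    lock_ok = router_cached_primary.fsync_lock(require_primary)

    # Step 3: the real primary is not locked, so a DDL can start on it.
    ddl_started = n1.can_start_ddl()

    # Step 4: the backup goes ahead only if the router thinks the lock succeeded.
    backup_unsafe = lock_ok and ddl_started
    return {
        "lock_ok": lock_ok,
        "ddl_started": ddl_started,
        "backup_unsafe": backup_unsafe,
    }


if __name__ == "__main__":
    print("current behaviour:", run_scenario(require_primary=False))
    print("with primary check:", run_scenario(require_primary=True))
```

With `require_primary=False` the router sees a successful lock even though the actual primary is free to run a DDL. With the assumed check enabled, the lock request fails on n0, which would give the router a chance to refresh its view of the shard and retry against n1.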

            Assignee: Unassigned
            Reporter: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)