Core Server / SERVER-27009

Replication initial sync creates cursors with no timeout

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.0.0, 3.2.0, 3.4.0
    • Component/s: Replication
    • Replication
    • ALL

      Both the cloner and oplog fetcher in replication initial sync use a cursor with no timeout:

      2016-11-03T19:58:56.081+0000 I COMMAND  [conn47601] command buildlogs.logs command: find { find: "logs", noCursorTimeout: true, batchSize: 13981010 } planSummary: COLLSCAN cursorid:45904553724 keysExamined:0 docsExamined:822 numYields:14 nreturned:821 reslen:16750452 locks:{ Global: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 15 } }, Collection: { acquireCount: { r: 15 } } } protocol:op_command 447ms
      

      Both of these components shut down gracefully and clean up the cursors that they open. However, if there is a network failure or the secondary node crashes, these cursors are leaked and never get cleaned up.
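
      The leak can be illustrated with a toy model. This is only a sketch, not the server's actual cursor manager: the names CursorRegistry, reap_idle and simulate_client_crash are hypothetical, and the 600-second idle limit is an assumption made for the example. It shows that a cursor opened with the no-timeout flag survives idle reaping indefinitely when its client disappears without killing it.

```python
from dataclasses import dataclass

# Assumed idle limit for this toy model; not read from any real server.
IDLE_TIMEOUT_SECS = 600.0


@dataclass
class Cursor:
    cursor_id: int
    no_timeout: bool
    last_used: float


class CursorRegistry:
    """Hypothetical stand-in for a server-side table of open cursors."""

    def __init__(self):
        self.cursors = {}

    def open(self, cursor_id, no_timeout, now):
        self.cursors[cursor_id] = Cursor(cursor_id, no_timeout, now)

    def kill(self, cursor_id):
        # What a client does on graceful shutdown.
        self.cursors.pop(cursor_id, None)

    def reap_idle(self, now):
        # Only cursors WITHOUT the no-timeout flag are eligible for reaping.
        expired = [
            c.cursor_id
            for c in self.cursors.values()
            if not c.no_timeout and now - c.last_used >= IDLE_TIMEOUT_SECS
        ]
        for cursor_id in expired:
            del self.cursors[cursor_id]
        return expired


def simulate_client_crash():
    registry = CursorRegistry()
    registry.open(1, no_timeout=False, now=0.0)  # ordinary cursor
    registry.open(2, no_timeout=True, now=0.0)   # initial-sync style cursor
    # The client crashes here, so kill() is never called for either cursor.
    registry.reap_idle(now=499819.0)
    return sorted(registry.cursors)  # ids still open on the server
```

      In this model only cursor 2 remains after the reaper runs, which mirrors the behaviour described above: nothing but an explicit kill removes it.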

      This is especially problematic with replica set shards, because having a cursor open on a sharded collection will eventually block migrations to that shard:

      2016-11-09T16:09:06.572+0000 I SHARDING [RangeDeleter] waiting for open cursors before removing range [{ build_id: "337bc5b6432ea606a010e4c95a5e5f9a", test_id: ObjectId('57f3eb919041302d8b03ffdf'), seq: 1 }, { build_id: "337c88bdf0f88e7c95d9ba482d042e71", test_id: ObjectId('57d1b969be07c42b9805e57f'), seq: 2 }) in buildlogs.logs, elapsed secs: 499819, cursor ids: [45904553724]
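
      To confirm that a blocked range deletion is waiting on an initial-sync cursor, the cursor ids in the two kinds of log lines can be correlated. The following is a minimal sketch that assumes the log text looks like the excerpts in this ticket; the function names are hypothetical and the regular expressions are not guaranteed to match other server versions.

```python
import re

# Patterns based only on the log excerpts quoted in this ticket.
CURSOR_IDS_RE = re.compile(r"cursor ids: \[([^\]]*)\]")
CURSORID_RE = re.compile(r"cursorid:(\d+)")


def blocking_cursor_ids(line):
    """Cursor ids a RangeDeleter line says it is waiting on."""
    match = CURSOR_IDS_RE.search(line)
    if not match:
        return set()
    return {int(tok) for tok in match.group(1).split(",") if tok.strip()}


def opened_no_timeout_cursor(line):
    """Cursor id from a find command line that set noCursorTimeout."""
    if "noCursorTimeout: true" not in line:
        return None
    match = CURSORID_RE.search(line)
    return int(match.group(1)) if match else None


def leaked_candidates(lines):
    """Ids opened with noCursorTimeout that also block a range deletion."""
    opened, blocking = set(), set()
    for line in lines:
        cursor_id = opened_no_timeout_cursor(line)
        if cursor_id is not None:
            opened.add(cursor_id)
        blocking |= blocking_cursor_ids(line)
    return sorted(opened & blocking)
```

      Feeding it the two log lines from this ticket yields the single id 45904553724, which appears both in the find command and in the RangeDeleter message.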
      

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Votes:
            1
            Watchers:
            9
