Type: Bug
Resolution: Done
Priority: Major - P3
None
Affects Version/s: 3.0.0, 3.2.0, 3.4.0
Component/s: Replication
Replication
ALL
Both the collection cloner and the oplog fetcher used by replication initial sync open their cursors with no timeout (noCursorTimeout: true), as this log line from the sync source shows:
2016-11-03T19:58:56.081+0000 I COMMAND [conn47601] command buildlogs.logs command: find { find: "logs", noCursorTimeout: true, batchSize: 13981010 } planSummary: COLLSCAN cursorid:45904553724 keysExamined:0 docsExamined:822 numYields:14 nreturned:821 reslen:16750452 locks:{ Global: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 15 } }, Collection: { acquireCount: { r: 15 } } } protocol:op_command 447ms
Although both components shut down gracefully and clean up the cursors they open, a network failure or a crash of the syncing secondary leaks these cursors on the sync source, where they are never cleaned up.
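To make the leak mechanism concrete, here is a toy Python model of a server-side cursor registry. It is not MongoDB's implementation: the class CursorManager, its methods, and IDLE_TIMEOUT_SECS are hypothetical names for this sketch, and the only behavior it assumes from the server is that idle cursors are timed out (after 10 minutes by default) unless they were opened with noCursorTimeout.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical idle timeout in seconds (the server's default cursor timeout is 10 minutes).
IDLE_TIMEOUT_SECS = 600


@dataclass
class Cursor:
    cursor_id: int
    no_cursor_timeout: bool
    last_used: float  # time of the last find/getMore on this cursor, in seconds


class CursorManager:
    """Toy model (hypothetical) of a server's open-cursor registry and idle reaper."""

    def __init__(self) -> None:
        self.cursors: Dict[int, Cursor] = {}

    def open(self, cursor_id: int, no_cursor_timeout: bool, now: float) -> None:
        self.cursors[cursor_id] = Cursor(cursor_id, no_cursor_timeout, now)

    def kill(self, cursor_id: int) -> None:
        # Graceful-shutdown path: the client explicitly closes its own cursor.
        self.cursors.pop(cursor_id, None)

    def reap_idle(self, now: float) -> List[int]:
        # Time out idle cursors, skipping any opened with noCursorTimeout.
        expired = [
            c.cursor_id
            for c in self.cursors.values()
            if not c.no_cursor_timeout and now - c.last_used > IDLE_TIMEOUT_SECS
        ]
        for cursor_id in expired:
            del self.cursors[cursor_id]
        return expired


mgr = CursorManager()
mgr.open(1, no_cursor_timeout=False, now=0.0)            # ordinary client cursor
mgr.open(45904553724, no_cursor_timeout=True, now=0.0)  # initial-sync style cursor
# The syncing secondary crashes, so neither cursor is ever killed explicitly.
reaped = mgr.reap_idle(now=499819.0)
```

In this model the ordinary cursor is reaped, while the no-timeout cursor stays registered indefinitely; only an explicit kill() (the graceful-shutdown path that a crashed node never reaches) removes it.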
This is especially problematic for shards backed by replica sets: an open cursor on a sharded collection prevents the range deleter from removing the documents of a migrated chunk, and the stuck deletion eventually blocks migrations to that shard:
2016-11-09T16:09:06.572+0000 I SHARDING [RangeDeleter] waiting for open cursors before removing range [{ build_id: "337bc5b6432ea606a010e4c95a5e5f9a", test_id: ObjectId('57f3eb919041302d8b03ffdf'), seq: 1 }, { build_id: "337c88bdf0f88e7c95d9ba482d042e71", test_id: ObjectId('57d1b969be07c42b9805e57f'), seq: 2 }) in buildlogs.logs, elapsed secs: 499819, cursor ids: [45904553724]
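The interaction between the range deleter and the leaked cursor can be sketched with another toy Python model. It is loosely based only on the two log messages in this ticket (the range deleter "waiting for open cursors", and SERVER-31688's "can't accept new chunks because there are still 1 deletes from previous migration"); PendingRangeDeletion, run_deleter_pass, and can_accept_new_chunks are hypothetical names, not server functions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class PendingRangeDeletion:
    """Hypothetical queued deletion of a chunk range that migrated off this shard."""
    namespace: str
    cursor_ids: FrozenSet[int]  # cursors that were open when the range was queued

    def can_delete(self, open_cursor_ids: Set[int]) -> bool:
        # The range may only be removed once none of those cursors is still open.
        return not (self.cursor_ids & open_cursor_ids)


def run_deleter_pass(
    pending: List[PendingRangeDeletion], open_cursor_ids: Set[int]
) -> List[PendingRangeDeletion]:
    """Perform every deletion whose cursors are gone; return those still waiting."""
    return [p for p in pending if not p.can_delete(open_cursor_ids)]


def can_accept_new_chunks(pending: List[PendingRangeDeletion]) -> bool:
    # Modeled on the SERVER-31688 message: new chunks are refused while
    # deletes from a previous migration remain queued.
    return len(pending) == 0


queue = [PendingRangeDeletion("buildlogs.logs", frozenset({45904553724}))]
# The leaked no-timeout cursor never closes, so each pass leaves the delete queued.
stuck = run_deleter_pass(queue, open_cursor_ids={45904553724})
# Once the cursor is gone (for example, killed by an operator), the delete proceeds.
after_kill = run_deleter_pass(stuck, open_cursor_ids=set())
```

Because the cursor opened with noCursorTimeout never goes away on its own, the pending deletion in this model waits forever, just as the real log reports an elapsed time of 499819 seconds for cursor 45904553724.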
Related to:
- SERVER-6036 Disable cursor timeout for cursors that belong to a session (Closed)
- SERVER-31688 W SHARDING [conn161595] can't accept new chunks because there are still 1 deletes from previous migration (Closed)