Core Server / SERVER-30789

Unable to Initial Sync Big Database on non-Linux System due to mongod TCP Keepalive Constraint

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Replication
    • ALL

      We are running an initial sync of a large database (~70 GB) from replica set member A to replica set member B. Both members run mongod on Windows, and the Windows TCP keepalive interval is set to 30 minutes.
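To double-check the 30-minute system-wide setting, the value can be read from the registry. This is a minimal sketch, assuming the setting in question is the `KeepAliveTime` value under `HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters`, which Windows stores in milliseconds (so 30 minutes is 1,800,000). The helper names `read_keepalive_time_ms` and `ms_to_minutes` are hypothetical. The function returns `None` on non-Windows systems or when the value is absent, in which case Windows falls back to its built-in default of two hours.

```python
import sys
from typing import Optional

# Registry key holding the system-wide TCP/IP tuning values on Windows.
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def read_keepalive_time_ms() -> Optional[int]:
    """Return the system-wide KeepAliveTime in ms, or None if unset / not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
            value, _value_type = winreg.QueryValueEx(key, "KeepAliveTime")
            return int(value)
    except FileNotFoundError:
        # Value not present: Windows uses its default (2 hours).
        return None


def ms_to_minutes(ms: int) -> float:
    """Convert a millisecond registry value to minutes for easier reading."""
    return ms / 60_000


if __name__ == "__main__":
    current = read_keepalive_time_ms()
    if current is None:
        print("KeepAliveTime not set (or not running on Windows)")
    else:
        print(f"KeepAliveTime = {current} ms ({ms_to_minutes(current):.1f} min)")
```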

      From the diagnostics FAQ (https://docs.mongodb.com/v3.2/faq/diagnostics/), we understand that on Windows, MongoDB sets its own TCP keepalive interval of 10 minutes.
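As we read the FAQ, mongod applies its own keepalive setting to each connection rather than relying only on the system-wide value. The Python sketch below is not MongoDB code; it only illustrates how an application can override keepalive on a single socket (`SIO_KEEPALIVE_VALS` on Windows, `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` on Linux). The helper name `enable_keepalive` and the example values are assumptions for illustration.

```python
import socket
import sys


def enable_keepalive(sock: socket.socket, idle_ms: int, interval_ms: int, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket, overriding the system-wide defaults.

    idle_ms:     idle time before the first keepalive probe is sent
    interval_ms: delay between unanswered probes
    probes:      unanswered probes before the connection is dropped
                 (ignored on Windows, where SIO_KEEPALIVE_VALS cannot set it)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # Windows: (on/off, keepalive time in ms, keepalive interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_ms, interval_ms))
        return
    # Linux and similar systems take these values in whole seconds;
    # constants missing on a given platform are skipped.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, max(1, idle_ms // 1000))
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, max(1, interval_ms // 1000))
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Hypothetical values: first probe after 10 minutes idle, then one per second.
    enable_keepalive(s, idle_ms=10 * 60 * 1000, interval_ms=1000)
    s.close()
```

A per-socket override like this only changes how long an idle connection waits before probing; it does not change the system-wide registry or sysctl values.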

      With these settings, we cannot complete the initial sync, because:

      1. Building the indexes takes more than 10 minutes after the large database has been copied from A to B.
      2. MongoDB appears to be unable to ACK TCP requests while it is building indexes.
      3. Consequently, when the index build finishes, instance B receives a TCP connection timeout error and has to restart the whole initial sync.
      4. As a result, the initial sync is stuck on this large database.

      Please advise.

      Log containing this error:

      2017-08-23T07:59:00.164+0800 I STORAGE  [rsSync] 14456650 objects cloned so far from collection DB.COL
      2017-08-23T07:59:04.007+0800 I STORAGE  [rsSync] clone DB.COL 14457727
      2017-08-23T07:59:53.315+0800 I INDEX    [rsSync] build index on: stratus.position properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "DB.COL" }
      2017-08-23T07:59:53.316+0800 I INDEX    [rsSync]         building index using bulk method; build may temporarily use up to 500 megabytes of RAM
      2017-08-23T07:59:56.019+0800 I -        [rsSync]   Index Build: 84400/14467992 0%
      2017-08-23T07:59:59.000+0800 I -        [rsSync]   Index Build: 195300/14467992 1%
      2017-08-23T08:00:02.002+0800 I -        [rsSync]   Index Build: 271400/14467992 1%
      ......
      2017-08-23T08:10:21.002+0800 I -        [rsSync]   Index Build: 14266400/14467992 98%
      2017-08-23T08:10:24.000+0800 I -        [rsSync]   Index Build: 14325000/14467992 99%
      2017-08-23T08:10:27.002+0800 I -        [rsSync]   Index Build: 14408000/14467992 99%
      2017-08-23T08:10:38.095+0800 I INDEX    [rsSync] build index done.  scanned 14467992 total records. 644 secs
      2017-08-23T08:10:38.104+0800 I REPL     [rsSync] initial sync data copy, starting syncup
      2017-08-23T08:10:38.106+0800 I REPL     [rsSync] oplog sync 1 of 3
      2017-08-23T08:10:38.109+0800 I NETWORK  [rsSync] Socket  send() errno:10054 An existing connection was forcibly closed by the remote host. IP:port
      2017-08-23T08:10:38.114+0800 I REPL     [rsSync] connection lost to hostname:port; is your tcp keepalive interval set appropriately?
      2017-08-23T08:10:38.136+0800 E REPL     [rsSync] 9001 socket exception [FAILED_STATE] server [hostname:port(IP) failed]
      2017-08-23T08:10:38.136+0800 E REPL     [rsSync] initial sync attempt failed, 8 attempts remaining
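The timestamps above support point 1: the `_id` index build starts at 07:59:53.315 and finishes at 08:10:38.095, consistent with the log's own "644 secs" and longer than the 10-minute (600 s) interval cited in the description. The sketch below just does that subtraction; the 600 s threshold is the reporter's figure, not a value verified against mongod, and `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime

# mongod log timestamp format, e.g. 2017-08-23T07:59:53.315+0800
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Interval cited in the report (10 minutes), in seconds.
KEEPALIVE_S = 10 * 60


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two mongod log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# "build index on" line vs. "build index done" line from the log excerpt.
idle = elapsed_seconds("2017-08-23T07:59:53.315+0800", "2017-08-23T08:10:38.095+0800")
print(f"{idle:.3f}s vs {KEEPALIVE_S}s -> exceeded: {idle > KEEPALIVE_S}")
# prints "644.780s vs 600s -> exceeded: True"
```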
      

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: WenniZ (wekurtz)
            Votes: 0
            Watchers: 11
