Core Server / SERVER-18231

Primary is unable to be reached when secondary does fullSync

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.0.2
    • Component/s: Admin
    • ALL
    • Steps to reproduce:

      Shut down the secondary.
      Remove all files from the secondary's data directory (/data/).
      Start the secondary again with this command so that it performs a full sync:
      sudo mongod --storageEngine wiredTiger --dbpath /data/ --replSet rs0 --fork --logpath /var/log/mongodb/fork.log
      Wait for a day or more; the primary becomes unreachable.


      Hello,

      I have a three-node replica set (primary, secondary, and arbiter) running MongoDB 3.0.2. I started the replica set with this command:

      sudo mongod --storageEngine wiredTiger --dbpath /data/ --replSet rs0 --fork --logpath /var/log/mongodb/fork.log
      

      This is the output of db.stats():

      rs0:PRIMARY> db.stats()
      {
              "db" : "test",
              "collections" : 52,
              "objects" : 1697582895,
              "avgObjSize" : 745.3563943956916,
              "dataSize" : 1265304265805,
              "storageSize" : 647557865472,
              "numExtents" : 0,
              "indexes" : 176,
              "indexSize" : 22991790080,
              "ok" : 1
      }
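To put the figures above in more readable units, here is a minimal Python sketch that derives sizes in decimal gigabytes, the average object size, and the ratio of logical data size to on-disk storage size. The numbers are copied from the db.stats() output; the helper name `summarize` and the key names in its result are illustrative only. The ratio (a little under 2) is presumably the effect of WiredTiger compression, though that is an assumption rather than something the output states.

```python
# Values copied from the db.stats() output above (all sizes in bytes).
stats = {
    "objects": 1697582895,
    "dataSize": 1265304265805,
    "storageSize": 647557865472,
    "indexSize": 22991790080,
}

GB = 10**9  # decimal gigabytes


def summarize(stats):
    """Return derived figures from a db.stats()-shaped dict (hypothetical helper)."""
    return {
        "data_gb": stats["dataSize"] / GB,
        "storage_gb": stats["storageSize"] / GB,
        "index_gb": stats["indexSize"] / GB,
        # Should agree with the avgObjSize field reported by db.stats().
        "avg_obj_size": stats["dataSize"] / stats["objects"],
        # Logical data size divided by on-disk storage size.
        "data_to_storage_ratio": stats["dataSize"] / stats["storageSize"],
    }


summary = summarize(stats)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The storage figure of roughly 648 GB is what the secondary has to reach on disk before the index build phase starts, which matches the "almost equal (650GB)" point described later in this report.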
      

      And this is what mongostat shows while the secondary is syncing:

      root@mongodb-replica1:/data# mongostat --discover
      
                             insert query update delete getmore command % dirty % used flushes vsize   res qr|qw ar|aw netIn netOut conn set repl     time
             localhost:27017     56     6     13     *0       6     8|0     0.0   80.0       0 31.9G 31.3G   0|2   2|2   49k   214k   58 rs0  PRI 09:49:24
      mongodb-replica1:27017     56     6     13     *0       6     4|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   52k   213k   58 rs0  PRI 09:49:24
      mongodb-replica2:27017     *0    *0     *0     *0       0     1|0     1.2    1.3       0 31.9G 29.5G   0|1   1|0   79b    15k    5 rs0  UNK 09:49:24
      
             localhost:27017     26     6      8     *0       5     5|0     0.0   80.0       0 31.9G 31.3G   0|0   1|0   37k   634k   58 rs0  PRI 09:49:25
      mongodb-replica1:27017     27     6      8     *0       5     5|0     0.0   80.0       0 31.9G 31.3G   0|0   1|0   34k   633k   58 rs0  PRI 09:49:25
      mongodb-replica2:27017     *0    *0     *0     *0       0     7|0     1.2    1.4       0 31.9G 29.5G   0|1   1|0  596b   147k    5 rs0  UNK 09:49:25
      
             localhost:27017     35     3      8     *0       5    20|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   39k   146k   58 rs0  PRI 09:49:26
      mongodb-replica1:27017     34     3      8     *0       5    20|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   39k   146k   58 rs0  PRI 09:49:26
      mongodb-replica2:27017     *0    *0     *0     *0       0     2|0     1.3    1.4       0 31.9G 29.5G   0|1   1|0  137b    16k    5 rs0  UNK 09:49:26
      
             localhost:27017     22     6     24     *0       4     5|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   54k   212k   58 rs0  PRI 09:49:27
      mongodb-replica1:27017     22     6     24     *0       4     4|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   53k   197k   58 rs0  PRI 09:49:27
      mongodb-replica2:27017     *0    *0     *0     *0       0     4|0     1.3    1.5       0 31.9G 29.5G   0|1   1|0  422b    16k    5 rs0  UNK 09:49:27
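For anyone comparing samples like these over a long sync, the rows can be split into named fields. The sketch below is a rough illustration that assumes the exact column layout shown in the header above (20 whitespace-separated fields per row); the column keys such as `dirty_pct` and `used_pct` are names chosen here, not mongostat identifiers. The two sample rows are copied from the first interval of the output.

```python
# Column keys for the header shown above; "% dirty" and "% used" are renamed
# to single tokens. These key names are illustrative, not mongostat's own.
COLUMNS = [
    "host", "insert", "query", "update", "delete", "getmore", "command",
    "dirty_pct", "used_pct", "flushes", "vsize", "res", "qr_qw", "ar_aw",
    "netIn", "netOut", "conn", "set", "repl", "time",
]


def parse_row(line):
    """Split one mongostat data row into a dict keyed by COLUMNS."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Rows copied from the 09:49:24 sample above.
rows = [
    "mongodb-replica1:27017     56     6     13     *0       6     4|0     0.0   80.0       0 31.9G 31.3G   0|0   2|0   52k   213k   58 rs0  PRI 09:49:24",
    "mongodb-replica2:27017     *0    *0     *0     *0       0     1|0     1.2    1.3       0 31.9G 29.5G   0|1   1|0   79b    15k    5 rs0  UNK 09:49:24",
]

parsed = [parse_row(r) for r in rows]
for row in parsed:
    print(row["time"], row["host"], "repl=" + row["repl"],
          "used=" + row["used_pct"] + "%", "res=" + row["res"])
```

In this sample the primary reports 80.0 in the "% used" column on every interval, while the syncing member is listed as UNK rather than SEC, presumably because it has not yet finished its initial sync.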
      

      This is the second time I have tried to fully resync my secondary. Near the end, when the secondary's storage size is almost equal to the primary's (650GB) and the secondary is building indexes, the primary's CPU usage suddenly spikes and the machine eventually freezes. The SSH connection drops and the machine becomes inaccessible. Judging by the alerts in MMS and at the application level, all operations on the primary are blocked as well: no inserts, updates, or queries go through.

      I didn't wait to see what would have happened once the secondary finished building its indexes: it was at 22%, and I would have had to go a long time without a primary. But when I restarted the primary, the secondary suddenly removed everything and started the sync from the beginning.

      The hardware is a 10-core CPU with 80GB of memory and 3TB of storage on both the primary and the secondary. I don't have CPU profiling enabled in MMS (as I recall, I couldn't enable it when I tried a while back), so let me know if you need more information or want me to log something next time.
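As one possible way to capture more detail on the next attempt, the sketch below formats a timestamped log line from a replica set status document, so that the moment a member changes state or becomes unhealthy is recorded. It assumes a dict shaped like the output of the replSetGetStatus command (members carrying `name`, `stateStr`, and `health`). The function names and the sample document are hypothetical and are not taken from this report.

```python
from datetime import datetime, timezone


def member_states(status):
    """Map each member name to (stateStr, health) from an rs.status()-shaped dict."""
    return {m["name"]: (m["stateStr"], m["health"]) for m in status["members"]}


def format_log_line(status, now=None):
    """Build one log line: ISO timestamp followed by name=STATE/health=N per member."""
    now = now or datetime.now(timezone.utc)
    parts = [
        f"{name}={state}/health={health}"
        for name, (state, health) in sorted(member_states(status).items())
    ]
    return f"{now.isoformat()} " + " ".join(parts)


# Hypothetical sample document, for illustration only. The host names follow
# the mongostat output in this report; the states are assumed, not observed.
sample = {
    "set": "rs0",
    "members": [
        {"name": "mongodb-replica1:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "mongodb-replica2:27017", "stateStr": "STARTUP2", "health": 1},
    ],
}

print(format_log_line(sample))
```

In practice the status document could come from running replSetGetStatus through a driver on a machine other than the primary, and the line could be appended to a file every few seconds. That way the log survives if the primary host freezes.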

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: Maziyar Panahi (maziyar)
            Votes: 0
            Watchers: 2
