Core Server / SERVER-45511

Data loss following machine PowerOff with writeConcernMajorityJournalDefault true

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 4.2.0
    • Component/s: None

      Background:

      • I'm using MongoDB 4.2.0.
      • I have deployed a MongoDB cluster that contains 5 config servers, 3 query routers (mongos), and 3 shards. Each shard consists of 4 replica-set members and 1 arbiter.
      • All members run on VMs.
      • Each replica set's writeConcernMajorityJournalDefault flag is true.
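
      For context, enabling that flag on a replica set can be sketched as follows. This is a minimal pymongo sketch, not the reporter's setup: the helper names, the URI, and the lazy import are illustrative assumptions, and replSetReconfig must be run against the primary of each shard's replica set.

```python
def with_majority_journal_default(cfg):
    """Return a copy of a replica-set config document with
    writeConcernMajorityJournalDefault enabled and the config
    version bumped, as replSetReconfig requires."""
    new_cfg = dict(cfg)
    new_cfg["writeConcernMajorityJournalDefault"] = True
    new_cfg["version"] = cfg.get("version", 1) + 1
    return new_cfg


def apply_to_replica_set(uri="mongodb://primary-host:27017"):
    """Apply the updated config to a live replica set (sketch only;
    requires pymongo and a direct connection to the primary)."""
    # Lazy import so the pure helper above works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    cfg = client.admin.command("replSetGetConfig")["config"]
    client.admin.command("replSetReconfig", with_majority_journal_default(cfg))
```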

      The Test:

      Before deploying the cluster to the production environment, I conducted several "stress tests". I created a simple script that performs many inserts against the cluster and reports the number of successful (acknowledged) inserts.
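
      A script along those lines might look like this. This is a hypothetical reconstruction in Python/pymongo; the URI, database, and collection names are illustrative and not taken from the report.

```python
def run_stress_test(uri="mongodb://localhost:27017", n_docs=10000):
    """Insert n_docs one at a time with a majority/journaled write
    concern and return the number of acknowledged inserts."""
    # Lazy import so the pure helper below works without pymongo installed.
    from pymongo import MongoClient, WriteConcern

    client = MongoClient(uri)
    coll = client.test.get_collection(
        "stress",
        write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
    )
    insert_count = 0
    for i in range(n_docs):
        try:
            result = coll.insert_one({"seq": i})
            if result.acknowledged:
                insert_count += 1
        except Exception:
            # e.g. the primary was powered off mid-test
            break
    return insert_count


def lost_documents(insert_count, collection_count):
    """Acknowledged inserts that are missing from the collection
    afterwards; a positive value indicates data loss."""
    return insert_count - collection_count
```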

      When I run and then stop the script, everything is fine: the script's insert_count matches the document count in the collection.

      BUT, when I run the script and then power off the primary member, I hit a problem: the script's insert_count is larger (by 10-20) than the document count in my collection. I assume I am losing data.

      I received successful insert acknowledgements even though my replica set has writeConcernMajorityJournalDefault set to true.

      Bringing the primary back up does not recover the lost data.

      I suspect the acknowledged data was still only in memory!

      Conclusion:

      I believe there is some malfunction in the journaling behavior.

      P.S:

      I also tried inserting with the

      {w: "majority", j: true, wtimeout: 5000}

      write concern parameters. Same results.
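
      In pymongo terms, a per-operation write concern like the one above is applied by cloning the collection object with new options. This is a sketch under that assumption; the function and collection names are illustrative, not from the report.

```python
def write_concern_document():
    """The options from the report as a plain dict, matching the
    shell writeConcern form: {w: "majority", j: true, wtimeout: 5000}."""
    return {"w": "majority", "j": True, "wtimeout": 5000}


def insert_with_options(coll, doc):
    """Insert one document with a majority, journaled write concern.
    Requires pymongo and a live collection object."""
    # Lazy import so write_concern_document() works without pymongo installed.
    from pymongo import WriteConcern

    wc = WriteConcern(**write_concern_document())
    # pymongo attaches write concern to the collection object, so clone
    # the collection with the desired options rather than per call.
    return coll.with_options(write_concern=wc).insert_one(doc)
```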

      Regards,
      Mark Berg

            Assignee:
            daniel.hatcher@mongodb.com Danny Hatcher (Inactive)
            Reporter:
            d52563@urhen.com Mark Berg
            Votes:
            0
            Watchers:
            10
