Core Server / SERVER-16277

Removing Replica Set Member + Failover = Can't write to replica set with w:2

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.6
    • Affects Version/s: 2.6.3, 2.6.5
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
Steps to Reproduce:

      1. Initialize replica set like this:

      {
              "_id" : "test",
              "version" : 1,
              "members" : [
                      {
                              "_id" : 0,
                              "host" : "myServer:27001",
                              "priority" : 50
                      },
                      {
                              "_id" : 1,
                              "host" : "myServer:27002",
                              "priority" : 50
                      },
                      {
                              "_id" : 2,
                              "host" : "myServer:27003",
                              "priority" : 0
                      },
                      {
                              "_id" : 3,
                              "host" : "myServer:27004",
                              "priority" : 10
                      }
              ]
      }
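
      To apply this configuration, a minimal shell sketch (run against the
      member on 27001; the variable name cfg is illustrative, not from the
      original report):

      // mongo --port 27001, with cfg holding the four-member document above
      rs.initiate(cfg)
      rs.status()   // wait until all four members report a healthy state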
      

      2. Insert a document with w:2 (works):

      db.test.insert({x:1}, { writeConcern : { w:2, wtimeout: 15000 }})
      

      3. Reconfigure replica set with this configuration:

      {
              "_id" : "test",
              "version" : 2,
              "members" : [
                      {
                              "_id" : 0,
                              "host" : "myServer:27001",
                              "priority" : 50
                      },
                      {
                              "_id" : 1,
                              "host" : "myServer:27002",
                              "priority" : 50
                      },
                      {
                              "_id" : 2,
                              "host" : "myServer:27003",
                              "priority" : 0
                      }
              ]
      }
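
      The reconfig itself, as a sketch (rs.reconfig() must be run on the
      current primary; cfg again stands for the document above):

      // On the current primary, e.g. mongo --port 27001
      rs.reconfig(cfg)
      rs.conf()   // confirm the member on 27004 is gone and "version" is 2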
      

      4. Assuming the same server is primary before and after the reconfig, this will work:

      db.test.insert({x:2}, { writeConcern : { w:2, wtimeout: 15000 }})
      

      5. Fail over to the server on 27002 (rs.stepDown() on the current primary):
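
      A sketch of the failover, assuming 27001 is primary at this point (the
      shell connection may drop when the primary steps down):

      // On myServer:27001 (the current primary):
      rs.stepDown()   // steps down for 60 seconds by default
      rs.status()     // from any member; 27002 should now show PRIMARY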

      6. This operation times out:

      db.test.insert({x:3}, { writeConcern : { w:2, wtimeout: 15000 }})
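
      Capturing the result makes the failure visible; in the 2.6 shell,
      insert() returns a WriteResult, which on timeout should carry a
      writeConcernError (a sketch, not captured output):

      var res = db.test.insert({x:3}, { writeConcern : { w:2, wtimeout: 15000 }})
      printjson(res)   // expect a writeConcernError describing the wtimeout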
      

      7. Fail back to 27001
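
      Failing back, again as a sketch (with equal 50/50 priorities, stepping
      27002 down allows 27001 to be re-elected; rs.freeze() can force it):

      // On myServer:27002 (the current primary):
      rs.stepDown()
      rs.freeze(120)   // optional: keep 27002 out of elections for 120s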

      8. This works again:

      db.test.insert({x:4}, { writeConcern : { w:2, wtimeout: 15000 }})
      

      We recently reduced the number of nodes in our replica set from 4 (3 + 1 hidden) to 3. After removing the 4th node and reconfiguring the remaining members, the cluster comes back up just fine. After a failover, however, the set won't accept any writes with a write concern greater than 1. If you fail back to the original primary, the set works fine again.

      There is a workaround: simply restart all mongod processes after the reconfig (sketched below), and everything works. We have been able to reproduce this bug consistently in versions 2.6.3 and 2.6.5. The issue does not appear to be present in 2.8.0-rc0.
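
      The workaround in shell form, as a sketch (the exact restart mechanics
      depend on how mongod is managed on each host):

      // Against each member in turn, secondaries first, primary last:
      //   mongo --port 2700X
      db.getSiblingDB("admin").shutdownServer()
      // ...then restart the mongod process (service manager or the original
      // command line). Once every member is back up, w:2 writes succeed
      // after a failover as well.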

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: Jason Ford (fordjp@gmail.com)
            Votes: 0
            Watchers: 4
