Core Server / SERVER-4118

mongos causes dos by opening a ton of connections

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 2.0.2
    • Affects Version/s: 2.0.0-rc2
    • Component/s: Performance, Sharding, Stability
    • Environment:
      ubuntu 10.04 x86_64; 4 shards, each shard a RS of two members + arbiter
    • Linux

      Comment for future reference: this ticket was used primarily to track a writeback listener (WBL) change in 2.0.1. The WBL was not forcing a version reload, which would reset the connection version on the mongod side, even after multiple failed retries. The changes in this ticket fix that.

      this issue may be similar or related to SERVER-4037. previously, we noted that mongos can enter a state in which it retries a command query indefinitely against some shard members. prior to 2.0.0-rc2, this resulted in mongos logging a stream of 'writeback failed' messages. that no longer occurs, but we still see behavior that drives the number of command ops on the shards up into the 1500-2000 ops/sec range. note that these command ops are not logged at the shards, but we definitely see them when running mongostat. bouncing mongos fixes the issue.
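A rough way to quantify the jump in command ops that mongostat shows is to diff the `command` field of the `opcounters` section from two `serverStatus` samples taken on a shard member. The sketch below is illustrative only and is not part of the original report; the function name and the sample numbers are assumptions.

```python
def command_ops_per_sec(before, after, interval_secs):
    """Estimate command ops/sec from two 'opcounters' sub-documents.

    `before` and `after` are assumed to be the `opcounters` section of two
    serverStatus results taken `interval_secs` apart, for example (with
    pymongo) client.admin.command("serverStatus")["opcounters"].
    Returns None if the counter went backwards (e.g. the server restarted).
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    delta = after["command"] - before["command"]
    if delta < 0:
        return None  # counter reset between samples; no meaningful rate
    return delta / interval_secs


if __name__ == "__main__":
    # Hypothetical samples taken 10 seconds apart.
    rate = command_ops_per_sec({"command": 1000}, {"command": 16000}, 10)
    print(f"command ops/sec: {rate:.0f}")
```

A sustained rate in the range described above, with no matching client traffic, would be consistent with the retry loop the report describes.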

      on the shards, we can see that mongos has effectively caused a DoS by opening many connections in quick succession:

      Fri Oct 21 11:50:49 [initandlisten] connection accepted from mongos:35539 #9663
      Fri Oct 21 11:50:50 [initandlisten] connection accepted from mongos:35543 #9664
      Fri Oct 21 11:50:51 [initandlisten] connection accepted from mongos:35544 #9665
      Fri Oct 21 11:50:51 [initandlisten] connection accepted from mongos:35545 #9666
      Fri Oct 21 11:50:53 [initandlisten] connection accepted from mongos:35546 #9667
      Fri Oct 21 11:51:01 [initandlisten] connection accepted from mongos:35547 #9668
      Fri Oct 21 11:51:02 [initandlisten] connection accepted from mongos:35551 #9669
      Fri Oct 21 11:51:02 [initandlisten] connection accepted from mongos:35552 #9670
      Fri Oct 21 11:51:03 [initandlisten] connection accepted from mongos:35553 #9671
      Fri Oct 21 11:51:03 [initandlisten] connection accepted from mongos:35554 #9672
      Fri Oct 21 11:51:05 [initandlisten] connection accepted from mongos:35558 #9673
      Fri Oct 21 11:51:12 [initandlisten] connection accepted from mongos:35559 #9674
      Fri Oct 21 11:51:13 [initandlisten] connection accepted from mongos:35560 #9675
      Fri Oct 21 11:51:13 [initandlisten] connection accepted from mongos:35561 #9676
      Fri Oct 21 11:51:14 [initandlisten] connection accepted from mongos:35565 #9677
      Fri Oct 21 11:51:14 [initandlisten] connection accepted from mongos:35569 #9678
      Fri Oct 21 11:51:16 [initandlisten] connection accepted from mongos:35570 #9679
      Fri Oct 21 11:51:18 [initandlisten] connection accepted from mongos:35571 #9680
      Fri Oct 21 11:51:18 [initandlisten] connection accepted from mongos:35572 #9681
      Fri Oct 21 11:51:18 [initandlisten] connection accepted from mongos:35573 #9682
      Fri Oct 21 11:51:19 [initandlisten] connection accepted from mongos:35577 #9683
      Fri Oct 21 11:51:19 [initandlisten] connection accepted from mongos:35580 #9684
      Fri Oct 21 11:51:19 [initandlisten] connection accepted from mongos:35581 #9685
      Fri Oct 21 11:51:21 [initandlisten] connection accepted from mongos:35585 #9686
      Fri Oct 21 11:51:22 [initandlisten] connection accepted from mongos:35586 #9687
      Fri Oct 21 11:51:22 [initandlisten] connection accepted from mongos:35587 #9688
      Fri Oct 21 11:51:29 [initandlisten] connection accepted from mongos:35591 #9689
      Fri Oct 21 11:51:30 [initandlisten] connection accepted from mongos:35592 #9690
      Fri Oct 21 11:51:31 [initandlisten] connection accepted from mongos:35593 #9691
      Fri Oct 21 11:51:32 [initandlisten] connection accepted from mongos:35594 #9692
      Fri Oct 21 11:51:34 [initandlisten] connection accepted from mongos:35595 #9693
      Fri Oct 21 11:51:37 [initandlisten] connection accepted from mongos:35596 #9694
      Fri Oct 21 11:51:38 [initandlisten] connection accepted from mongos:35597 #9695
      Fri Oct 21 11:51:38 [initandlisten] connection accepted from mongos:35598 #9696
      Fri Oct 21 11:51:40 [initandlisten] connection accepted from mongos:35599 #9697
      Fri Oct 21 11:51:40 [initandlisten] connection accepted from mongos:35600 #9698
      Fri Oct 21 11:51:41 [initandlisten] connection accepted from mongos:35601 #9699
      Fri Oct 21 11:51:42 [initandlisten] connection accepted from mongos:35602 #9700
      Fri Oct 21 11:51:42 [initandlisten] connection accepted from mongos:35603 #9701
      Fri Oct 21 11:51:42 [initandlisten] connection accepted from mongos:35604 #9702
      Fri Oct 21 11:51:43 [initandlisten] connection accepted from mongos:35605 #9703
      Fri Oct 21 11:51:44 [initandlisten] connection accepted from mongos:35609 #9704
      ...

      the load on this shard RS member increased to 15 and the member became unresponsive. during the initial stages of the problem, performance degraded severely, eventually leading to all queries timing out. bouncing mongos stopped the connections/queries and service returned to normal.
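To put a number on the connection churn visible in the shard log excerpt, the accepted-connection lines can be counted per minute and per source host. This is a minimal sketch, assuming the log line layout shown in this ticket; the regular expression and the function name are illustrative and not taken from any MongoDB tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   Fri Oct 21 11:50:49 [initandlisten] connection accepted from mongos:35539 #9663
# Captures the timestamp truncated to the minute and the source host.
ACCEPT_RE = re.compile(
    r"^\s*\w{3} (\w{3} +\d+ \d{2}:\d{2}):\d{2} \[initandlisten\] "
    r"connection accepted from (\S+):\d+ #\d+"
)


def connections_per_minute(lines):
    """Count 'connection accepted' lines keyed by (minute, source host)."""
    counts = Counter()
    for line in lines:
        match = ACCEPT_RE.match(line)
        if match:
            minute, host = match.groups()
            counts[(minute, host)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_accepts.py < mongod.log
    for (minute, host), n in sorted(connections_per_minute(sys.stdin).items()):
        print(f"{minute}  {host}  {n}")
```

Lines that do not match the pattern are skipped, so the script can be fed a full mongod log rather than a pre-filtered excerpt.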

            Assignee:
            greg_10gen Greg Studer
            Reporter:
            wayne530 Y. Wayne Huang
            Votes:
            2
            Watchers:
            5
