  Core Server / SERVER-21838

Replica set with mixed compression types slower than expected

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 3.2.0
    • Component/s: WiredTiger
    • Operating System: ALL

      Create a replica set with the primary using zlib and the secondary using snappy. Check the CPU usage and the performance characteristics vs zlib/zlib and snappy/snappy.

      zlib/snappy appears to be roughly 25% slower than zlib/zlib, with the CPU only running at around 50% when zlib/snappy is configured.

      [Nick J workload]
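
      A minimal sketch of the mixed-compressor setup (host names, ports, and dbpaths below are placeholders):

        # primary: zlib for both collection data and journal
        mongod --replSet rs0 --port 27017 --dbpath /data/primary \
            --wiredTigerCollectionBlockCompressor zlib \
            --wiredTigerJournalCompressor zlib

        # secondary: snappy (the WiredTiger default) for both
        mongod --replSet rs0 --port 27017 --dbpath /data/secondary \
            --wiredTigerCollectionBlockCompressor snappy \
            --wiredTigerJournalCompressor snappy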


      I'm playing with replica sets on 3.2. I have the following topology:

      1 x i3770 with SSD [Primary]
      1 x intel NUC with SSD [secondary]
      1 x i5960 with SSD [arbiter]

      .NET application using the C# driver (2.2).

      Scenario 1 - No replica set:

      My application running on a separate box connects to the primary (with no replica set configured) and has a total throughput of X.

      Scenario 2 - 1-node replica set (on the primary):

      Same two boxes as above, but mongod on the primary is started with --replSet. Total throughput is X - 15%. This makes sense as there is extra CPU and disk IO required.
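
      For reference, after starting mongod with --replSet the one-member set still has to be initiated from the mongo shell; the set name and host below are placeholders:

        // one-member replica set; the node becomes primary once this returns
        rs.initiate({ _id: "rs0", members: [ { _id: 0, host: "primary-host:27017" } ] })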

      Scenario 3 - 2-node replica set with arbiter:

      This time I've configured a standard replica set. Throughput drops to X - 55%.
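
      Roughly, the standard set was built by adding the data-bearing secondary and the arbiter to the existing set (host names are placeholders):

        rs.add("nuc-secondary:27017")      // data-bearing secondary
        rs.addArb("i5960-arbiter:27017")   // arbiter, holds no data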

      I can see the NUC (which is very weak) is CPU-bound, and the other two boxes are barely breaking a sweat. My understanding was that replication was asynchronous and that the replication from primary to secondary would not/should not slow down writing to the primary (at least not by such a large amount). As best I can tell I don't have the write concern set to majority (unless that is the default for a cluster).
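
      For what it's worth, in 3.2 the default write concern is { w: 1 } (acknowledged by the primary only), not majority, unless the driver/connection string or the replica set's getLastErrorDefaults overrides it. That can be checked from the mongo shell; the collection name below is a placeholder:

        // { w: 1, wtimeout: 0 } unless it has been changed in the replica set config
        rs.conf().settings.getLastErrorDefaults

        // an explicit majority write, for comparison with the default acknowledged write
        db.test.insert({ x: 1 }, { writeConcern: { w: "majority" } })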

      I noticed that on my primary I was using zlib compression for both journal and collection (primary has a smaller SSD) and was using snappy for the replica.

      I tried using snappy on both and performance jumped up more than expected, and CPU on the primary popped up to 100%.

      I also tried zlib on both primary and secondary, and this showed better performance than mixed.
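
      One way to confirm which block compressor each node is actually using for a collection is the WiredTiger creation string in the collection stats (collection name is a placeholder; on the secondary, run rs.slaveOk() first):

        // contains e.g. "block_compressor=zlib" or "block_compressor=snappy"
        db.mycoll.stats().wiredTiger.creationString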

      Attachments:

        1. AddNodePerfDrop.png (45 kB)
        2. ARB_metrics.2015-12-28T17-11-40Z-00000 (45 kB)
        3. PRIMARY_metrics.2015-12-28T17-06-59Z-00000 (337 kB)
        4. replica_status.png (19 kB)
        5. SECONDARY_metrics.2015-12-28T17-08-02Z-00000 (384 kB)
        6. SINGLENODE_metrics.2015-12-28T17-36-12Z-00000 (146 kB)
        7. snappy_singlenode.png (32 kB)
        8. snappy_snappy.png (35 kB)
        9. zlib_snappy_cpu_primary.png (61 kB)
        10. zlib_snappy_secondary.metrics.2015-12-10T04-35-13Z-00000 (734 kB)
        11. zlib_snappy.metrics.2015-12-10T04-37-19Z-00000 (473 kB)
        12. zlib_snappy.png (48 kB)
        13. zlib_zlib_cpu_primary.png (39 kB)
        14. zlib_zlib.metrics.2015-12-10T05-26-01Z-00000 (273 kB)
        15. zlib_zlib.png (40 kB)

            Assignee: Unassigned
            Reporter: Nick Judson (nick@innsenroute.com)
            Votes: 0
            Watchers: 6
