Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-38929

Refresh of RoutingTableHistory causes large rise in memory usage

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.0.1
    • Component/s: Sharding
    • None
    • Sharding
    • ALL

      In a sharded cluster, a long running query can cause a shard to refresh the routing table history multiple times. If the sharded cluster is very large, this routing table history can take up a large amount of space and eventually lead to an OOM.

      Here is a snapshot of call stacks that show 6.5 GB being used solely to update the routing table history.

      Here is the balancer information:

      balancer:
              Currently enabled:  yes
              Currently running:  yes
              Collections with active migrations: 
                      buildlogs.logs started at Tue Jan 08 2019 21:57:37 GMT+0000 (UTC)
              Failed balancer rounds in last 5 attempts:  0
              Migration Results for the last 24 hours: 
                      2990 : Success
                      1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_24
                      1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_21
                      2 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-rs0
                      1 : Failed with error 'aborted', from logkeeperdb-shard_9 to logkeeperdb-shard_18
                      1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_21
                      1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_17
                      1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_22
                      1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_12
                      1 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-shard_4
                      1 : Failed with error 'aborted', from logkeeperdb-shard_22 to logkeeperdb-shard_13
                      1 : Failed with error 'aborted', from logkeeperdb-shard_13 to logkeeperdb-shard_8
                      1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_20
      
      mongos> db.chunks.find({ns: "buildlogs.logs"}).count()
      1303476
      

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            daniel.hatcher@mongodb.com Danny Hatcher (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: