- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 4.0.1
- Component/s: Sharding
- Sharding
- ALL
In a sharded cluster, a long-running query can cause a shard to refresh the routing table history multiple times. If the sharded cluster is very large, this routing table history can consume a large amount of memory and eventually cause an out-of-memory (OOM) failure.
Here is a snapshot of call stacks showing 6.5 GB being used solely to update the routing table history.
Here is the balancer information:
balancer:
    Currently enabled: yes
    Currently running: yes
    Collections with active migrations:
        buildlogs.logs started at Tue Jan 08 2019 21:57:37 GMT+0000 (UTC)
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
        2990 : Success
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_24
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_21
        2 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-rs0
        1 : Failed with error 'aborted', from logkeeperdb-shard_9 to logkeeperdb-shard_18
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_21
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_17
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_22
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_12
        1 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-shard_4
        1 : Failed with error 'aborted', from logkeeperdb-shard_22 to logkeeperdb-shard_13
        1 : Failed with error 'aborted', from logkeeperdb-shard_13 to logkeeperdb-shard_8
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_20
mongos> db.chunks.find({ns: "buildlogs.logs"}).count()
1303476
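To get a rough feel for why a chunk count this large matters, here is a back-of-envelope sketch in plain JavaScript. It is not MongoDB's memory accounting: the function name `estimateRoutingHistoryBytes` is hypothetical, the per-chunk size of 200 bytes is an assumed placeholder rather than a measurement, and the model assumes every retained copy of the routing table history holds an entry for every chunk.

```javascript
// Back-of-envelope model only; NOT how mongod actually accounts for memory.
// Assumes each retained routing-table copy stores one entry per chunk.
function estimateRoutingHistoryBytes(chunkCount, retainedCopies, bytesPerChunk) {
  if (chunkCount < 0 || retainedCopies < 0 || bytesPerChunk < 0) {
    throw new RangeError("all arguments must be non-negative");
  }
  return chunkCount * retainedCopies * bytesPerChunk;
}

const chunkCount = 1303476;        // chunk count reported in this ticket
const assumedBytesPerChunk = 200;  // ASSUMPTION: placeholder, not measured

// Show how the footprint scales with the number of copies kept alive.
for (const copies of [1, 5, 25]) {
  const bytes = estimateRoutingHistoryBytes(chunkCount, copies, assumedBytesPerChunk);
  console.log(`${copies} retained copies -> ${(bytes / 2 ** 30).toFixed(2)} GiB`);
}
```

Under these assumptions the footprint grows linearly with the number of copies that stay referenced, so with roughly 1.3 million chunks even a modest number of retained copies could plausibly reach the multi-gigabyte range.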
duplicates:
- SERVER-36443 Long-running queries should not cause a build-up of unused ChunkManager objects (Closed)