-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 4.2.17
-
Component/s: None
-
Environment: Ubuntu 16.04
XFS
Kernel: 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Transparent Huge Pages disabled
AWS m5.large (2 vCPU / 8 GB RAM)
450 GB gp3 SSD
mongodb-org-server 4.2.17
-
ALL
We're using `sh.addTagRange` to define custom tag (zone) ranges for our chunks. Usually the ranges are fixed, so chunks are not moved between shards. However, the other day we needed to change the ranges, which caused the balancer to migrate chunks. Within 15-20 minutes, the primaries of some of the shards became unresponsive and the whole sharded cluster hung.
The diagnostic.data.zip from the hung primary is attached.
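To illustrate why changing the tag ranges started migrations, here is a minimal Python model (not MongoDB code). It assumes a hypothetical integer shard key, zone ranges with an inclusive minimum and exclusive maximum (as with `sh.addTagRange`), chunks already aligned to zone boundaries, and exactly one shard per zone. The names `Chunk`, `zone_for`, `chunks_to_move`, and the zone and shard names are all hypothetical. After the ranges change, any chunk whose new zone maps to a different shard than the one holding it becomes a migration candidate for the balancer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical integer shard key; real shard keys can be any BSON value.
# A range is half-open [min, max): min inclusive, max exclusive.
Range = Tuple[int, int]


@dataclass
class Chunk:
    min_key: int
    max_key: int
    shard: str  # shard currently holding the chunk


def zone_for(chunk: Chunk, zones: Dict[str, Range]) -> Optional[str]:
    """Return the zone whose range fully contains the chunk, or None."""
    for name, (lo, hi) in zones.items():
        if lo <= chunk.min_key and chunk.max_key <= hi:
            return name
    return None


def chunks_to_move(
    chunks: List[Chunk],
    zones: Dict[str, Range],
    zone_shard: Dict[str, str],  # simplification: one shard per zone
) -> List[Tuple[Chunk, str]]:
    """List (chunk, target_shard) pairs for chunks on the wrong shard."""
    moves = []
    for chunk in chunks:
        zone = zone_for(chunk, zones)
        if zone is None:
            continue  # chunk outside every zone: ignored in this model
        target = zone_shard[zone]
        if chunk.shard != target:
            moves.append((chunk, target))
    return moves


chunks = [Chunk(0, 100, "shardA"), Chunk(100, 200, "shardA"), Chunk(200, 300, "shardB")]
zone_shard = {"zoneA": "shardA", "zoneB": "shardB"}
old_ranges = {"zoneA": (0, 200), "zoneB": (200, 300)}
new_ranges = {"zoneA": (0, 100), "zoneB": (100, 300)}

print(chunks_to_move(chunks, old_ranges, zone_shard))  # []
moves = chunks_to_move(chunks, new_ranges, zone_shard)
print([(c.min_key, c.max_key, t) for c, t in moves])  # [(100, 200, 'shardB')]
```

With the old ranges every chunk already sits on its zone's shard, so nothing moves; shrinking `zoneA` leaves the [100, 200) chunk on `shardA` while its zone now maps to `shardB`, so it would be migrated.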
I also found this message in the logs:
STORAGE [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.
That message led me to this issue, which might be related.