Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.17
Component/s: None
Labels:
- balancing
Environment:
Ubuntu 16.04
XFS
Kernel - 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Transparent Huge Pages disabled
AWS m5.large (2 vCPU / 8 GB)
SSD GP3 450 GB
mongodb-org-server - 4.2.17

Operating System:
ALL

We use `sh.addTagRange` to set custom chunk ranges. The ranges are normally fixed, so chunks are not moved between shards. The other day, however, we had to change the ranges, which made the balancer start moving chunks. Within 15-20 minutes of the change, the primaries of some shards became unresponsive and the whole sharded cluster hung.

The diagnostic.data.zip from the hung primary is attached.

I also found this message in the logs:

STORAGE [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.

which led me to this issue. It might be related.
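
For context, a range change of the kind described above could be sketched roughly as follows in the mongo shell. This is only an illustration: the helper name `replaceTagRange`, the namespace `app.events`, the shard key field `userId`, the bounds, and the tag name `tagA` are all hypothetical and are not taken from this ticket. The `sh` helper is passed in as a parameter so that the sketch can be dry-run against a stub without touching a cluster.

```javascript
// Sketch only: the namespace, shard key field, bounds, and tag name are
// hypothetical placeholders, not values from this ticket.
// `sh` is the mongo shell sharding helper, passed in so a stub can be used.
function replaceTagRange(sh, ns, oldRange, newRange, tag) {
  // Drop the old range first so the old and new ranges never overlap.
  sh.removeTagRange(ns, oldRange.min, oldRange.max, tag);
  // Register the new range; the balancer then moves chunks covered by the
  // range onto shards that carry this tag.
  sh.addTagRange(ns, newRange.min, newRange.max, tag);
}

// Dry run against a recording stub (no cluster required).
const calls = [];
const stubSh = {
  removeTagRange: (...args) => calls.push(["removeTagRange", ...args]),
  addTagRange: (...args) => calls.push(["addTagRange", ...args]),
};

replaceTagRange(
  stubSh,
  "app.events",
  { min: { userId: 0 }, max: { userId: 1000 } },
  { min: { userId: 0 }, max: { userId: 2000 } },
  "tagA"
);
```

Against a real cluster the shell's own `sh` object would be passed instead of the stub. Widening or shifting a range this way can cause a large number of chunk migrations, which matches the balancer activity described above.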

Attachments:
- diagnostic.data.zip (48.37 MB), Jul 19 2022 07:24:01 AM UTC, Vladimir Beliakov
- image-2022-09-09-05-44-22-482.png (303 kB), Sep 09 2022 09:44:22 AM UTC, Chris Kelly

Assignee: Chris Kelly
Reporter: Vladimir Beliakov
Participants: Chris Kelly, Vladimir Beliakov
Votes: 0
Watchers: 5
Created: Jul 19 2022 07:26:25 AM UTC
Updated: Oct 11 2022 03:09:18 AM UTC
Resolved: Oct 11 2022 03:09:18 AM UTC
