- Type: Bug
- Resolution: Community Answered
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Environment: Ubuntu 16.04; XFS; Kernel 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux; Transparent Huge Pages disabled; AWS m5.large (2 vCPU / 8 GB); SSD GP3 450 GB; mongodb-org-server 4.2.17
- ALL
For no apparent reason, the primary replica set member of one of our shards became unresponsive until we restarted it.
The incident lasted about 35 minutes. During that time the primary was at almost 100% utilization and its load average rose to roughly 60 times the normal value.
From the logs at the beginning of the incident we could tell only the following (a monitoring sketch for these metrics follows the list):
1) The number of open connections started increasing.
2) Some open cursors timed out.
3) The pooled connections to other members were dropped (supposedly due to shutdown, but we did not try to shut down the primary at that time):
```
I CONNPOOL [TaskExecutorPool-0] Dropping all pooled connections to some-secondary:27017 due to ShutdownInProgress: Pool for some-secondary:27017 has expired.
```
4) After that, no log entries appeared for about 25 minutes, until we restarted the primary.
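For reference, the connection and cursor counters we were watching can be polled from `serverStatus`. Below is a minimal sketch assuming pymongo and direct access to the affected primary; the host, port, and 10-second interval are illustrative, not taken from this report.
```
# Poll serverStatus and print connection/cursor counters (illustrative host/port).
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

while True:
    status = client.admin.command("serverStatus")
    conns = status["connections"]          # current / available connection counts
    cursors = status["metrics"]["cursor"]  # open and timed-out cursor counters
    print(
        f"connections: current={conns['current']} available={conns['available']} | "
        f"cursors: open={cursors['open']['total']} timedOut={cursors['timedOut']}"
    )
    time.sleep(10)  # poll every 10 seconds
```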
Our cluster configuration:
- sharded cluster with 10 shards
- four replica set members in each shard
- about 400 GB of data (storage size) per shard
Replica server configuration:
- Ubuntu 16.04
- XFS
- Kernel - 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Transparent Huge Pages disabled (see the check sketch after this list)
- AWS m5.large (2 vCPU / 8 GB)
- SSD GP3, 450 GB
- mongodb-org-server 4.2.17
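The THP setting can be confirmed on the host by reading the standard sysfs files; a minimal sketch (the paths are the usual Linux kernel defaults, not something specific to this report):
```
# Print the Transparent Huge Pages settings; the active value is shown in brackets,
# e.g. "always madvise [never]".
from pathlib import Path

for name in ("enabled", "defrag"):
    path = Path("/sys/kernel/mm/transparent_hugepage") / name
    print(f"THP {name}: {path.read_text().strip()}")
```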
`diagnostic.data` from the primary and from one of the secondaries is attached to this post.