- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 4.2.17
- Component/s: Replication
- Environment: Ubuntu 16.04
XFS
Kernel - 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Transparent Huge Pages disabled
AWS m5.xlarge (4 CPU / 16 GB)
SSD gp3, 450 GB
mongodb-org-server - 4.2.17
- Server Triage
- ALL
At some point, on one of our shards, the oplog entries for admin.$cmd became larger, and as a result the oplog window began to shrink. We did not see any increase in the number of entries, only in their size.
I suspect this is related to transactions, since the affected entries were for the collections we use transactions with.
This continued until we changed the primary on that shard. Right after that, the oplog went back to normal.
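To illustrate why larger entries alone can shorten the oplog window while the entry count stays flat, here is a rough back-of-the-envelope sketch. The function name and all numbers are hypothetical and are not measurements from our cluster; it assumes a fixed-size oplog and a constant write rate, and ignores everything else.

```python
def oplog_window_hours(oplog_bytes: int, entries_per_hour: float, avg_entry_bytes: float) -> float:
    """Approximate hours of history a fixed-size oplog can hold.

    Assumes a constant entry rate and a constant average entry size.
    """
    if entries_per_hour <= 0 or avg_entry_bytes <= 0:
        raise ValueError("entries_per_hour and avg_entry_bytes must be positive")
    return oplog_bytes / (entries_per_hour * avg_entry_bytes)


# Hypothetical values chosen only for illustration.
OPLOG_BYTES = 20 * 1024**3      # assumed 20 GiB oplog
ENTRIES_PER_HOUR = 1_000_000    # entry count stays the same in both cases

# Same number of entries, but the average entry is 8x larger in the second case.
window_before = oplog_window_hours(OPLOG_BYTES, ENTRIES_PER_HOUR, 512)
window_after = oplog_window_hours(OPLOG_BYTES, ENTRIES_PER_HOUR, 4096)

print(f"window with small entries: {window_before:.2f} h")
print(f"window with large entries: {window_after:.2f} h")
```

With the entry rate held constant, the window is inversely proportional to the average entry size, which matches what we observed: no change in the number of entries, but a shrinking oplog window.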
Our cluster configuration:
- sharded cluster with 10 shards
- four replica set members in each shard
- about 400 GB of data in storage size per shard
Replica server configuration:
- Ubuntu 16.04
- XFS
- Kernel - 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Transparent Huge Pages disabled
- AWS m5.xlarge (4 CPU / 16 GB)
- SSD gp3, 450 GB
- mongodb-org-server - 4.2.17
I'm attaching diagnostic.data from the primary where the incident happened.
Incident time:
beginning - 03.06.2022 08:30:00 UTC
end - 03.06.2022 15:45:00 UTC