Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 6.0.3, 6.0.9
Component/s: None
Environment: CentOS 7.9
ALL
We operate a sharded cluster on a total of 12 servers configured as replica sets, with each server having three nodes grouped together.
Over the past month, approximately five secondary nodes have encountered issues with the message "potential hardware corruption, read checksum error: block header checksum doesn't match the expected checksum." Attempts to resolve the problem using the repair command have been unsuccessful, and the issue has persisted. Ultimately, the only effective solution was to delete the data and perform a resynchronization.
However, deleting the data and resyncing is not practical as a routine fix, because the data volume is around 25 TB. We have not been able to determine the root cause of the issue.
How can I resolve this issue?