- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: 4.0.13
- Component/s: Stability
- None
- ALL
We are running a replica set consisting of 3 bare-metal servers. The primary went down while mongodump was executing (daily backup).
The mongodump command was as follows, where $HOSTNAME stands for the FQDN of the primary:
mongodump -h $HOSTNAME --port $PORT -u${USER} -p${PASS} --authenticationDatabase=$AUTHDATABASE -o $BACKUPDIR
We have saved mongod logs and diagnostic data from all 3 replSet members for your investigation.
Where can we upload them?
Here are the last lines of the mongod log, which do not seem very helpful, at least to us:
2020-07-22T18:53:55.193+0200 I COMMAND [conn56968] command ipc-catalog.productOfferInfo command: find { find: "productOfferInfo", filter: { _id: { _id: 3127816, locale: "de_DE" } }, limit: 1, singleBatch: true, $db: "ipc-catalog", $clusterTime: { clusterTime: Timestamp(1595436345, 1), signature: { hash: BinData(0, C48B0FBAAB6D9151CD788358E80382121C11A9E5), keyId: 6849703745916239873 } }, lsid: { id: UUID("629864ef-f24d-4147-89dd-a6c24254de42") }, $readPreference: { mode: "nearest" } } planSummary: IDHACK numYields:0 ok:0 errMsg:"Executor error during find command :: caused by :: operation was interrupted" errName:InterruptedDueToReplStateChange errCode:11602 reslen:308 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_msg 460023ms
2020-07-22T18:53:55.193+0200 I REPL [replexec-1968] Member mongo-india01-02.db00.pro06.eu.idealo.com:27017 is now in state RS_DOWN
2020-07-22T22:06:29.715+0200 I CONTROL [main] ***** SERVER RESTARTED *****
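For anyone triaging similar entries, the useful parts of the COMMAND line above are the error name, the error code and the trailing duration (460023ms, so the find ran for roughly 460 seconds before it was interrupted). The following is a minimal sketch of pulling those fields out with Python; the function name `summarize` and the regular expressions are illustrative assumptions rather than any MongoDB tooling, and the sample line is abridged from the log above.

```python
import re

# Abridged copy of the COMMAND entry from the ticket: the command document and
# lock statistics are shortened, the fields read below are unchanged.
LOG_LINE = (
    '2020-07-22T18:53:55.193+0200 I COMMAND [conn56968] '
    'command ipc-catalog.productOfferInfo command: find { find: "productOfferInfo" } '
    'planSummary: IDHACK numYields:0 ok:0 '
    'errMsg:"Executor error during find command :: caused by :: operation was interrupted" '
    'errName:InterruptedDueToReplStateChange errCode:11602 reslen:308 '
    'protocol:op_msg 460023ms'
)

# Header of a 4.0-style plain-text log line: timestamp, severity, component, [context].
HEADER = re.compile(
    r'^(?P<timestamp>\S+) (?P<severity>[A-Z]) (?P<component>\S+)\s+\[(?P<context>[^\]]+)\] '
)


def summarize(line):
    """Return the header fields plus error name, error code and duration, or None."""
    match = HEADER.match(line)
    if match is None:
        return None
    summary = match.groupdict()
    err_name = re.search(r'\berrName:(\w+)', line)
    err_code = re.search(r'\berrCode:(\d+)', line)
    duration = re.search(r' (\d+)ms$', line)
    summary['err_name'] = err_name.group(1) if err_name else None
    summary['err_code'] = int(err_code.group(1)) if err_code else None
    summary['duration_ms'] = int(duration.group(1)) if duration else None
    return summary


if __name__ == '__main__':
    print(summarize(LOG_LINE))
```

Lines without an error, such as the REPL and CONTROL entries, still yield the header fields, with the error and duration fields set to None.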
Both mongod and mongodump are of version 4.0.13.
Each server is running on Linux/Stretch and has 56 CPUs, 384 GB RAM, and 5.8 TB of SSD storage in RAID 10.