We started to experience that our daily jobs which heavily use our MongoDB cluster took really long to finish. Looking at the output of mongostat on our replica sets we saw that the number of commands was much higher than the number of other operations.
During a typical run of the daily jobs the query count would be around 4000 and the command count below 100 while other operation counts vary between hundreds and a few thousands. At this time the query count was 300-400 and the command count 1200-1600.
Our mongodb cluster is authenticated and we use the same keyFile for all mongo instances.
I started to log network communication with mongosniff which reported high number of 'need to login' errors between the replica set members and mongos instances. This has happened before and we could always solve the issue by restarting the mongos instances. I tried to do the same now but unfortunately it didn't help. I also restarted all other pieces of the cluster too but that didn't solve the issue either.
The only difference between previous occasions and this one is that yesterday I upgraded all nodes from 2.0.2 to 2.0.4.
Here's a piece of the mongosniff output:
10.84.214.41:54537 -->> 10.84.214.41:27017 admin.$cmd 66 bytes id:83b 2107 query: { replSetGetStatus: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:54537 79 bytes id:1d74c 120652 - 2107 reply n:1 cursorId: 0 { errmsg: "need to login", ok: 0.0 } 10.84.214.41:54537 -->> 10.84.214.41:27017 admin.$cmd 58 bytes id:83c 2108 query: { ismaster: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:54537 324 bytes id:1d74d 120653 - 2108 reply n:1 cursorId: 0 { setName: "rs_a", ismaster: true, secondary: false, hosts: [ "mongodb-rsa1:27017", "mongodb-rsa2:27017" ], arbiters: [ "mongodb-c1:27018" ], primary: "mongodb-rsa1:27017", me: "mongodb-rsa1:27017", maxBsonObjectSize: 16777216, ok: 1.0 } 10.84.214.41:54537 -->> 10.84.214.41:27017 admin.$cmd 66 bytes id:83d 2109 query: { replSetGetStatus: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:54537 79 bytes id:1d74e 120654 - 2109 reply n:1 cursorId: 0 { errmsg: "need to login", ok: 0.0 } 10.84.214.41:54537 -->> 10.84.214.41:27017 admin.$cmd 58 bytes id:83e 2110 query: { ismaster: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:54537 324 bytes id:1d74f 120655 - 2110 reply n:1 cursorId: 0 { setName: "rs_a", ismaster: true, secondary: false, hosts: [ "mongodb-rsa1:27017", "mongodb-rsa2:27017" ], arbiters: [ "mongodb-c1:27018" ], primary: "mongodb-rsa1:27017", me: "mongodb-rsa1:27017", maxBsonObjectSize: 16777216, ok: 1.0 } 10.84.214.41:54537 -->> 10.84.214.41:27017 admin.$cmd 66 bytes id:83f 2111 query: { replSetGetStatus: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:54537 79 bytes id:1d750 120656 - 2111 reply n:1 cursorId: 0 { errmsg: "need to login", ok: 0.0 } 10.244.73.3:44992 -->> 10.244.73.3:27017 admin.$cmd 58 bytes id:1e426 123942 query: { ismaster: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:44992 324 bytes id:1d751 120657 - 123942 reply n:1 cursorId: 0 { setName: "rs_a", ismaster: true, secondary: false, hosts: [ "mongodb-rsa1:27017", "mongodb-rsa2:27017" ], arbiters: [ "mongodb-c1:27018" ], primary: "mongodb-rsa1:27017", me: "mongodb-rsa1:27017", maxBsonObjectSize: 16777216, ok: 1.0 } 10.244.73.3:44992 -->> 10.244.73.3:27017 admin.$cmd 66 bytes id:1e427 123943 query: { replSetGetStatus: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:44992 79 bytes id:1d752 120658 - 123943 reply n:1 cursorId: 0 { errmsg: "need to login", ok: 0.0 } 10.244.73.3:44992 -->> 10.244.73.3:27017 admin.$cmd 58 bytes id:1e428 123944 query: { ismaster: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:44992 324 bytes id:1d753 120659 - 123944 reply n:1 cursorId: 0 { setName: "rs_a", ismaster: true, secondary: false, hosts: [ "mongodb-rsa1:27017", "mongodb-rsa2:27017" ], arbiters: [ "mongodb-c1:27018" ], primary: "mongodb-rsa1:27017", me: "mongodb-rsa1:27017", maxBsonObjectSize: 16777216, ok: 1.0 } 10.244.73.3:44992 -->> 10.244.73.3:27017 admin.$cmd 66 bytes id:1e429 123945 query: { replSetGetStatus: 1 } ntoreturn: 1 ntoskip: 0 10.118.63.28:27017 <<-- 10.118.63.28:44992 79 bytes id:1d754 120660 - 123945 reply n:1 cursorId: 0 { errmsg: "need to login", ok: 0.0 }
- duplicates
-
SERVER-5405 mongos does not send reads to secondaries after replica restart when using keyFiles
- Closed