Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 1.7.0
Component/s: None
None
Environment: running the nightly
git version: d16ac9d54d9595710ad8288ccdd742d9242a6fc3
ALL
Problem:
Running a bulk insert from a Java program into a 3-shard system. After about 30 minutes, I see the following errors in the log file of the router node (mongos):
Thu Sep 9 19:09:15 [conn8] autosplitting scaleout.blogs size: 125479578 shard: ns:scaleout.blogs at: replset0:replset0/10.204.33.94:27000 lastmod: 3|25 min: { ts: -539057490 } max: { ts: -2960867 } on: { ts: -271305523 } (splitThreshold 104857600)
Thu Sep 9 19:09:15 [conn8] ERROR: splitIfShould failed: locking namespace failed
Thu Sep 9 19:09:25 [conn6] autosplitting scaleout.blogs size: 125322369 shard: ns:scaleout.blogs at: replset0:replset0/10.204.33.94:27000 lastmod: 3|23 min: max: { ts: -1076104534 } on: { ts: -1343209153 } (splitThreshold 104857600)
Thu Sep 9 19:09:25 [conn6] ERROR: splitIfShould failed: locking namespace failed
Thu Sep 9 19:22:21 [conn2] end connection 71.139.0.44:55312
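Both autosplit entries report a chunk size (125479578 and 125322369 bytes) above the splitThreshold of 104857600 bytes (100 MiB), and each is followed by "splitIfShould failed: locking namespace failed". When scanning a longer router log for this pattern, a small helper like the one below may be useful. It is a hypothetical sketch: AutosplitLogScan and bytesOverThreshold are illustrative names, not part of MongoDB, and it assumes each autosplit entry sits on a single log line and only reads the size and splitThreshold fields.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper; not part of MongoDB.
public class AutosplitLogScan {
    // Captures the namespace and chunk size from an "autosplitting <ns> size: <bytes>" entry.
    private static final Pattern SIZE = Pattern.compile("autosplitting (\\S+) size: (\\d+)");
    // Captures the threshold from the trailing "(splitThreshold <bytes>)" field.
    private static final Pattern THRESHOLD = Pattern.compile("\\(splitThreshold (\\d+)\\)");

    /**
     * Returns how many bytes the reported chunk size exceeds the split threshold by
     * (negative if it is below), or empty if the line is not a complete autosplit entry.
     */
    static OptionalLong bytesOverThreshold(String line) {
        Matcher size = SIZE.matcher(line);
        Matcher threshold = THRESHOLD.matcher(line);
        if (!size.find() || !threshold.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(size.group(2)) - Long.parseLong(threshold.group(1)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "Thu Sep 9 19:09:15 [conn8] autosplitting scaleout.blogs size: 125479578 shard: ns:scaleout.blogs"
                + " at: replset0:replset0/10.204.33.94:27000 lastmod: 3|25 min: { ts: -539057490 }"
                + " max: { ts: -2960867 } on: { ts: -271305523 } (splitThreshold 104857600)",
            "Thu Sep 9 19:09:15 [conn8] ERROR: splitIfShould failed: locking namespace failed"
        };
        for (String line : lines) {
            OptionalLong over = bytesOverThreshold(line);
            // Only autosplit entries produce a value; the ERROR line is skipped.
            if (over.isPresent()) {
                System.out.println("chunk over splitThreshold by " + over.getAsLong() + " bytes");
            }
        }
    }
}
```

For the first entry above this reports 20621978 bytes over the threshold, i.e. the chunk is roughly 20 MB past the point where mongos wants to split it.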
At the same time, I see my Java clients fail with:
Exception in thread "Thread-1" com.mongodb.MongoException$Network: can't call something
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:194)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:192)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:192)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:223)
at com.mongodb.DBCollection.findOne(DBCollection.java:486)
at com.mongodb.DBCollection.findOne(DBCollection.java:475)
at com.mongodb.DB.command(DB.java:137)
at com.mongodb.DB.getLastError(DB.java:283)
at InsertSpeed$Runner.run(InsertSpeed.java:64)
Caused by: java.io.IOException: couldn't connect to [/10.204.69.250:27500] bc:java.net.ConnectException: Connection timed out
at com.mongodb.DBPort._open(DBPort.java:150)
at com.mongodb.DBPort.go(DBPort.java:70)
at com.mongodb.DBPort.call(DBPort.java:56)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:186)
... 8 more
All mongod processes are still running on all machines.
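On the client side, the failure is a MongoException$Network raised from getLastError, whose cause is an IOException reporting a connection timeout to 10.204.69.250:27500. If a load-generating client such as InsertSpeed needs to tell this kind of failure apart from other errors (for example, to log it separately), a check along these lines may help. It is a hypothetical sketch: ClientFailureCheck and isConnectFailure are illustrative names, a plain RuntimeException stands in for the driver's MongoException$Network, and it relies on the message text shown in the trace, which may differ across driver versions.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical client-side helper; not part of the MongoDB Java driver.
public class ClientFailureCheck {
    // Upper bound on how far we follow getCause(), to stay safe on cyclic cause chains.
    private static final int MAX_DEPTH = 32;

    /**
     * Returns true if the exception or any of its causes is a java.net.ConnectException,
     * or mentions one in its message (the trace above embeds it in an IOException message
     * rather than chaining it as a cause).
     */
    static boolean isConnectFailure(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++) {
            if (cur instanceof ConnectException) {
                return true;
            }
            String msg = cur.getMessage();
            if (msg != null && msg.contains("java.net.ConnectException")) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // RuntimeException stands in for com.mongodb.MongoException$Network, which is not on the classpath here.
        Throwable observed = new RuntimeException("can't call something",
                new IOException("couldn't connect to [/10.204.69.250:27500] bc:java.net.ConnectException: Connection timed out"));
        System.out.println(isConnectFailure(observed)); // true
        System.out.println(isConnectFailure(new RuntimeException("some other failure"))); // false
    }
}
```

Matching on both the exception type and the message text is a deliberate hedge: newer code may chain the ConnectException as a real cause, while the trace in this report only carries it as text.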
Reproduce:
Not clear; a second run did not hit this problem.
Solution:
Need to understand why this error is occurring and what the user can do about it.
Business Case:
Reliability
Is related to: SERVER-1521 yield lock during removeRange (Closed)