We are running an unauthenticated sharded setup with 4 shards, each a replica set of 2 members plus an arbiter. All nodes are running 2.0.3. Periodically, mapreduce jobs fail with:
Array
(
[assertion] => DBClientBase::findN: transport error: mongo-s1-01:27011 query: { mapreduce.shardedfinish: { mapreduce: "PageView", map: CodeWScope( function()
, {}), reduce: CodeWScope(
function(k, vals)
, {}), query: { date:
{ $gt: 1330944522 }, content_type:
{ $in: [ "text/html; charset=utf-8", "text/html" ] }}, out: "mr.PageView.1331545722.6649.700620" }, shardedOutputCollection: "tmp.mrs.PageView_1331545723_8", shards: { shard2/mongo-s2-02:27022,mongo-s2-01:27021: { result: "tmp.mrs.PageView_1331545723_8", timeMillis: 7056, counts:
{ input: 146527, emit: 35517, reduce: 4562, output: 1391 }, ok: 1.0 }, shard3/mongo-s3-01:27017,mongo-s3-02:27017: { result: "tmp.mrs.PageView_1331545723_8", timeMillis: 2127, counts:
{ input: 37213, emit: 34834, reduce: 4460, output: 1351 }, ok: 1.0 }, shard4/mongo-s4-02:27017,mongo-s4-03:27017,mongo-s4-01:27017: { result: "tmp.mrs.PageView_1331545723_8", timeMillis: 3031, counts:
{ input: 58947, emit: 51909, reduce: 5721, output: 2096 }, ok: 1.0 } }, shardCounts: { shard2/mongo-s2-02:27022,mongo-s2-01:27021:
{ input: 146527, emit: 35517, reduce: 4562, output: 1391 }, shard3/mongo-s3-01:27017,mongo-s3-02:27017:
{ input: 37213, emit: 34834, reduce: 4460, output: 1351 }, shard4/mongo-s4-02:27017,mongo-s4-03:27017,mongo-s4-01:27017:
{ input: 58947, emit: 51909, reduce: 5721, output: 2096 }}, counts:
{ emit: 122260, input: 242687, output: 4838, reduce: 14743 } }
[assertionCode] => 10276
[errmsg] => db assertion failure
[ok] => 0
)
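As a sanity check on the dump above: the per-shard numbers in `shardCounts` add up exactly to the aggregate `counts`, and every shard entry reports `ok: 1.0`. This suggests the per-shard map/reduce phase completed, and the failure occurs in the final `mapreduce.shardedfinish` step sent to mongo-s1-01:27011. The Python below is only an illustration of that arithmetic. The numbers are copied by hand from the output above, and the dictionary layout and the `sum_counts` helper are our own. They are not part of any MongoDB or driver API.

```python
# Per-shard counts copied by hand from the shardCounts section of the error above.
shard_counts = {
    "shard2": {"input": 146527, "emit": 35517, "reduce": 4562, "output": 1391},
    "shard3": {"input": 37213, "emit": 34834, "reduce": 4460, "output": 1351},
    "shard4": {"input": 58947, "emit": 51909, "reduce": 5721, "output": 2096},
}

# Aggregate counts reported at the end of the same error.
reported_totals = {"emit": 122260, "input": 242687, "output": 4838, "reduce": 14743}


def sum_counts(per_shard):
    """Sum each counter (input, emit, reduce, output) across all shards."""
    totals = {}
    for counts in per_shard.values():
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


computed = sum_counts(shard_counts)
for key in sorted(reported_totals):
    status = "ok" if computed[key] == reported_totals[key] else "MISMATCH"
    print(f"{key}: computed={computed[key]} reported={reported_totals[key]} {status}")
```

All four counters match, so the shard-level results appear consistent, and the transport error looks tied to the finish step rather than to missing shard output.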
Running flushRouterConfig before the MR job does not resolve the issue, and neither does restarting mongos. In back-to-back runs, the job fails about 4 out of every 5 tries (1 success in 5), despite no changes on our end.
Related to: SERVER-6752, "Do not close all connections on replica set reconfig" (Closed)