Many sharding tests are failing in the debug suites because writes of cluster configuration values (the chunk size in particular) time out waiting for majority write concern. Example:
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 2016-08-17T22:45:04.950+0000 E QUERY [thread1] Error: write concern failed with errors: {
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 "nMatched" : 0,
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 "nUpserted" : 1,
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 "nModified" : 0,
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 "_id" : "chunksize",
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.950+0000 "writeConcernError" : {
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 "code" : 64,
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 "errInfo" : {
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 "wtimeout" : true
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 },
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 "errmsg" : "waiting for replication timed out"
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 }
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 } :
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 _getErrorWithCode@src/mongo/shell/utils.js:25:13
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 doassert@src/mongo/shell/assert.js:13:14
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 assert.writeOK@src/mongo/shell/assert.js:422:9
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 setChunkSize@src/mongo/shell/shardingtest.js:1209:1
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 authutil.asCluster@src/mongo/shell/utils_auth.js:77:20
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 ShardingTest@src/mongo/shell/shardingtest.js:1216:13
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 start@jstests\sharding\localhostAuthBypass.js:182:1
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 @jstests\sharding\localhostAuthBypass.js:229:14
[js_test:localhostAuthBypass] 2016-08-17T22:45:04.951+0000 @jstests\sharding\localhostAuthBypass.js:5:2
These writes happen right after the primary has been established, when the replica set is still in churn. To reduce the chance of failure, we should increase the timeout for these writes from 30 seconds to 60.
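A minimal sketch of what the longer timeout could look like for the chunk size write. The helper `buildChunkSizeUpdate` and the constant `kMajorityWriteTimeoutMS` are hypothetical names used only for illustration, and the exact shape of the existing write in `shardingtest.js` is assumed here from the error output above (an upsert on `_id: "chunksize"` with a majority write concern), not copied from the source.

```javascript
// Hypothetical constant: the proposed majority-write timeout (was 30 * 1000).
const kMajorityWriteTimeoutMS = 60 * 1000;

// Hypothetical helper, not part of shardingtest.js: builds the arguments for
// the chunk size upsert so the timeout lives in one place.
function buildChunkSizeUpdate(chunkSizeMB, wtimeoutMS = kMajorityWriteTimeoutMS) {
    return {
        query: {_id: 'chunksize'},
        update: {$set: {value: chunkSizeMB}},
        options: {
            upsert: true,
            // Wait for a majority of the config replica set, for up to wtimeoutMS.
            writeConcern: {w: 'majority', wtimeout: wtimeoutMS},
        },
    };
}

// In the mongo shell, the result would be used roughly like this
// (assumed usage; `configDB` stands for the config database handle):
//   const u = buildChunkSizeUpdate(chunkSize);
//   assert.writeOK(configDB.settings.update(u.query, u.update, u.options));
```

Keeping the timeout in a single constant would let any other setup writes that wait on majority share the same value.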
- is related to SERVER-25976 Wait for replication of config.version write and config server index builds during ShardingTest setup (Closed)