- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Performance
On MongoDB 3.4.19, sharded cluster with 2 shards.
The collection `foo` (in database `foo`) is sharded on the hashed key "UID":
> sh.enableSharding("foo")
> db.adminCommand({"shardCollection": "foo.foo", key: {"UID": "hashed"}})
> use foo
> db.foo.createIndex({"name": 1})
We wrote a test case with the mongo-cxx-driver to measure the latency:
// time a single upsert (client/collection handles and the update options are set up elsewhere)
auto nStart = std::chrono::high_resolution_clock::now();
oDB[DB_TABLE_NAME].update_one(document{} << "UID" << nUID << finalize,
                              oValue.view(), opt);
auto nEnd = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> diff = nEnd - nStart;
printf("uid %d timediff %f\n", nUID, diff.count());
Sample output:
uid 10030116 timediff 1.249126
uid 10000124 timediff 1.021864
uid 10050127 timediff 1.172118
uid 10020116 timediff 1.223791
uid 10040115 timediff 1.408828
uid 10070114 timediff 1.526046
What we did (a minimal harness sketch follows this list):
- The test case runs with 8 threads; each thread performs a loop of 10,000 upserts against the collection.
- Each upsert operation targets a different UID.
- Except for UID/_id, all documents are identical.
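For reference, a minimal sketch of such a harness with the mongo-cxx-driver; the mongos address, thread/iteration constants, UID scheme, and payload value are assumptions, since the actual test program is not attached to this ticket:
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>
#include <bsoncxx/builder/stream/document.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/options/update.hpp>
#include <mongocxx/uri.hpp>
using bsoncxx::builder::stream::close_document;
using bsoncxx::builder::stream::document;
using bsoncxx::builder::stream::finalize;
using bsoncxx::builder::stream::open_document;
int main() {
    mongocxx::instance inst{};  // one driver instance per process
    constexpr int kThreads = 8;          // thread count from the ticket
    constexpr int kOpsPerThread = 10000; // upserts per thread, from the ticket
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([t] {
            // one client per thread: mongocxx::client is not thread-safe
            mongocxx::client conn{mongocxx::uri{"mongodb://mongos-host:27017"}};  // hypothetical mongos address
            auto coll = conn["foo"]["foo"];
            mongocxx::options::update opt;
            opt.upsert(true);
            for (int i = 0; i < kOpsPerThread; ++i) {
                int nUID = 10000000 + t * kOpsPerThread + i;  // hypothetical distinct-UID scheme
                auto filter = document{} << "UID" << nUID << finalize;
                // identical payload for every document except UID/_id
                auto update = document{} << "$set" << open_document
                                         << "name" << "fixed-value"
                                         << close_document << finalize;
                auto nStart = std::chrono::high_resolution_clock::now();
                coll.update_one(filter.view(), update.view(), opt);
                auto nEnd = std::chrono::high_resolution_clock::now();
                std::chrono::duration<double, std::milli> diff = nEnd - nStart;
                printf("uid %d timediff %f\n", nUID, diff.count());
            }
        });
    }
    for (auto& w : workers) w.join();
    return 0;
}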
What we found:
1. The latency has many spikes, e.g. 4000+ ms for a single update_one operation, yet there is no corresponding slow-operation log on the mongod servers (we set slowms to 10 ms).
2. As a consequence of 1, the total timediff is unstable (from: ./testmongo | awk '{s+=$NF}END{print s}'):
149726
129175
124993
219767
137422
156674
162410
119684
117086
116885
3. After we ran rs.stepDown() on the config server primary, the spikes went away (same measurement; the step-down invocation is sketched after this list):
106808
107225
107228
108055
106660
108690
105993
107037
106226
104789
106494
105178
107428
108650
106535
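For completeness, the step-down was issued from a mongo shell connected to the config replica set primary, roughly as below; the replica set name in the prompt is hypothetical and the exact invocation is not recorded in the ticket (rs.stepDown() without arguments steps the primary down for 60 seconds):
configReplSet:PRIMARY> rs.stepDown()     // step down for the default 60 seconds
configReplSet:PRIMARY> rs.stepDown(120)  // or step down for an explicit 120 seconds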
What we tried:
- Initialize the test DB without sharding, but create the same indexes (UID as hashed, name as 1): the latency is normal and stable (see the shell snippet after this list).
- Run the benchmark against each replica set directly: the latency is normal and stable.
- Step down a shard replica set primary: the latency may return to normal (not always).
- Step down the config server primary again: the latency may become abnormal/normal again (not always).
- Replace the WiredTiger storage engine with MMAPv1: same situation (it can return to normal after stepping something down).
- Stop the balancer: it does not help (see the shell snippet after this list).
- Downgrade to MongoDB 3.2.22: the latency is normal and stable.
- No abnormal logs; we only found split-point lookup and network connection logs.
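A sketch of the two shell steps referenced above, assuming the same foo.foo namespace (the exact commands are not recorded in the ticket):
> // non-sharded variant: same indexes, no shardCollection
> use foo
> db.foo.createIndex({"UID": "hashed"})
> db.foo.createIndex({"name": 1})
> // stopping the balancer (run against a mongos)
> sh.stopBalancer()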
Duplicates:
- SERVER-40707 Secondary couldn't signal OplogWaiters to advance the lastCommittedOpTime in Chained replication mode (Closed)