-
Type: Task
-
Resolution: Won't Fix
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Cluster Scalability
The genny workload CreateBigIndex has a third phase, InsertData, which bulk inserts 10 million documents in batches of 1000 on a single thread. On a sharded environment (specifically the shard-lite build, which has one mongos and two mongod shards), this operation times out. The collection is sharded on a hashed _id key. Each mongod node shows an average write latency of ~2 milliseconds and inserts about 500 documents per second, while the mongos shows an average latency of about 1 second. This suggests the documents may currently be processed serially; we should investigate whether that is the case, and whether we can change the workload to run with unordered batches. This was discovered as a result of BF-31192.
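For illustration, the batching change under discussion amounts to issuing the inserts with the ordered flag disabled. Below is a minimal Python sketch that builds MongoDB insert command documents in batches; the helper name build_insert_batches is hypothetical, and a real workload would go through a driver (for example pymongo's insert_many(..., ordered=False)) rather than constructing raw commands.

```python
def build_insert_batches(collection, docs, batch_size=1000):
    """Split docs into insert commands of at most batch_size documents each.

    With "ordered": False, the server is free to apply documents within a
    batch out of order and continue past individual failures, which lets a
    hashed-_id sharded cluster fan writes out to both shards concurrently
    instead of draining the batch serially.
    """
    batches = []
    for start in range(0, len(docs), batch_size):
        batches.append({
            "insert": collection,  # MongoDB insert command name
            "documents": docs[start:start + batch_size],
            "ordered": False,      # allow unordered, parallel application
        })
    return batches

# Example: 5000 documents in batches of 1000 -> 5 insert commands.
docs = [{"_id": i} for i in range(5000)]
batches = build_insert_batches("coll", docs, batch_size=1000)
```

The same shape scales to the workload's 10 million documents (10,000 commands of 1000 documents each).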
The temporary fix for the attached BF was to prevent the workload from running in a sharded environment.
It is also worth noting that, when the InsertData phase is allowed to finish (which can be accomplished by adding a LoggingActor in phase 2 that periodically logs to prevent the timeout; the phase then finishes in about two hours), setting the server parameter maxIndexBuildMemoryUsageMegabytes to 100 on the mongos in sharded builds causes the workload to fail with an 'unrecognized parameter' error. Any fix for this workload should also address this issue; one solution could be to run the server command that lowers the memory usage threshold only when the task is not running in a sharded environment.
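The proposed guard can be sketched as follows. mongos identifies itself with msg: "isdbgrid" in its hello/isMaster response, so the workload could check that field before sending the mongod-only setParameter. This is a minimal Python sketch; the function names are hypothetical, and the hello response would come from the driver in a real workload.

```python
def should_lower_index_build_memory(hello_response):
    """Return True if it is safe to send the setParameter command.

    mongos reports msg: "isdbgrid" in its hello response and rejects
    maxIndexBuildMemoryUsageMegabytes as an unrecognized parameter, so
    skip the command when connected to a sharded cluster's router.
    """
    return hello_response.get("msg") != "isdbgrid"


def make_set_parameter_command():
    # The command that currently fails against mongos in sharded builds.
    return {"setParameter": 1, "maxIndexBuildMemoryUsageMegabytes": 100}


# Standalone / replica-set mongod: send the command.
assert should_lower_index_build_memory({"isWritablePrimary": True})
# mongos: skip it to avoid the 'unrecognized parameter' failure.
assert not should_lower_index_build_memory({"msg": "isdbgrid"})
```

An alternative with the same effect would be to gate the command on the build variant at the workload level, as the temporary fix does for the whole workload.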