-
Type: Improvement
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Sharding
-
Fully Compatible
-
Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
-
3
Motivation: Performance Improvement.
Description: Two different designs:
- Create a new input parameter 'dataDistribution' on $collStats to retrieve all necessary data (count, avgObjSize and numOrphanDocuments) to run the $shardedDataDistribution and other uses. It will not retrieve unnecessary data.
- The idea is to analyze the entire pipeline and see (via existing analysis) that it only references a handful of paths, and if anything else will not impact the results, it can be optimized. For example, as long as the request with $collStats or $allCollectionStats also includes the $project, we could filter the information we need. Inspect whether the $collstats stage is followed by a $project, and from there determine what output the $collstats should produce.
Actual performance without optimizing $collStats (sharded collection - seconds):
[js_test:all_collection_stats] performance test $shardedDataDistribution: 1000 00:00:05 [js_test:all_collection_stats] performance test $shardedDataDistribution: 2000 00:00:10 [js_test:all_collection_stats] performance test $shardedDataDistribution: 3000 00:00:16 [js_test:all_collection_stats] performance test $shardedDataDistribution: 4000 00:00:23 [js_test:all_collection_stats] performance test $shardedDataDistribution: 5000 00:00:33
Performance test used:
(function() { 'use strict'; const numberOfCollections = 5000; // Configure initial sharding cluster const st = new ShardingTest({shards: 3}); const mongos = st.s; const dbName = "test"; const db = mongos.getDB(dbName); let iterator = 1000; let total = 0; while (total < numberOfCollections) { // Insert data to validate the aggregation stage for (let i = 0; i < iterator; i++) { const coll = "coll" + total; // assert.commandWorked(db.createCollection(coll)); assert(st.adminCommand({shardcollection: dbName + "." + coll, key: {skey: 1}})); total++; } let it = 0; const start = new Date(); const cursor = mongos.getDB("admin").aggregate([{$shardedDataDistribution: {}}]); while (cursor.hasNext()) { const data = cursor.next(); it++; } const end = new Date(); const time = new Date(end - start).toISOString().slice(11, 19); print(`performance test $shardedDataDistribution: ` + total + ` ` + time); assert.eq(it, total + 1); } st.stop(); })();
- depends on
-
SERVER-70859 Optimize collStats to not retrieve all data
- Closed
- is depended on by
-
SERVER-70859 Optimize collStats to not retrieve all data
- Closed