- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Testing Infrastructure
- None
- Query Execution
- Fully Compatible
- QE 2025-02-03, QE 2025-02-17

Historically, strings of repeated characters in JavaScript have often been built using the expression
Array(length).join(char)
This expression creates a temporary array of the specified length consisting of empty slots, and joins those empty slots together using the character as the separator.
Newer versions of JavaScript (ES2015 and later) added a dedicated String.repeat function (String.prototype.repeat) for building strings of repeated characters:
char.repeat(length)
The runtimes of the two variants differ vastly when building large strings. The following script measures both variants for string lengths from 1 KB to 16 MB:
lengths = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384];
measure = (label, cb) => {
    lengths.forEach((length) => {
        const start = Date.now();
        const x = cb(length * 1024);
        const end = Date.now();
        print("test case:", label, "length:", x.length, "took:", end - start, "ms");
    });
};
measure("string repeat", (length) => "x".repeat(length));
measure("array join", (length) => Array(length + 1).join("x"));
The mongo shell is very slow when building large strings with Array(length).join(char), and its runtime roughly doubles each time the length doubles. char.repeat(length), in contrast, finishes in about a millisecond or less at every tested length, as this runtime comparison shows:
test case: string repeat length: 1024 took: 0 ms
test case: string repeat length: 2048 took: 0 ms
test case: string repeat length: 4096 took: 0 ms
test case: string repeat length: 8192 took: 0 ms
test case: string repeat length: 16384 took: 0 ms
test case: string repeat length: 32768 took: 0 ms
test case: string repeat length: 65536 took: 0 ms
test case: string repeat length: 131072 took: 0 ms
test case: string repeat length: 262144 took: 0 ms
test case: string repeat length: 524288 took: 0 ms
test case: string repeat length: 1048576 took: 0 ms
test case: string repeat length: 2097152 took: 0 ms
test case: string repeat length: 4194304 took: 1 ms
test case: string repeat length: 8388608 took: 0 ms
test case: string repeat length: 16777216 took: 0 ms
test case: array join length: 1024 took: 4 ms
test case: array join length: 2048 took: 8 ms
test case: array join length: 4096 took: 16 ms
test case: array join length: 8192 took: 31 ms
test case: array join length: 16384 took: 62 ms
test case: array join length: 32768 took: 125 ms
test case: array join length: 65536 took: 248 ms
test case: array join length: 131072 took: 497 ms
test case: array join length: 262144 took: 994 ms
test case: array join length: 524288 took: 1992 ms
test case: array join length: 1048576 took: 3995 ms
test case: array join length: 2097152 took: 7974 ms
test case: array join length: 4194304 took: 15938 ms
test case: array join length: 8388608 took: 31841 ms
test case: array join length: 16777216 took: 63605 ms
This means that building a JavaScript string of 16 MB (16,777,216 characters) with Array(length).join(char) takes more than 60 seconds in the mongo shell (63,605 ms in the measurement above).
We currently have about 50 to 60 occurrences of Array(length).join(char) in our JavaScript tests in jstests, which we could drastically speed up by replacing the Array.join variants with String.repeat.
These occurrences can easily be found and adjusted.
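As a rough illustration of how such occurrences could be located, the following sketch scans source text line by line with a regular expression. The helper name findArrayJoinLines and the sample input are hypothetical and not part of jstests. The pattern is an assumption that only covers calls whose Array(...) argument contains no nested parentheses, so any hits would still need manual review.

```javascript
// Hypothetical helper: returns the 1-based numbers of all lines in `source`
// that contain an Array(...).join(...) call.
function findArrayJoinLines(source) {
    // Matches `Array(<argument without nested parentheses>).join(`.
    // The word boundary also admits `new Array(...)` but not `someArray(...)`.
    const pattern = /\bArray\s*\([^()]*\)\s*\.join\s*\(/;
    return source
        .split("\n")
        .map((text, index) => ({text, line: index + 1}))
        .filter(({text}) => pattern.test(text))
        .map(({line}) => line);
}

// Made-up sample input for illustration only.
const sample = [
    'const big = Array(1024 * 1024).join("x");',
    'const small = "x".repeat(16);',
    'const padded = new Array(n + 1).join(" ");',
    'const csv = [1, 2, 3].join(",");',
].join("\n");

const hits = findArrayJoinLines(sample);  // [1, 3]
```

Ordinary joins on array literals, such as the last sample line, are deliberately not matched, because they are not used to build repeated strings.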
We should do this to improve test turnaround times.
Note that the performance measurements shown here are from a fastbuild Bazel build, which has debug mode turned on. The very slow runtime of Array.join is probably due to some debug assertions.
The difference is much lower when using an optimized build.
In CI, we have used opt builds of the mongo shell for most variants since SERVER-90484, so the change won't help much for those.
However, we still have some debug build variants, and for those the change should help to reduce CI runtime.
It should also help to decrease local test runtimes for MongoDB developers who work with the fastbuild build configuration, which has debug mode turned on and is the default.
Note that the two variants are not 100% equivalent, because Array(length).join(char) will build a string of length - 1 characters, whereas char.repeat(length) will build a string of length characters.
If this difference is taken into account, the functions should behave identically for valid inputs.
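The off-by-one relationship can be sketched as follows. The helper name joinEquivalent is hypothetical, and the sketch assumes non-negative integer lengths. One edge case is worth noting when rewriting call sites: Array(0).join(char) returns an empty string, whereas char.repeat(-1) throws a RangeError, so the sketch clamps the count at zero.

```javascript
// Hypothetical helper: returns the same string as Array(length).join(char)
// for non-negative integer lengths, but built with String.prototype.repeat.
function joinEquivalent(char, length) {
    // An array of `length` empty slots has `length - 1` separators between them.
    // Array(0) and Array(1) both join to "", so the count is clamped at 0.
    return char.repeat(Math.max(length - 1, 0));
}

// Compare both forms for a few sample lengths; the list stays empty if they agree.
const sampleLengths = [0, 1, 2, 17, 1024];
const mismatches = sampleLengths.filter(
    (length) => Array(length).join("x") !== joinEquivalent("x", length));
```

For call sites that already pass length + 1 to Array, as the measurement script does, the replacement is simply char.repeat(length).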