- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Build, Replication, Testing Infrastructure
- None
- Fully Compatible
- Sharding 2020-10-19
I noticed this when running the full unittest suite locally, but it should also benefit the unittests in the commit queue and patch builds. The db_repl_test binary, and to a lesser extent, util_test, take much longer to run than the rest. When running a debug build locally on 40 cores, I still normally spend several minutes at the end waiting for just db_repl_test.
Timings from a recent commit queue patch show:
secs | binary |
---|---|
108.61 | db_repl_test |
48.16 | util_test |
33.11 | db_catalog_test |
31.68 | db_unittests |
31.20 | db_storage_test |
20.94 | storage_ephemeral_for_test_test |
19.99 | db_repl_cloners_test |
... | ... |
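To illustrate why one slow binary dominates a parallel run, here is a minimal sketch that schedules the listed binaries greedily (longest first) onto a fixed number of cores. It assumes each binary occupies exactly one core and uses only the rows shown above; the elided rows are left out, and the function name `makespan` is made up for this example.

```python
import heapq

# Per-binary runtimes in seconds, copied from the table above.
# The elided "..." rows are not included, so this is only a partial picture.
BINARY_SECS = {
    "db_repl_test": 108.61,
    "util_test": 48.16,
    "db_catalog_test": 33.11,
    "db_unittests": 31.68,
    "db_storage_test": 31.20,
    "storage_ephemeral_for_test_test": 20.94,
    "db_repl_cloners_test": 19.99,
}


def makespan(durations, cores):
    """Wall-clock time of a greedy longest-first schedule on `cores` cores."""
    loads = [0.0] * cores  # a list of equal values is already a valid min-heap
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)  # core that becomes free first
        heapq.heappush(loads, lightest + duration)
    return max(loads)


if __name__ == "__main__":
    for cores in (4, 40):
        print(cores, "cores:", makespan(BINARY_SECS.values(), cores), "s")
```

Under these assumptions the result is 108.61 s for both 4 and 40 cores: once db_repl_test has a core to itself, adding more cores does not shorten the run.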
Looking into the suites shows just a few that run much longer than the others:
millis | binary and suite | num tests |
---|---|---|
29368 | db_repl_test RandomizedIdempotencyTest | 2 |
19508 | db_repl_test TenantOplogApplierTest | 19 |
15077 | db_repl_test OplogFetcherTest | 73 |
11347 | db_repl_test RSRollbackTest | 62 |
8746 | util_test Future | 71 |
8277 | util_test Future_Void | 61 |
7635 | util_test Future_MoveOnly | 60 |
5012 | util_test FailPointStress | 1 |
4648 | db_repl_test IdempotencyTestTxns | 20 |
4600 | db_repl_test IdempotencyTest | 21 |
4254 | db_repl_test InitialSyncerTest | 87 |
3125 | util_test InvariantTerminationTest | 13 |
2880 | util_test Future_EdgeCases | 10 |
2418 | util_test RegistryList | 2 |
2256 | util_test SharedFuture | 16 |
2159 | db_repl_test PrimaryOnlyServiceTest | 13 |
2072 | db_repl_test OplogBufferCollectionTest | 41 |
2043 | db_repl_test ReplicationRecoveryTest | 57 |
1577 | db_repl_test RollbackImplTest | 42 |
1406 | db_repl_test OplogApplierImplTest | 35 |
1011 | db_repl_test TenantOplogBatcherTest | 11 |
1000 | util_test BackgroundJobBasic | 3 |
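As a rough cross-check, the suite timings above can be summed per binary. The sketch below copies the rows from the table; the helper name `total_millis_by_binary` is hypothetical, and the totals cover only the suites listed here.

```python
from collections import defaultdict

# (millis, binary, suite) rows copied from the table above.
ROWS = [
    (29368, "db_repl_test", "RandomizedIdempotencyTest"),
    (19508, "db_repl_test", "TenantOplogApplierTest"),
    (15077, "db_repl_test", "OplogFetcherTest"),
    (11347, "db_repl_test", "RSRollbackTest"),
    (8746, "util_test", "Future"),
    (8277, "util_test", "Future_Void"),
    (7635, "util_test", "Future_MoveOnly"),
    (5012, "util_test", "FailPointStress"),
    (4648, "db_repl_test", "IdempotencyTestTxns"),
    (4600, "db_repl_test", "IdempotencyTest"),
    (4254, "db_repl_test", "InitialSyncerTest"),
    (3125, "util_test", "InvariantTerminationTest"),
    (2880, "util_test", "Future_EdgeCases"),
    (2418, "util_test", "RegistryList"),
    (2256, "util_test", "SharedFuture"),
    (2159, "db_repl_test", "PrimaryOnlyServiceTest"),
    (2072, "db_repl_test", "OplogBufferCollectionTest"),
    (2043, "db_repl_test", "ReplicationRecoveryTest"),
    (1577, "db_repl_test", "RollbackImplTest"),
    (1406, "db_repl_test", "OplogApplierImplTest"),
    (1011, "db_repl_test", "TenantOplogBatcherTest"),
    (1000, "util_test", "BackgroundJobBasic"),
]


def total_millis_by_binary(rows):
    """Sum the suite runtimes (in milliseconds) for each binary."""
    totals = defaultdict(int)
    for millis, binary, _suite in rows:
        totals[binary] += millis
    return dict(totals)


totals = total_millis_by_binary(ROWS)
for binary, millis in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{binary}: {millis / 1000:.2f} s across the listed suites")
```

The listed suites add up to 99.07 s for db_repl_test and 41.35 s for util_test, so the handful of suites at the top of the table accounts for most of each binary's runtime.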
Looking more closely at RandomizedIdempotencyTest shows that CheckUpdateSequencesAreIdempotent takes ~3 secs, and CheckUpdateSequencesAreIdempotentV2 takes the remaining ~26 secs:
2020-09-16T12:52:35.435Z I TEST 23063 [main] "Running","attr":{"suite":"RandomizedIdempotencyTest"}
2020-09-16T12:52:35.435Z I TEST 23059 [main] "Running","attr":{"test":"CheckUpdateSequencesAreIdempotent","rep":1,"reps":1}
...
2020-09-16T12:52:38.925Z I TEST 23059 [main] "Running","attr":{"test":"CheckUpdateSequencesAreIdempotentV2","rep":1,"reps":1}
...
2020-09-16T12:53:04.803Z I TEST 23060 [main] "Done running tests"
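The per-test durations quoted above follow from the gaps between the log timestamps. A minimal sketch of that arithmetic, with the three timestamps copied from the excerpt and helper names invented for the example:

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
SUITE_START = "2020-09-16T12:52:35.435Z"  # CheckUpdateSequencesAreIdempotent starts
V2_START = "2020-09-16T12:52:38.925Z"     # CheckUpdateSequencesAreIdempotentV2 starts
DONE = "2020-09-16T12:53:04.803Z"         # "Done running tests"


def parse(timestamp):
    """Parse the ISO-like UTC timestamp format used in the log lines."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_secs(start, end):
    """Seconds between two log timestamps."""
    return (parse(end) - parse(start)).total_seconds()


first_test_secs = elapsed_secs(SUITE_START, V2_START)
v2_secs = elapsed_secs(V2_START, DONE)
print(f"CheckUpdateSequencesAreIdempotent:   {first_test_secs:.3f} s")
print(f"CheckUpdateSequencesAreIdempotentV2: {v2_secs:.3f} s")
```

This gives 3.490 s and 25.878 s, which together match the 29368 ms reported for the suite.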
So it might be good to split at least this test (CheckUpdateSequencesAreIdempotentV2) or suite (RandomizedIdempotencyTest), and maybe some of the others, into separate binaries, so that they can better parallelize onto multiple cores.
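A back-of-the-envelope estimate of what such a split could buy, using the figures quoted in this ticket. It assumes (this is not stated anywhere above) that a binary's runtime is roughly the sum of its suites and that the binary and suite timings are comparable; `longest_after_split` is a hypothetical helper.

```python
# Figures from the tables in this ticket: binary runtimes in seconds, suite runtime in ms.
DB_REPL_TEST_SECS = 108.61
UTIL_TEST_SECS = 48.16  # next-slowest binary
RANDOMIZED_IDEMPOTENCY_MS = 29368


def longest_after_split(binary_secs, suite_ms, next_slowest_secs):
    """Estimate the slowest binary once one suite moves into its own binary.

    Assumption: moving a suite out shortens the original binary by exactly
    that suite's runtime (startup and link overhead are ignored).
    """
    suite_secs = suite_ms / 1000.0
    remainder_secs = binary_secs - suite_secs
    return max(remainder_secs, suite_secs, next_slowest_secs)


estimate = longest_after_split(
    DB_REPL_TEST_SECS, RANDOMIZED_IDEMPOTENCY_MS, UTIL_TEST_SECS
)
print(f"estimated slowest binary after split: {estimate:.2f} s")
```

Under these assumptions the slowest binary drops from 108.61 s to about 79.24 s, and it is still the remainder of db_repl_test, which suggests that splitting out further suites would be needed to approach util_test's 48.16 s.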
(Sending this to SDP, since the ticket is primarily about improving overall (parallel) unittest runtime by avoiding individual long-running tests/suites/binaries. But it could of course be redirected to Replication to fix these particular ones.)