Type: Task
Resolution: Done
Priority: Major - P3
Affects Version/s: 5.1.0
Component/s: Sharding
Sprint: Sharding 2021-12-13, Sharding 2021-12-27
As part of PM-2423 we have been measuring the performance of the migration protocol. We saw some unexpected numbers related to the cloning of sessions that would be interesting to understand.
Environment
3-shard sharded cluster running 5.1 binaries.
Experiment
We create a sharded collection with an initial pre-split of 1K chunks. The shard key is a hashed random number. After that we bulk insert 1K documents using retryable writes.
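For illustration, the setup looks roughly like the following pymongo sketch. This is not the actual Genny workload; the connection string, the "test.coll" namespace, and the shard key field "k" are assumptions made for the example.

```python
# Minimal sketch of the collection setup, assuming a mongos at localhost:27017
# and a hypothetical namespace "test.coll" with shard key field "k".
import random
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?retryWrites=true")  # mongos
admin = client.admin

admin.command("enableSharding", "test")
# Pre-split into 1K chunks on a hashed shard key.
admin.command({
    "shardCollection": "test.coll",
    "key": {"k": "hashed"},
    "numInitialChunks": 1000,
})

# Bulk insert 1K documents with random shard key values. With retryWrites=true
# these inserts run as retryable writes, which is what produces the per-session
# history that the migration protocol later has to clone.
docs = [{"k": random.randint(0, 2**31)} for _ in range(1000)]
client.test.coll.insert_many(docs)
```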
After that we execute a few thousand random migrations. There are no CRUD operations during this phase.
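The random-migration phase could be driven with something like the sketch below. Again this is hypothetical rather than the Genny workload itself; it assumes the same mongos and namespace as above, and the 5.0+ config.chunks format where chunks are keyed by collection uuid.

```python
# Sketch of the random moveChunk phase against an assumed "test.coll" namespace.
import random
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # mongos
admin, config = client.admin, client.config

# On 5.0+ config.chunks references the collection by uuid rather than namespace.
coll_entry = config.collections.find_one({"_id": "test.coll"})
shard_ids = [s["_id"] for s in config.shards.find()]

for _ in range(4000):  # a few thousand random migrations
    # Pick a random chunk, re-reading config.chunks so the owning shard is current.
    chunk = next(config.chunks.aggregate([
        {"$match": {"uuid": coll_entry["uuid"]}},
        {"$sample": {"size": 1}},
    ]))
    destination = random.choice([s for s in shard_ids if s != chunk["shard"]])
    admin.command({
        "moveChunk": "test.coll",
        "bounds": [chunk["min"], chunk["max"]],
        "to": destination,
    })
```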
You can check the Genny workload we executed here.
Results
You can find our results here.
We are plotting two different variables:
- the total execution time spent holding the critical section during the migration (the catch-up phase of the migration).
- the total execution time spent holding the critical section while blocking reads and writes (the commit phase of the migration).
The interesting time is the first one; the second one is more or less constant. We can see a 30x slowdown between the first moveChunks and the last ones on my machine, after 4K moveChunks. We also got some numbers on EVG; you can see them on the different tabs.
Is related to:
- SERVER-62233 Make SessionCatalogMigrationSource handleWriteHistory filter out oplogs outside of the chunkRange with OpType 'n' (Closed)