- Type: Bug
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: 5.0.15
- Component/s: None
- ALL
In 4.x, our shard replicas consistently took around 2 minutes to start up. In 5.0, they are taking anywhere from 3.5 to 12 minutes to start up. The additional time is primarily consumed by a WiredTiger (WT) checkpoint: 5.0 appears to perform a checkpoint as part of the startup process, whereas 4.x waited until after startup was complete to take the first checkpoint.
Here's an abridged example of a 4.2 startup:
2022-11-22T02:45:20.979+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2022-11-22T02:47:17.352+0000 I REPL [rsBackgroundSync] Changed sync source from empty to shard1repl3.example.com:27017 // TOTAL: 1m57s
// ...
2022-11-22T02:49:07.362+0000 I STORAGE [WTCheckpointThread] WiredTiger message [1669085347:362181][77440:0x7f956c557700], file:enelglobalservicessrl3/collection-11873-6569341069224170235.wt, WT_SESSION.checkpoint: Checkpoint has been running for 20 seconds and wrote: 65000 pages (3797 MB)
2022-11-22T02:49:11.100+0000 I STORAGE [WTCheckpointThread] WiredTiger message [1669085351:100075][77440:0x7f956c557700], WT_SESSION.checkpoint: Checkpoint ran for 23 seconds and wrote: 83927 pages (5232 MB) // CHECKPOINT: 23s
Notice that startup takes less than 2 minutes, and the first checkpoint runs a couple of minutes later.
And here's an abridged example of a 5.0 startup on the exact same machine:
{"t":{"$date":"2023-03-20T13:33:45.171+00:00"},"s":"I", "c":"CONTROL", "id":20698, "ctx":"-","msg":"***** SERVER RESTARTED *****"}
// ...
{"t":{"$date":"2023-03-20T13:34:53.896+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1679319293:896857][116638:0x7f46be029b80], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 1, snapshot max: 1 snapshot count: 0, oldest timestamp: (1679318801, 3202) , meta checkpoint timestamp: (1679319101, 3202) base write gen: 1070042338"}}
{"t":{"$date":"2023-03-20T13:41:37.518+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1679319697:518964][116638:0x7f46adbe4700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 10, snapshot max: 10 snapshot count: 0, oldest timestamp: (1679318801, 3202) , meta checkpoint timestamp: (1679319101, 3202) base write gen: 1070042338"}} // CHECKPOINT: 7m12s
// ...
{"t":{"$date":"2023-03-20T13:42:04.502+00:00"},"s":"I", "c":"REPL", "id":21088, "ctx":"BackgroundSync","msg":"Changed sync source","attr":{"oldSyncSource":"empty","newSyncSource":"shard1repl3.example.com:27017"}} // STARTUP: 8m19s
Notice that startup takes over 8 minutes, and over 7 minutes of that is spent in a checkpoint.
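For reference, here is a minimal sketch of how the elapsed time between two 5.0 structured-log entries can be computed, since mongod 4.4+ emits one JSON document per log line. The two hard-coded lines below are abridged from the excerpt above (the `attr` payload is omitted); in practice you would read them from mongod.log.

```python
import json
from datetime import datetime

# Abridged copies of the two checkpoint-related log entries shown above.
log_lines = [
    '{"t":{"$date":"2023-03-20T13:34:53.896+00:00"},"s":"I","c":"STORAGE",'
    '"id":22430,"ctx":"initandlisten","msg":"WiredTiger message"}',
    '{"t":{"$date":"2023-03-20T13:41:37.518+00:00"},"s":"I","c":"STORAGE",'
    '"id":22430,"ctx":"Checkpointer","msg":"WiredTiger message"}',
]

def parse_ts(line: str) -> datetime:
    """Extract the timestamp from a mongod structured-log (JSON) line."""
    return datetime.fromisoformat(json.loads(line)["t"]["$date"])

elapsed = parse_ts(log_lines[1]) - parse_ts(log_lines[0])
print(f"time between checkpoint log entries: {elapsed}")
# → time between checkpoint log entries: 0:06:43.622000
```

Note that the gap between these two entries is slightly under the 7m12s quoted above, since the first entry is logged after the checkpoint is already underway.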
An 8-12 minute startup time results in a 10-14 minute restart time, which is unacceptable. It makes rolling restarts for upgrades and other maintenance take 5-7 times longer, and, even more importantly, it prevents us from dealing expediently with cluster availability and performance issues.
Is it viable to switch back to the 4.x behavior of waiting until startup is complete before performing the first WT checkpoint?