- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Operating System: ALL
- Labels: (copied to CRM)
Since upgrading from 4.2 to 4.4, we are experiencing extremely long shutdown times for shard replica mongod processes. We are seeing this in our production clusters as well as in our much smaller test clusters. With 4.2, mongod shutdowns typically took less than a minute; now they often take 30-60 minutes. When they do take that long, it is always because they are executing a WiredTiger rollback-to-stable. You can see logs like the following spewing out every 20 seconds:
{"t":{"$date":"2023-02-23T02:43:14.628+00:00"},"s":"I", "c":"STORAGE", "id":22322, "ctx":"conn1315","msg":"Shutting down checkpoint thread"} {"t":{"$date":"2023-02-23T02:43:14.628+00:00"},"s":"I", "c":"STORAGE", "id":22323, "ctx":"conn1315","msg":"Finished shutting down checkpoint thread"} {"t":{"$date":"2023-02-23T02:43:14.631+00:00"},"s":"I", "c":"STORAGE", "id":4795902, "ctx":"conn1315","msg":"Closing WiredTiger","attr":{"closeConfig":"leak_memory=true,"}} {"t":{"$date":"2023-02-23T02:43:20.914+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn1292","msg":"Connection ended","attr":{"remote":"10.160.171.190:37458","connectionId":1292,"connectionCount":18}} {"t":{"$date":"2023-02-23T02:43:21.988+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn262","msg":"Connection ended","attr":{"remote":"10.160.170.250:49662","connectionId":262,"connectionCount":17}} {"t":{"$date":"2023-02-23T02:43:34.634+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120214:634344][3628:0x7f70507e1700], file:cdpactivationtest/collection-0--6346230458143384831.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 20 seconds and has inspected 597611 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:43:47.381+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn1290","msg":"Connection ended","attr":{"remote":"10.160.171.190:37008","connectionId":1290,"connectionCount":16}} {"t":{"$date":"2023-02-23T02:43:54.636+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120234:636642][3628:0x7f70507e1700], file:config/collection-112432-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 40 seconds and has inspected 601768 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:44:14.637+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120254:637117][3628:0x7f70507e1700], file:config/collection-13319-559579135213939646.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 60 seconds and has inspected 605383 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:44:34.638+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120274:638360][3628:0x7f70507e1700], file:config/collection-153892-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 80 seconds and has inspected 609305 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:44:54.632+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120294:632279][3628:0x7f70507e1700], file:config/collection-153892-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 100 seconds and has inspected 613075 files. 
For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:45:14.638+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120314:638068][3628:0x7f70507e1700], file:config/collection-196706-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 120 seconds and has inspected 616721 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:45:34.632+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120334:632897][3628:0x7f70507e1700], file:config/collection-220381-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 140 seconds and has inspected 620387 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:45:54.635+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120354:635610][3628:0x7f70507e1700], file:config/collection-244326-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 160 seconds and has inspected 624076 files. For more detailed logging, enable WT_VERB_RTS"}} {"t":{"$date":"2023-02-23T02:46:14.632+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn1315","msg":"WiredTiger message","attr":{"message":"[1677120374:632253][3628:0x7f70507e1700], file:config/collection-270505-5941089686580730609.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 180 seconds and has inspected 627528 files. For more detailed logging, enable WT_VERB_RTS"}} ...
I checked the system stats during the rollback-to-stable, and none of them were saturated. I also took a sampling of the system calls being made using strace:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0       910           read
  0.00    0.000000           0       503           write
  0.00    0.000000           0       461           open
  0.00    0.000000           0       591           close
  0.00    0.000000           0       585           stat
  0.00    0.000000           0         6           writev
  0.00    0.000000           0     85452           select
  0.00    0.000000           0       708           sched_yield
  0.00    0.000000           0       446           sendmsg
  0.00    0.000000           0       998           recvmsg
  0.00    0.000000           0       260           getdents
  0.00    0.000000           0         6           rename
  0.00    0.000000           0      3627           gettimeofday
  0.00    0.000000           0        65           getrusage
  0.00    0.000000           0       585           statfs
  0.00    0.000000           0     89303     37032 futex
  0.00    0.000000           0        65           sched_getaffinity
  0.00    0.000000           0        10        10 restart_syscall
  0.00    0.000000           0     44993           clock_gettime
  0.00    0.000000           0       130           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                229704     37042 total
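The exact strace invocation isn't captured above; a minimal sketch of one way to collect this kind of summary from a shutting-down mongod (the 60-second window and the pidof lookup are assumptions, not necessarily what was used here) is:

# Attach to the running mongod while the rollback-to-stable is in progress,
# count syscalls for ~60 seconds, then detach and print the summary table.
#   -c : aggregate counts/times per syscall instead of logging each call
#   -f : follow all of mongod's threads
sudo timeout 60 strace -c -f -p "$(pidof mongod)"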
I used db.adminCommand( { "setParameter": 1, "wiredTigerEngineRuntimeConfig": "verbose=[rts]"}) to enable verbose logging for WiredTiger RTS. This produced a ton of output, but most of it consisted of the following 3 log messages:
{"t":{"$date":"2023-02-16T21:46:07.465+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":"[1676583967:465207][24280:0x7f09ea138700], file:test2nto/index-1020619--8977920817120252160.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_RTS(session)] skip rollback to stable on file file:test2nto/index-4666-8107731100219616386.wt because its checkpoint address length is 0"}} {"t":{"$date":"2023-02-16T21:46:07.459+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":"[1676583967:459380][24280:0x7f09ea138700], file:test2nto/index-1020619--8977920817120252160.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_RTS(session)] file:test2nto/index-1021448--8977920817120252160.wt: tree skipped with durable timestamp: (0, 0) and stable timestamp: (1676583941, 44) or txnid: 0"}} {"t":{"$date":"2023-02-16T21:46:07.436+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"SignalHandler","msg":"WiredTiger message","attr":{"message":"[1676583967:436685][24280:0x7f09ea138700], file:test2nto/index-1020156--8977920817120252160.wt, txn rollback_to_stable: [WT_VERB_RECOVERY_RTS(session)] 0x5583f59294e0: page walk skipped"}}
This snippet from https://jira.mongodb.org/browse/SERVER-52815 appears to explain why these rollback-to-stables are now happening on every shutdown in 4.4:
But it's not clear whether there's any way to change this behavior. We would like to go back to how it behaved in 4.2 and earlier, where we were able to shut down our mongod processes in a minute or so. An hour-long shutdown time is unacceptable: it makes rolling restarts for upgrades and other maintenance take 50 times longer, and, even more importantly, it prevents us from expediently dealing with cluster availability and performance issues.
I have tried the following tweaks to the shutdown process in hopes that one of them might prevent the RTS, but none of them helped (a sketch of the corresponding shell commands follows the list):
- make the replica hidden before shutting it down (we always do this anyway)
- remove the replica from the replica set before shutting it down
- run db.fsyncLock() to disable writes prior to shutdown
- use db.shutdownServer({ force: true }) to shut down rather than sending SIGTERM
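A rough sketch of the commands behind these tweaks, assuming a 4.4-era legacy mongo shell; the hostnames, port, and member index are placeholders rather than our actual topology:

# 1. Hide the member (run against the primary); a hidden member must also have priority 0.
mongo --host primary-host:27018 --eval 'cfg = rs.conf(); cfg.members[3].priority = 0; cfg.members[3].hidden = true; rs.reconfig(cfg)'
# 2. Or remove the member from the replica set entirely (also against the primary).
mongo --host primary-host:27018 --eval 'rs.remove("replica-host:27018")'
# 3. Block writes on the member that is about to be shut down.
mongo --host replica-host:27018 --eval 'db.fsyncLock()'
# 4. Shut down via command instead of sending SIGTERM; the shell reports a dropped
#    connection once the server exits, which is expected.
mongo --host replica-host:27018 --eval 'db.getSiblingDB("admin").shutdownServer({ force: true })'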
I also tried stopping the server with "kill -9". This is the one thing that actually worked. It resulted in the subsequent mongod restart taking longer (about 5 minutes rather than the typical 1 minute), but no RTS was performed. Nevertheless, using SIGKILL seems like a bad idea, since it gives mongod no opportunity to gracefully close files, network connections, etc.
We see this issue on about 4 out of 5 shutdowns in our test cluster, which consists of:
- 2 shards
- 4 replicas per shard: 3 active with local SSDs and 1 hidden replica on EBS for backup purposes
- 150 databases with about 100 collections each
- 250 GB total data
There is a load generator running that ensures there are regular updates being made to one of the databases, but the load is light.
Please let me know if there's any other information I can provide to help you diagnose this issue.
Thank you!
- related to: SERVER-52815 Investigate calling 'rollback_to_stable' prior to shutting down WiredTiger (Closed)