Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Logging
Labels: None
Assigned Teams: Storage Engines
Case:
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

In recent versions of MongoDB (observed on 4.4.5), startup recovery of an instance with a large data set can take several hours.

This is problematic for TSEs and for users because the state of the mongod process is unclear while recovery runs.

For example, the log gives no indication of:

How many phases or steps the recovery process has in total
Where the current progress stands
How many steps are outstanding

To illustrate, a mongod has been observed to spend multiple hours in recovery after logging messages similar to the following, with no further clear signs of progress:

{"t":{"$date":"2021-04-21T00:51:14.703+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1618966274:703153][5978:0x7f67cef96bc0], file:collection-2-1893086824266225355.wt, txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 62628 through 62628"}}
{"t":{"$date":"2021-04-21T00:51:14.759+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1618966274:759611][5978:0x7f67cef96bc0], file:collection-2-1893086824266225355.wt, txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1618965463, 1)"}}
{"t":{"$date":"2021-04-21T00:51:14.759+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1618966274:759674][5978:0x7f67cef96bc0], file:collection-2-1893086824266225355.wt, txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1618965458, 1)"}}

depends on: WT-7452 Improve logging when recovery (and RTS) is taking a long time (Closed)
related to: WT-7442 RTS to open dhandle only when the dhandle has unstable updates (Closed)

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Dmitry Ryabtsev
Participants: [DO NOT USE] Backlog - Storage Engines Team, Daniel Gottlieb, Dmitry Ryabtsev, Luke Pearson, Sulabh Mahajan
Votes: 9
Watchers: 15

Created: Apr 21 2021 03:41:43 AM UTC
Updated: Dec 06 2022 01:23:34 AM UTC
Resolved: May 21 2021 12:12:28 AM UTC
Confidence Status Last Update: 21/May/21 12:11 AM
