- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Storage Execution
- Fully Compatible
- ALL
- v7.0
- Execution NAMR Team 2023-08-21
- 120
When stepping down, we want caller threads to be able to know that the journal flusher was interrupted. Otherwise, we can get into a deadlock.
Areas that may need to be addressed:
- All interruptions are retried, even ErrorCodes::InterruptedDueToReplStateChange. We want to be able to setError when the journal flusher is interrupted during stepdown.
- Even if we setError, we still get stuck in the infinite while loop that retries the journal flush.
- The caller we're concerned about for the stepdown deadlock is writeConcern. We may want to waitForJournalFlusher without retrying, and could introduce a new method for this. Alternatively, we may want to pass the writeConcern's opCtx to the journalFlusher so that it can be interrupted.
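To make the first two points concrete, here is a minimal, self-contained sketch of a retry loop that keeps retrying ordinary interruptions but reports a stepdown interruption to the caller instead of looping. It is not the server's code: `Code`, `Status`, `FlushOutcome`, `flushWithStepdownAwareness`, the `tryFlush` callback, and the `maxAttempts` bound are all hypothetical names invented for illustration. Only the error name InterruptedDueToReplStateChange comes from the ticket.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the server's error codes (illustration only).
enum class Code { OK, Interrupted, InterruptedDueToReplStateChange };

struct Status {
    Code code;
    Status(Code c = Code::OK) : code(c) {}
    bool isOK() const { return code == Code::OK; }
};

// What a waiting caller (e.g. a writeConcern waiter) would observe.
struct FlushOutcome {
    Status status;  // result reported to the caller
    int attempts;   // number of flush attempts made
};

// Sketch of a flusher loop. `tryFlush` is a hypothetical callback standing in
// for one flush attempt; `maxAttempts` bounds the loop so the sketch always
// terminates (an assumption of this example, not something the ticket states).
inline FlushOutcome flushWithStepdownAwareness(const std::function<Status()>& tryFlush,
                                               int maxAttempts) {
    assert(maxAttempts > 0);
    int attempts = 0;
    while (attempts < maxAttempts) {
        ++attempts;
        Status s = tryFlush();
        if (s.isOK()) {
            return {s, attempts};
        }
        if (s.code == Code::InterruptedDueToReplStateChange) {
            // Analogue of setError: surface the stepdown interruption to the
            // caller and leave the loop instead of retrying.
            return {s, attempts};
        }
        // Any other interruption is retried.
    }
    return {Status(Code::Interrupted), attempts};
}
```

With this shape, a flush that is interrupted twice by something other than stepdown and then succeeds reports OK after three attempts, while a flush interrupted by stepdown returns the error after a single attempt, so the waiting caller can stop blocking.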
We should add a test for this deadlock so we can confirm the fix and catch the problem early if it happens again.
Is related to:
- SERVER-48149 Move callers of waitUntilDurable onto JournalFlusher::waitForJournalFlush (Closed)
- SERVER-55745 The Fuzzer can run killOp on the JournalFlusher thread and cause it to throw an unexpected error (Closed)
- SERVER-57229 killOp_against_journal_flusher_thread.js must ensure the JournalFlusher doesn't reset the opCtx between finding the opId and running killOp (Closed)
- SERVER-79026 Failing to cancel the JournalFlusher thread might lead to 3-way deadlock (Closed)
- SERVER-61484 Allow ExceededMemoryLimit to be a benign log warning instead of an invariant in the JournalFlusher (Closed)
- SERVER-79174 Improve journal flusher interruption handling (Closed)
Related to:
- SERVER-79919 write js test for SERVER-79810 (Closed)