There have been issues where threads running concurrently with stepdown call waitUntilDurable independently.
Stepdown changes the behavior of waitUntilDurable so that it stops writing to the oplogTruncateAfterPoint document, and then clears the oplogTruncateAfterPoint timestamp. It does this by carefully interrupting the JournalFlusher thread, which makes the async waitUntilDurable calls. However, operations running concurrently with stepdown sometimes require durability and call waitUntilDurable directly. Stepdown does not interrupt these operations before it clears the oplogTruncateAfterPoint timestamp. Consequently, the oplogTruncateAfterPoint can remain set after stepdown, when it should have been cleared.
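To make the interleaving concrete, here is a toy, single-threaded replay of it. This is only a sketch: ToyStorage, its fields, and truncatePointLeaksAfterStepdown are hypothetical names, not the server's code, and it assumes the direct caller observed the primary state before stepdown began.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only. All names here are hypothetical and do not mirror the
// real storage engine or replication interfaces.
struct ToyStorage {
    bool primary = true;                  // current replication role
    bool truncatePointSet = false;        // false == cleared
    std::uint64_t truncatePointTs = 0;    // meaningful only when set

    // Models the part of waitUntilDurable that persists the truncate point.
    // `sawPrimary` is the role the caller observed earlier (an assumption of
    // this sketch: the caller was already in flight when stepdown started).
    void waitUntilDurable(bool sawPrimary, std::uint64_t allDurableTs) {
        if (sawPrimary) {
            truncatePointSet = true;
            truncatePointTs = allDurableTs;
        }
    }

    // Models stepdown: change role, then clear the truncate point.
    void stepDown() {
        primary = false;
        truncatePointSet = false;
        truncatePointTs = 0;
    }
};

// Replays the sequence: direct caller samples the role, stepdown clears the
// truncate point, then the uninterrupted direct caller writes it again.
inline bool truncatePointLeaksAfterStepdown() {
    ToyStorage s;
    const bool sawPrimary = s.primary;   // direct caller decides to update
    s.stepDown();                        // only the flusher was interrupted
    s.waitUntilDurable(sawPrimary, 42);  // late write from the direct caller
    return s.truncatePointSet;           // set after stepdown: the bug
}
```

In this model the late write lands after the clear, so the truncate point is left set even though the node is no longer primary.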
--------------------
waitUntilUnjournaledWritesDurable and flushAllFiles are callers of waitUntilDurable, but cannot be moved onto the JournalFlusher thread because they pass parameter settings that the JournalFlusher does not. The interface of these two functions should make the risk of running concurrently with stepdown very explicit. I do not believe any of today's callers can run concurrently with stepdown.
--------------------
This will actually be a bit tricky because we will probably have to make sure that new JournalFlusher::waitForJournalFlush callers can retry if interrupted by stepdown (or whatever interrupts the JournalFlusher thread).
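A minimal sketch of what such a caller-side retry could look like. FlushInterrupted and waitForFlushWithRetry are hypothetical names introduced for illustration, and the callable stands in for JournalFlusher::waitForJournalFlush, whose real signature and error reporting are not shown here.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical exception standing in for whatever error an interrupted
// flush reports to its waiters.
struct FlushInterrupted : std::runtime_error {
    FlushInterrupted() : std::runtime_error("journal flush interrupted") {}
};

// Calls `waitForFlush` until it returns normally or `maxAttempts` attempts
// have been made (at least one attempt is always made).
// Returns the number of attempts used; rethrows the last interruption if
// the attempts are exhausted.
inline int waitForFlushWithRetry(const std::function<void()>& waitForFlush,
                                 int maxAttempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            waitForFlush();
            return attempt;
        } catch (const FlushInterrupted&) {
            if (attempt >= maxAttempts) {
                throw;  // give up and surface the interruption to the caller
            }
            // Otherwise fall through and wait for the next flush round.
        }
    }
}
```

Whether a bounded retry is appropriate, or whether callers should instead re-check their own interruption state between attempts, is an open design question for the real change.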
- depends on
  - SERVER-46826 Instantiate the JournalFlusher thread for ephemeral engines and when non-durable (nojournal=true) [Closed]
- is depended on by
  - SERVER-47898 Advancing lastDurable irrespective of lastApplied [Closed]
- related to
  - SERVER-79810 make JournalFlusher::waitForJournalFlush() interruptible when waiting for write concern [Closed]
  - SERVER-79809 remove unused functions from StorageControl namespace [Closed]