Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.4.9, 7.0.11
Component/s: None
Labels: None
Assigned Teams: Replication
Operating System: ALL
Steps To Reproduce:

Attached is the patch file with the JS test that reproduces this and associated code changes. We have been able to run this test with 100% consistency.
Sprint: Repl 2024-10-14, Repl 2024-10-28

We have observed two failovers on our MongoDB setup running v4.4.9 in which a majority of the secondaries entered the rollback state. Chaining is disabled on our setup. We then attempted to reproduce this scenario on v7.0 using JS tests and believe the bug still exists.

Below is a rough sequence of events that can lead to rollback; the associated JS test is attached as a patch file. Note that we added sleeps to the source code to better simulate what we saw on our setup.

1. The old primary is frozen: its threads are not making progress.
2. Meanwhile, write requests are issued to the old primary, and these get stuck too.
3. An election is triggered because no progressing primary is seen, and a new primary wins the election.
4. During the catch-up phase on the new primary, the writes from (2) unfreeze on the old primary and make their way into its oplog.
5. All secondaries sync these writes to their oplogs.
6. The new primary exits the catch-up phase and declares itself ready to accept writes.
7. The secondaries switch their sync source to the new primary, realize that their oplogs have diverged, and enter the rollback state for several minutes.
8. During (7), the cluster is unavailable for reads and writes, which effectively renders the cluster down.
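
The divergence in this sequence can be illustrated with a small toy model. This is only a sketch, not MongoDB code and not the attached JS test: the `Node` class, the `(term, index)` entry shape, and the helper names are all hypothetical, and it assumes the new primary does not fetch the late term-1 writes during catch-up, as in the scenario reported here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model: an oplog entry is just (term, index).
Entry = Tuple[int, int]


@dataclass
class Node:
    name: str
    oplog: List[Entry] = field(default_factory=list)

    def append(self, term: int) -> None:
        """Write one entry in the given term at the next index."""
        self.oplog.append((term, len(self.oplog) + 1))

    def sync_from(self, source: "Node") -> None:
        """Copy whatever the source has beyond our last index."""
        self.oplog.extend(source.oplog[len(self.oplog):])


def common_point(a: List[Entry], b: List[Entry]) -> int:
    """Return how many leading entries the two oplogs share."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared


def entries_to_roll_back(node: Node, source: Node) -> List[Entry]:
    """Entries on `node` past the common point with its new sync source."""
    return node.oplog[common_point(node.oplog, source.oplog):]


def simulate() -> Dict[str, List[Entry]]:
    old_primary = Node("old_primary")
    new_primary = Node("new_primary")
    secondaries = [Node("s1"), Node("s2")]

    # Steady state: two term-1 entries replicated everywhere.
    old_primary.append(term=1)
    old_primary.append(term=1)
    for node in [new_primary, *secondaries]:
        node.sync_from(old_primary)

    # Steps 1-3: the old primary stalls and new_primary wins term 2,
    # but is still in catch-up and has not written anything yet.
    # Step 4: the stalled writes land on the old primary, still in term 1.
    old_primary.append(term=1)
    old_primary.append(term=1)

    # Step 5: the secondaries are still syncing from the old primary.
    for s in secondaries:
        s.sync_from(old_primary)

    # Step 6: the new primary exits catch-up and writes in term 2.
    new_primary.append(term=2)

    # Step 7: each secondary compares its oplog with the new sync source.
    return {s.name: entries_to_roll_back(s, new_primary) for s in secondaries}


if __name__ == "__main__":
    for name, entries in simulate().items():
        print(name, "must roll back", entries)
```

In this model both secondaries share only the first two entries with the new primary, so the two late term-1 entries are exactly what each one would have to roll back before it can resume syncing.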


Attachments:
0001-Add-JS-test-to-simulate-a-case-where-majority-second.patch (6 kB, Oct 07 2024 07:03:07 AM UTC)
0001-Drop-last-batch-of-oplog-entries-if-primary-has-chan.patch (2 kB, Oct 14 2024 11:06:27 AM UTC)
Screenshot 2024-10-07 at 1.07.49 PM.png (162 kB, Oct 07 2024 05:08:03 PM UTC)

is related to: SERVER-91764 Election of new primary caused all secondaries to rollback (Closed)

Assignee: Wenbin Zhu
Reporter: Preeti Murthy
Participants: Preeti Murthy, Suraj Narkhede, Tim T, Wenbin Zhu
Votes: 0
Watchers: 15
Created: Oct 07 2024 06:55:31 AM UTC
Updated: Oct 24 2024 05:12:01 PM UTC
Resolved: Oct 24 2024 05:12:00 PM UTC
