If a primary failover happens during a movePrimary operation, we could fail to clear the database metadata on the new primary node of the coordinator shard, leading to possible data loss.
As part of the movePrimary coordinator, database metadata on the primary node is explicitly cleared in the kCommit phase, while on secondary nodes metadata is cleared indirectly when the database recoverable critical section is exited in the kExitCriticalSection phase.
If a step-down happens between these two phases and a new primary node is elected on the coordinator shard, we could miss clearing metadata on the new primary.
Consider the following scenario:
- kCommit
- N1 (primary) -> db metadata cleared
- N2 (secondary) -> db metadata not cleared
- kExitCriticalSection
- N1 (secondary) -> db metadata cleared
- N2 (primary) -> db metadata not cleared
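The scenario above can be sketched as a minimal model of the race. This is an illustrative simulation only, not the actual MongoDB implementation; the `Node` class, phase functions, and the `has_db_metadata` flag are hypothetical names standing in for the real server internals.

```python
class Node:
    def __init__(self, name, role):
        self.name = name
        self.role = role              # "primary" or "secondary"
        self.has_db_metadata = True   # True = stale database metadata still cached

def k_commit(nodes):
    # kCommit explicitly clears database metadata on the current primary only.
    for n in nodes:
        if n.role == "primary":
            n.has_db_metadata = False

def step_down(nodes):
    # Failover between the two phases: roles swap and a new primary is elected.
    for n in nodes:
        n.role = "secondary" if n.role == "primary" else "primary"

def k_exit_critical_section(nodes):
    # Exiting the recoverable critical section clears metadata indirectly,
    # which in this scenario only takes effect on secondaries.
    for n in nodes:
        if n.role == "secondary":
            n.has_db_metadata = False

n1 = Node("N1", "primary")
n2 = Node("N2", "secondary")
nodes = [n1, n2]

k_commit(nodes)                 # clears N1 (primary); N2 untouched
step_down(nodes)                # N2 becomes the new primary
k_exit_critical_section(nodes)  # clears N1 (now secondary); N2 untouched

print(n1.has_db_metadata, n2.has_db_metadata)  # False True
```

The end state shows the bug: N2, the new primary, still holds stale database metadata because it was a secondary during kCommit and a primary during kExitCriticalSection, so neither clearing path ever ran on it.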
Caused by: SERVER-71308 Enable featureFlag for resilient movePrimary