Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.4
Affects Version/s: None
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-02-24
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

First, the refreshFilteringMetadataUntilSuccess loop (at least) is racy when a test induces a stepdown while the thread is hanging in the failpoint inside the loop. The failpoint makes the loop enter a sleep, and that sleep is interruptible because an OperationContext is passed to it. Since the same OperationContext was used to take strong locks as part of forceShardFilteringMetadataRefresh (all of its AutoGetDb/AutoGetCollection acquisitions), the stepdown interrupts the OperationContext, and the thread immediately enters the catch block, even before the failpoint is turned off. The race is that the stepdown may not have updated the memberState and term yet, in which case the assertion in the catch block passes and the loop starts a second iteration rather than failing on the first.

If the above race happens and the loop starts a second iteration, the migration_coordinator_failover.js test "accidentally" passes when the same node is elected primary, because of a bug in the ShardServerCatalogCacheLoader (SERVER-45646). This bug causes forceShardFilteringMetadataRefresh in the second iteration of the loop to throw NetworkInterfaceExceededTimeLimit, so the catch block is entered again and the assertion is checked again, this time after the member state has been updated.
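
The ordering described above can be modeled with a small deterministic sketch. This is not the server's code: it is a Python stand-in in which `ReplState`, `RefreshError`, `SteppedDown`, `refresh_until_success`, and the scripted attempt functions are all hypothetical names, and the catch-block check is assumed to be "still primary in the starting term". Each scripted attempt represents one call that throws inside the loop.

```python
from dataclasses import dataclass


class RefreshError(Exception):
    """Hypothetical stand-in for any error thrown by one refresh attempt."""


class SteppedDown(Exception):
    """Raised by the modeled catch block when the node is no longer primary in the starting term."""

    def __init__(self, iteration):
        super().__init__(f"stepdown detected on iteration {iteration}")
        self.iteration = iteration


@dataclass
class ReplState:
    is_primary: bool = True
    term: int = 1


def refresh_until_success(state, attempts):
    """Model of the retry loop: run scripted attempts until one succeeds."""
    start_term = state.term
    for iteration, attempt in enumerate(attempts, start=1):
        try:
            attempt(state)
            return iteration
        except RefreshError:
            # Modeled catch block: retry only if the node still looks like
            # the primary of the term in which the loop started.
            if not (state.is_primary and state.term == start_term):
                raise SteppedDown(iteration)
    raise RuntimeError("ran out of scripted attempts")


def interrupted_before_state_update(state):
    # The stepdown has interrupted the operation but has not yet
    # published the new member state and term.
    raise RefreshError("operation interrupted")


def fails_after_state_update(state):
    # By now the stepdown has published its state change; the attempt
    # then fails (in the ticket, with NetworkInterfaceExceededTimeLimit).
    state.is_primary = False
    state.term += 1
    raise RefreshError("time limit exceeded")


def succeeds(state):
    # An attempt that does not throw, so the catch block never runs.
    return None


def detection_iteration(attempts):
    """Return the iteration on which the stepdown was detected, or None."""
    try:
        refresh_until_success(ReplState(), attempts)
    except SteppedDown as exc:
        return exc.iteration
    return None
```

In this model, `detection_iteration([interrupted_before_state_update, fails_after_state_update])` reports the stepdown on the second iteration, which corresponds to the "accidental" pass. If the second attempt were `succeeds` instead, the model would never run the check again and the stepdown would go undetected by the loop.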

We can fix this by removing the first race: make the failpoint use an uninterruptible sleep when it is used to pause the thread in order to induce a stepdown.
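
A minimal sketch of why the kind of sleep matters, assuming the stepdown first interrupts the operation and only afterwards publishes the new member state and term. It is a Python model using `threading.Event`, not the server's failpoint implementation; `run_scenario`, `grace`, and the event names are hypothetical. The worker records whether the new state was visible at the moment it left the pause, which is when the catch-block check would run.

```python
import threading


class Interrupted(Exception):
    """Hypothetical stand-in for the error raised when an operation is killed."""


def run_scenario(interruptible, grace=0.5):
    killed = threading.Event()           # set when the stepdown kills the operation
    failpoint_off = threading.Event()    # set when the test disables the failpoint
    state_published = threading.Event()  # set when member state and term are updated
    left_pause = threading.Event()       # set when the worker stops pausing
    result = {}

    def worker():
        try:
            while not failpoint_off.is_set():
                if interruptible:
                    # Interruptible sleep: wakes as soon as the operation is killed.
                    if killed.wait(timeout=0.01):
                        raise Interrupted()
                else:
                    # Uninterruptible sleep: only turning the failpoint off ends the pause.
                    failpoint_off.wait(timeout=0.01)
            result["woke_early"] = False
        except Interrupted:
            result["woke_early"] = True
        # The primary/term check would run here.
        result["saw_new_state"] = state_published.is_set()
        left_pause.set()

    thread = threading.Thread(target=worker)
    thread.start()

    killed.set()                     # stepdown, step 1: interrupt the operation
    left_pause.wait(timeout=grace)   # give the worker a chance to escape early
    state_published.set()            # stepdown, step 2: update member state and term
    failpoint_off.set()              # the test turns the failpoint off afterwards
    thread.join()
    return result
```

With `interruptible=True` the worker leaves the pause before the state is published, so a check at that point would still see the old state. With `interruptible=False` the worker stays paused until the failpoint is turned off, by which time the new state is visible. The `grace` timeout only exists to make the model's ordering deterministic.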

Is depended on by: SERVER-44771 Allow operations in transactions to safely consult the CatalogCache on mongod (Closed)

Assignee: Esha Maharishi (Inactive)
Reporter: Esha Maharishi (Inactive)
Participants: Esha Maharishi, Githook User
Votes: 0
Watchers: 2

Created: Feb 13 2020 04:10:48 PM UTC
Updated: Oct 29 2023 10:12:15 PM UTC
Resolved: Feb 14 2020 12:17:58 AM UTC
