Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0
Affects Version/s: 9.0 Required, 8.0.0-rc17
Component/s: Sharding
Labels:
None

Assigned Teams:

Cluster Scalability
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v8.0
Steps To Reproduce:

1. Run the attached reproducer using the sharding suite with base version c3475ffa8
Sprint:
Cluster Scalability 2024-07-22, Cluster Scalability 2024-08-19
Linked BF Score:
0
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

The cleanupOrphaned command currently checks the metadata state before actually waiting for range deletions in the shard.

Separately, in config transition suites the config server might be the destination of migrations. If the config server stops being a data-bearing shard, any ongoing migration might fail at any of its steps, for example while trying to commit. When this happens, we momentarily clear the filtering metadata, so the following interleaving could make a cleanupOrphaned command fail on a shard that is the source of a migration in a config transition suite:

1. A migration starts with the config server as the destination shard.
2. Before the migration enters the critical section, the transitionToDedicatedConfigServer command is called.
3. A cleanupOrphaned command reaches the destination shard of the migration.
4. Before cleanupOrphaned checks the metadata, the migration tries to commit but fails because the config server is draining, so it clears the shard metadata.
5. The cleanupOrphaned command fails because it does not find metadata on the node.

This could cause tests to fail in config transition suites (for example, the cleanupOrphanedWhileMigrating.js FSM workload). We could adjust the tests to handle this type of error, or make cleanupOrphaned more robust by retrying the refresh/waitForClean in a loop. A reproducer is attached.
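The retry option can be illustrated with a small Python model. This is only a sketch of the idea, not the server code: `FakeShard`, `MetadataUnavailable`, `cleanup_orphaned_with_retry`, and the `max_attempts` parameter are all hypothetical names invented for the example. The fake shard reports missing metadata for a fixed number of checks, standing in for the window after a failed migration commit clears the filtering metadata.

```python
class MetadataUnavailable(Exception):
    """Hypothetical error: the shard has no filtering metadata right now."""


class FakeShard:
    """Toy stand-in for a shard whose metadata was cleared by a failed commit."""

    def __init__(self, cleared_checks):
        # Number of upcoming checks that will still find the metadata cleared.
        self.cleared_checks = cleared_checks
        self.refreshes = 0

    def refresh_metadata(self):
        # Assumption: a refresh is always safe to repeat.
        self.refreshes += 1

    def wait_for_clean(self):
        # Fails while the metadata is still cleared, succeeds afterwards.
        if self.cleared_checks > 0:
            self.cleared_checks -= 1
            raise MetadataUnavailable("filtering metadata was cleared")
        return "clean"


def cleanup_orphaned_with_retry(shard, max_attempts=5):
    """Refresh, then wait for range deletions; retry if the metadata vanished."""
    for attempt in range(1, max_attempts + 1):
        shard.refresh_metadata()
        try:
            return shard.wait_for_clean()
        except MetadataUnavailable:
            if attempt == max_attempts:
                raise  # give up and surface the original error
    raise ValueError("max_attempts must be at least 1")
```

With `FakeShard(cleared_checks=2)`, the first two attempts hit the cleared metadata and the third succeeds. Bounding the loop keeps a shard that never recovers its metadata from blocking the command indefinitely.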

BF-34382.repro
4 kB
Aug 05 2024 05:36:48 PM UTC

related to

SERVER-93222 Deprecate cleanupOrphaned

Open

Assignee: Janna Golden
Reporter: Marcos José Grillo Ramirez
Participants: Githook User, Janna Golden, Marcos José Grillo Ramirez
Votes: 0
Watchers: 3
Created: Aug 05 2024 05:45:51 PM UTC
Updated: Aug 08 2024 02:52:38 PM UTC
Resolved: Aug 08 2024 02:52:37 PM UTC
Confidence Status Last Update: 07/Aug/24 3:09 PM
