Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.10, 6.0.0-rc10, 6.1.0-rc0
Affects Version/s: 5.3.0, 5.0.0, 6.0.0-rc3
Component/s: Sharding
Labels: sharding-nyc-subteam1
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.0, v5.0
Sprint: Sharding NYC 2022-05-30, Sharding NYC 2022-06-13
Story Points: 3
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

While the recipient shards are in RecipientStateEnum::kApplying, they continuously fetch oplog entries for writes on the donor shards and apply them. If an operation-fatal error occurs while applying an oplog entry, the recipient shard transitions to RecipientStateEnum::kError and informs the coordinator shard.

[j0:s0:prim] | 2022-04-27T09:17:49.060+00:00 I  RESHARD  4956500 [ReshardingRecipientService-1] "Resharding operation recipient state machine failed","attr":{"namespace":"test0_fsmdb0.fsmcoll0","reshardingUUID":{"uuid":{"$uuid":"08d271ae-91a9-4f52-9b2f-7de7eb4a0a33"}},"error":"OplogOperationUnsupported: Command not supported during resharding: { oplogEntry: { op: \"c\", ns: \"test0_fsmdb0.fsmcoll0\", ui: UUID(\"07b07822-7c51-410d-85e2-c5d5d4060998\"), o: { dbCheck: \"test0_fsmdb0.fsmcoll0\", type: \"batch\", md5: \"d381a905564387e42a68127855fecdf6\", minKey: MinKey, maxKey: MaxKey, readTimestamp: Timestamp(1651051063, 135), applyOps: null }, ts: Timestamp(1651051063, 156), t: 1, v: 2, wall: new Date(1651051063854), _id: { clusterTime: Timestamp(1651051063, 156), ts: Timestamp(1651051063, 156) } } }"}
[j0:s0:prim] | 2022-04-27T09:17:49.061+00:00 I  RESHARD  5279506 [ReshardingRecipientService-1] "Transitioned resharding recipient state","attr":{"newState":"error","oldState":"applying","namespace":"test0_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"07b07822-7c51-410d-85e2-c5d5d4060998"}},"reshardingUUID":{"uuid":{"$uuid":"08d271ae-91a9-4f52-9b2f-7de7eb4a0a33"}}}
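The transition shown in the second log line can be picked out of a test log mechanically. The following is a minimal sketch, assuming each log line ends with a well-formed JSON "attr" object as in the lines above; the helper name recipient_transition and the shortened sample line are illustrative and not part of the server or its test tooling.

```python
import json
import re

# Captures the trailing `"attr":{...}` object of a log line shaped like the ones above.
ATTR_RE = re.compile(r'"attr":(\{.*\})\s*$')


def recipient_transition(line):
    """Return (old_state, new_state) for a recipient state transition line, else None."""
    if "Transitioned resharding recipient state" not in line:
        return None
    match = ATTR_RE.search(line)
    if match is None:
        return None
    attr = json.loads(match.group(1))
    return attr.get("oldState"), attr.get("newState")


# Shortened copy of the log line above (the UUID attributes are dropped for brevity).
sample = (
    '[j0:s0:prim] | 2022-04-27T09:17:49.061+00:00 I  RESHARD  5279506 '
    '[ReshardingRecipientService-1] "Transitioned resharding recipient state",'
    '"attr":{"newState":"error","oldState":"applying",'
    '"namespace":"test0_fsmdb0.fsmcoll0"}'
)

if __name__ == "__main__":
    print(recipient_transition(sample))
```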

While the recipient shards are in RecipientStateEnum::kApplying, the coordinator shard monitors for an opportune moment to commit the resharding operation, based on how caught up the recipient shards are with the writes on the donor shards. The coordinator shard does not realize that the recipient shards will never reach an opportune moment to commit, because the resharding operation must abort.

[j0:c:prim] | 2022-04-27T09:17:49.089+00:00 I  RESHARD  5391602 [ReshardingCoordinatorService-2] "Resharding operation waiting for an okay to enter critical section"
[j0:c:prim] | 2022-04-27T09:17:49.089+00:00 I  RESHARD  5392001 [ReshardingCoordinatorService-2] "Querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0"}
[j0:c:prim] | 2022-04-27T09:17:49.090+00:00 I  RESHARD  5392002 [ReshardingCoordinatorService-2] "Finished querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0","remainingTimeMillis":5163}
...
[j0:c:prim] | 2022-04-27T09:18:01.750+00:00 I  RESHARD  5392001 [ReshardingCoordinatorService-2] "Querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0"}
[j0:c:prim] | 2022-04-27T09:18:01.751+00:00 I  RESHARD  5392002 [ReshardingCoordinatorService-2] "Finished querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0","remainingTimeMillis":5163}
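The stall visible in these logs can be illustrated with a toy model. This is only a sketch of the behavior described in this ticket, not the server's implementation: the Recipient class, the function names, and the 500 ms threshold are all hypothetical. The one value taken from the logs is the unchanging remainingTimeMillis of 5163.

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    state: str                  # "applying" or "error" (toy model of the recipient state)
    remaining_time_millis: int  # last estimate reported to the coordinator


def coordinator_should_commit(recipients, threshold_millis):
    """Toy decision rule: only the reported remaining time is consulted."""
    remaining = max(r.remaining_time_millis for r in recipients)
    return remaining <= threshold_millis


def poll(recipients, threshold_millis, max_polls):
    """Return the poll number at which commit would start, or None if it never does."""
    for attempt in range(1, max_polls + 1):
        if coordinator_should_commit(recipients, threshold_millis):
            return attempt
    return None


def stalled(recipients, threshold_millis):
    """Diagnostic for the toy model: a recipient failed, yet commit is still awaited."""
    failed = any(r.state == "error" for r in recipients)
    return failed and not coordinator_should_commit(recipients, threshold_millis)


if __name__ == "__main__":
    # A recipient in the error state stops applying, so its estimate never improves.
    failed_recipient = [Recipient(state="error", remaining_time_millis=5163)]
    print(poll(failed_recipient, threshold_millis=500, max_polls=100))
    print(stalled(failed_recipient, threshold_millis=500))
```

In this model the polling loop has no exit other than the estimate dropping below the threshold, which matches the repeated "Querying recipient shards" messages above.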

An operator can manually issue the abortReshardCollection command to cancel the stuck resharding operation.
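A minimal sketch of issuing that command from a driver follows. It assumes the client argument is a pymongo MongoClient connected to a mongos and that the command is run against the admin database; the helper names build_abort_command and abort_resharding are illustrative, and the namespace is the one from the logs above.

```python
def build_abort_command(namespace):
    """Build the abortReshardCollection command document for "<database>.<collection>"."""
    return {"abortReshardCollection": namespace}


def abort_resharding(client, namespace):
    """Run abortReshardCollection against the admin database.

    `client` is assumed to be a pymongo.MongoClient connected to a mongos;
    any object exposing `admin.command(document)` works for this sketch.
    """
    return client.admin.command(build_abort_command(namespace))


if __name__ == "__main__":
    # Only the command document is printed here; no server connection is made.
    print(build_abort_command("test0_fsmdb0.fsmcoll0"))
```

With a live cluster, the call would look like abort_resharding(MongoClient(uri), "test0_fsmdb0.fsmcoll0"), where uri is the operator's own mongos connection string.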

is related to
SERVER-63855 Make dbCheck work with resharding (Backlog)
SERVER-66011 Enable internal_transactions_resharding.js in the concurrency_sharded_multi_stmt_txn_with_balancer suite (Closed)

Assignee: Nandini Bhartiya
Reporter: Max Hirschhorn
Participants: Githook User, Max Hirschhorn, Nandini Bhartiya
Votes: 0
Watchers: 3

Created: Apr 28 2022 01:46:46 PM UTC
Updated: Oct 29 2023 09:38:52 PM UTC
Resolved: Jun 08 2022 04:00:41 PM UTC
Confidence Status Last Update: 24/May/22 4:28 PM
