- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.3.0, 5.0.0, 6.0.0-rc3
- Component/s: Sharding
- Fully Compatible
- ALL
- v6.0, v5.0
- Sharding NYC 2022-05-30, Sharding NYC 2022-06-13
- 3
While the recipient shards are in RecipientStateEnum::kApplying, they continuously fetch oplog entries for writes on the donor shards and apply them. If there's an operation-fatal error while applying an oplog entry, the recipient shard transitions to RecipientStateEnum::kError and informs the coordinator shard.
[j0:s0:prim] | 2022-04-27T09:17:49.060+00:00 I RESHARD 4956500 [ReshardingRecipientService-1] "Resharding operation recipient state machine failed","attr":{"namespace":"test0_fsmdb0.fsmcoll0","reshardingUUID":{"uuid":{"$uuid":"08d271ae-91a9-4f52-9b2f-7de7eb4a0a33"}},"error":"OplogOperationUnsupported: Command not supported during resharding: { oplogEntry: { op: \"c\", ns: \"test0_fsmdb0.fsmcoll0\", ui: UUID(\"07b07822-7c51-410d-85e2-c5d5d4060998\"), o: { dbCheck: \"test0_fsmdb0.fsmcoll0\", type: \"batch\", md5: \"d381a905564387e42a68127855fecdf6\", minKey: MinKey, maxKey: MaxKey, readTimestamp: Timestamp(1651051063, 135), applyOps: null }, ts: Timestamp(1651051063, 156), t: 1, v: 2, wall: new Date(1651051063854), _id: { clusterTime: Timestamp(1651051063, 156), ts: Timestamp(1651051063, 156) } } }"}
[j0:s0:prim] | 2022-04-27T09:17:49.061+00:00 I RESHARD 5279506 [ReshardingRecipientService-1] "Transitioned resharding recipient state","attr":{"newState":"error","oldState":"applying","namespace":"test0_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"07b07822-7c51-410d-85e2-c5d5d4060998"}},"reshardingUUID":{"uuid":{"$uuid":"08d271ae-91a9-4f52-9b2f-7de7eb4a0a33"}}}
While the recipient shards are in RecipientStateEnum::kApplying, the coordinator shard monitors for an opportune moment to commit the resharding operation, based on how caught up the recipient shards are with the writes on the donor shards. The coordinator shard doesn't realize that the recipient shards will never reach an opportune moment to commit, because the resharding operation must abort.
[j0:c:prim] | 2022-04-27T09:17:49.089+00:00 I RESHARD 5391602 [ReshardingCoordinatorService-2] "Resharding operation waiting for an okay to enter critical section"
[j0:c:prim] | 2022-04-27T09:17:49.089+00:00 I RESHARD 5392001 [ReshardingCoordinatorService-2] "Querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0"}
[j0:c:prim] | 2022-04-27T09:17:49.090+00:00 I RESHARD 5392002 [ReshardingCoordinatorService-2] "Finished querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0","remainingTimeMillis":5163}
...
[j0:c:prim] | 2022-04-27T09:18:01.750+00:00 I RESHARD 5392001 [ReshardingCoordinatorService-2] "Querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0"}
[j0:c:prim] | 2022-04-27T09:18:01.751+00:00 I RESHARD 5392002 [ReshardingCoordinatorService-2] "Finished querying recipient shards for the remaining operation time","attr":{"namespace":"test0_fsmdb0.fsmcoll0","remainingTimeMillis":5163}
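The interaction described above can be pictured with a small Python model. This is only an illustrative sketch, not the server's C++ implementation and not necessarily how the fix was made: the names Recipient, coordinator_decision, check_errors, and the 500 ms threshold are all hypothetical. It shows a coordinator that looks only at the reported remaining time and so waits forever on an errored recipient, next to one that also consults recipient state and decides to abort.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RecipientState(Enum):
    APPLYING = "applying"
    ERROR = "error"


@dataclass
class Recipient:
    """Hypothetical stand-in for one recipient shard (not a server class)."""
    state: RecipientState = RecipientState.APPLYING
    remaining_time_millis: int = 5163  # value seen in the coordinator log

    def apply_oplog_entry(self, entry: dict) -> None:
        # Models only the failure from the log: a dbCheck command entry is
        # unsupported, so the recipient moves to the error state. Its
        # remaining-time estimate is never updated again after that.
        if entry.get("op") == "c" and "dbCheck" in entry.get("o", {}):
            self.state = RecipientState.ERROR


def coordinator_decision(recipients: List[Recipient],
                         threshold_millis: int,
                         check_errors: bool) -> str:
    """Return 'commit', 'abort', or 'wait' for one polling round.

    Assumes a non-empty list of recipients. With check_errors=False the
    function mimics the behavior reported in this ticket.
    """
    if check_errors and any(r.state is RecipientState.ERROR for r in recipients):
        return "abort"
    remaining = max(r.remaining_time_millis for r in recipients)
    return "commit" if remaining <= threshold_millis else "wait"


if __name__ == "__main__":
    recipient = Recipient()
    recipient.apply_oplog_entry(
        {"op": "c", "o": {"dbCheck": "test0_fsmdb0.fsmcoll0"}}
    )
    # Hypothetical threshold of 500 ms, chosen only for illustration.
    for check in (False, True):
        decision = coordinator_decision([recipient], 500, check_errors=check)
        print(f"check_errors={check}: {decision}")
```

In this model, the first variant returns "wait" on every round because the errored recipient keeps reporting the same remainingTimeMillis, which matches the repeated 5163 in the log.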
An operator can manually issue the abortReshardCollection command to cancel the resharding operation.
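As a rough sketch of that manual step, the command document takes the namespace of the collection being resharded as its value. The helper name build_abort_reshard_command below is hypothetical, and the commented-out PyMongo call assumes a driver is installed and that the URI (a placeholder here) points at a mongos with sufficient privileges.

```python
def build_abort_reshard_command(namespace: str) -> dict:
    """Build the abortReshardCollection command document.

    `namespace` must have the form "<database>.<collection>".
    This helper is illustrative only; it is not part of any driver.
    """
    database, sep, collection = namespace.partition(".")
    if not sep or not database or not collection:
        raise ValueError(f"expected '<database>.<collection>', got {namespace!r}")
    # The command name must be the first key; dicts preserve insertion order.
    return {"abortReshardCollection": namespace}


if __name__ == "__main__":
    cmd = build_abort_reshard_command("test0_fsmdb0.fsmcoll0")
    print(cmd)
    # Sending it with PyMongo (assumed to be installed; placeholder URI):
    #   from pymongo import MongoClient
    #   client = MongoClient("mongodb://<mongos-host>:27017")
    #   client.admin.command(cmd)
```

The namespace used here is the one from the logs in this ticket; any other collection being resharded would be substituted in the same form.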
- is related to
- SERVER-63855 Make dbCheck work with resharding (Backlog)
- SERVER-66011 Enable internal_transactions_resharding.js in the concurrency_sharded_multi_stmt_txn_with_balancer suite (Closed)