There are multiple places where the ReshardingCoordinatorService, ReshardingRecipientService, and ReshardingDonorService attempt to target the primary of a replica set shard:
- ReshardingCoordinatorService when sending the _flushRoutingTableCacheUpdatesWithWriteConcern command to shard participants.
- ReshardingCoordinatorService when sending the _shardsvrCommitReshardCollection command to shard participants.
- ReshardingCoordinatorService when sending the _shardsvrAbortReshardCollection command to shard participants.
- ReshardingRecipientService when updating its entry in the coordinator's config.reshardingOperations state document.
- ReshardingDonorService when updating its entry in the coordinator's config.reshardingOperations state document.
Internally, these calls go through RemoteCommandTargeterRS::findHost(), which throws a FailedToSatisfyReadPreference error after kDefaultFindHostTimeout (15 seconds) if the remote shard has no primary. This exception is caught and leads to an fassert() because, for example, it would be invalid for the participant shards to complete the resharding operation without performing a w:majority write on the config server primary.
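The failure mode above is a bounded wait: look for a primary, and give up with an error once a fixed timeout expires. In the log in this ticket, the recipient starts looking for a config-rs primary at 13:17:33.465 and fails at 13:17:48.465, exactly 15 seconds later. The sketch below models that behavior with a simple polling loop. Only RemoteCommandTargeterRS::findHost(), kDefaultFindHostTimeout, and FailedToSatisfyReadPreference come from the ticket. `findPrimaryOrThrow`, `PrimaryProbe`, and the polling approach are hypothetical illustrations and are not the real server code, which relies on the replica set monitor.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for the FailedToSatisfyReadPreference error status.
struct FailedToSatisfyReadPreference : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical probe: returns the current primary's host:port, or nullopt
// while the replica set has no primary (for example, during an election).
using PrimaryProbe = std::function<std::optional<std::string>()>;

// Bounded wait: poll until a primary is seen, and throw once `timeout`
// expires. In the server, the equivalent bound is kDefaultFindHostTimeout.
std::string findPrimaryOrThrow(const PrimaryProbe& probe,
                               std::chrono::milliseconds timeout,
                               std::chrono::milliseconds pollInterval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (auto host = probe()) {
            return *host;  // A primary is available.
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            // This is the error the resharding services currently turn into an fassert().
            throw FailedToSatisfyReadPreference(
                "Could not find host matching read preference { mode: \"primary\" }");
        }
        std::this_thread::sleep_for(pollInterval);
    }
}
```

If the election takes longer than the timeout, the caller sees the exception even though a primary would have appeared shortly afterwards.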
The resharding components should instead wait until a primary becomes available on the remote shard to avoid triggering this fassert().
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:33.440+00:00"},"s":"I", "c":"RESHARD", "id":5279506, "ctx":"ReshardingRecipientService-4","msg":"Transitioned resharding recipient state","attr":{"newState":"applying","oldState":"cloning","namespace":"test1_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"04cb2914-75ec-4b6c-a4df-f416c22459c7"}},"reshardingUUID":{"uuid":{"$uuid":"245e08b2-5f20-4530-a87e-1acd0faa2db4"}}}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:33.465+00:00"},"s":"I", "c":"-", "id":4333227, "ctx":"ReshardingRecipientService-7","msg":"RSM monitoring host in expedited mode until we detect a primary","attr":{"host":"localhost:20002","replicaSet":"config-rs"}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:33.465+00:00"},"s":"I", "c":"-", "id":4333227, "ctx":"ReshardingRecipientService-7","msg":"RSM monitoring host in expedited mode until we detect a primary","attr":{"host":"localhost:20000","replicaSet":"config-rs"}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:33.465+00:00"},"s":"I", "c":"-", "id":4333218, "ctx":"ReshardingRecipientService-7","msg":"Rescheduling the next replica set monitoring request","attr":{"replicaSet":"config-rs","host":"localhost:20000","delayMillis":0}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:33.465+00:00"},"s":"I", "c":"-", "id":4333227, "ctx":"ReshardingRecipientService-7","msg":"RSM monitoring host in expedited mode until we detect a primary","attr":{"host":"localhost:20001","replicaSet":"config-rs"}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:48.465+00:00"},"s":"I", "c":"RESHARD", "id":4956500, "ctx":"ReshardingRecipientService-7","msg":"Resharding operation recipient state machine failed","attr":{"namespace":"test1_fsmdb0.fsmcoll0","reshardingUUID":{"uuid":{"$uuid":"245e08b2-5f20-4530-a87e-1acd0faa2db4"}},"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set config-rs"}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:17:48.466+00:00"},"s":"I", "c":"RESHARD", "id":5279506, "ctx":"ReshardingRecipientService-7","msg":"Transitioned resharding recipient state","attr":{"newState":"error","oldState":"applying","namespace":"test1_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"04cb2914-75ec-4b6c-a4df-f416c22459c7"}},"reshardingUUID":{"uuid":{"$uuid":"245e08b2-5f20-4530-a87e-1acd0faa2db4"}}}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:18:03.470+00:00"},"s":"F", "c":"RESHARD", "id":5551101, "ctx":"ReshardingRecipientService-5","msg":"Unrecoverable error occurred past the point recipient was prepared to complete the resharding operation","attr":{"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set config-rs"}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:18:03.470+00:00"},"s":"F", "c":"ASSERT", "id":23089, "ctx":"ReshardingRecipientService-5","msg":"Fatal assertion","attr":{"msgid":5551101,"file":"src/mongo/db/s/resharding/resharding_recipient_service.cpp","line":412}}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:18:03.470+00:00"},"s":"F", "c":"ASSERT", "id":23090, "ctx":"ReshardingRecipientService-5","msg":"\n\n***aborting after fassert() failure\n\n"}
[j0:s0:n1] {"t":{"$date":"2021-10-19T13:18:03.470+00:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"ReshardingRecipientService-5","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
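The change proposed in this ticket is for the resharding components to keep waiting until the remote shard has a primary, instead of treating a single timed-out attempt as fatal. The sketch below shows one way to express that idea: retry an operation for as long as it fails only with FailedToSatisfyReadPreference, let every other error propagate, and stop if the operation is cancelled. `retryUntilPrimaryAvailable`, `OperationCancelled`, and the use of a `std::atomic<bool>` as a cancellation flag are hypothetical stand-ins and are not the actual fix or MongoDB's internal APIs.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-ins for the server's error statuses.
struct FailedToSatisfyReadPreference : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct OperationCancelled : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Retry `attempt` (for example, one bounded findHost() + command round trip)
// while the only failure is "no primary found". Other exceptions propagate
// immediately. The wait has no overall deadline, but `cancelled` lets the
// caller (for example, on stepdown or abort) interrupt it.
template <typename Fn>
auto retryUntilPrimaryAvailable(Fn attempt,
                                const std::atomic<bool>& cancelled,
                                std::chrono::milliseconds backoff) -> decltype(attempt()) {
    while (true) {
        if (cancelled.load()) {
            throw OperationCancelled("resharding operation was cancelled");
        }
        try {
            return attempt();
        } catch (const FailedToSatisfyReadPreference&) {
            // Transient: the remote replica set has no primary right now
            // (for example, mid-election). Back off and try again.
        }
        std::this_thread::sleep_for(backoff);
    }
}
```

Keeping the inner attempt bounded and looping outside it, rather than just raising the timeout, means the loop re-checks for cancellation between attempts, so a participant that is stepping down or aborting is not stuck waiting forever.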
Issue links:
- Is depended on by: SERVER-57686 We need test coverage that runs resharding in the face of elections (Closed)
- Related to: SERVER-60495 Retry FailedToSatisfyReadPreference in DDL coordinators (Closed)