- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
Both shards, d20020 and d20022, see the collection version as 2|1||5fec046614ff529dbac7fa05. However, only shard d20022 correctly sees the coordinator state as "cloning"; shard d20020 incorrectly sees it as "preparing-to-donate". As a result, shard d20020 skips constructing a RecipientStateMachine, yet the coordinator (config server) believes that shard d20020 has finished refreshing. The resharding operation is then unable to make further progress.
This issue appears to happen only, and even then very rarely, when the test client queries the temporary resharding collection via mongos. I wonder whether ShardServerCatalogCacheLoader::_getLoaderMetadata() still has another issue along the lines of SERVER-51510.
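The failure mode described above can be modelled with a small sketch. This is only an illustration in Python, not the server's C++ code: the names RefreshedMetadata, recipient_action, and find_stalled_recipients are hypothetical, and the rule that a recipient builds its state machine only once it sees "cloning" is an assumption taken from the behaviour reported in this ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshedMetadata:
    """Hypothetical view of what one shard saw after a catalog cache refresh."""
    shard: str
    collection_version: str
    coordinator_state: str


def recipient_action(metadata: RefreshedMetadata) -> str:
    # Assumption from the ticket: "cloning" leads to a recipient state machine,
    # while "preparing-to-donate" causes the shard version change to be ignored.
    if metadata.coordinator_state == "cloning":
        return "create-recipient-state-machine"
    return "ignore-shard-version-change"


def find_stalled_recipients(refreshes: list[RefreshedMetadata]) -> list[str]:
    """Shards that ignored a collection version at which another shard saw "cloning"."""
    cloning_versions = {
        r.collection_version for r in refreshes if r.coordinator_state == "cloning"
    }
    return [
        r.shard
        for r in refreshes
        if r.collection_version in cloning_versions
        and recipient_action(r) != "create-recipient-state-machine"
    ]


# Values copied from the description: same collection version, different states.
VERSION = "2|1||5fec046614ff529dbac7fa05"
refreshes = [
    RefreshedMetadata("d20020", VERSION, "preparing-to-donate"),
    RefreshedMetadata("d20022", VERSION, "cloning"),
]

stalled = find_stalled_recipients(refreshes)
print(stalled)
```

Because both shards report the same collection version, comparing versions alone cannot reveal that d20020 holds stale resharding fields; only the coordinator state differs.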
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| 2020-12-30T04:39:02.423+00:00 I SH_REFR 4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.423+0000 d20020| | 2020-12-30T04:39:02.423+00:00 I SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
...
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| 2020-12-30T04:39:02.426+00:00 I SH_REFR 4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":6},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.429+0000 d20022| | 2020-12-30T04:39:02.426+00:00 I SHARDING 5262001 [RecoverRefreshThread] "Creating recipient state machine","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"cloning","recipientFields":{"fetchTimestamp":{"$timestamp":{"t":1609303142,"i":51}},"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|1||5fec046614ff529dbac7fa05"}
...
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| 2020-12-30T04:39:02.431+00:00 I SH_REFR 4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046614ff529dbac7fa05"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":8},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":3}
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.431+0000 d20020| | 2020-12-30T04:39:02.431+00:00 I SHARDING 5262000 [RecoverRefreshThread] "Ignoring shard version change","attr":{"reshardingFields":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate","recipientFields":{"donorShardIds":["shard0","shard1"],"existingUUID":{"$uuid":"0fe4b9ee-41d2-4855-8411-32539bc84657"},"originalNamespace":"test.foo"}},"collectionMetadata":"collection version: 2|1||5fec046614ff529dbac7fa05, shard version: 2|0||5fec046614ff529dbac7fa05"}
...
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.457+0000 d20022| 2020-12-30T04:39:02.457+00:00 D1 MIGRATE 5002300 [ReshardingRecipientService-0] "Creating temporary resharding collection","attr":{"originalNss":"test.foo"}
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.460+0000 d20022| 2020-12-30T04:39:02.460+00:00 I SH_REFR 4619901 [CatalogCache-0] "Refreshed cached collection","attr":{"namespace":"test.foo","newVersion":{"chunkVersion":{"0":{"$timestamp":{"t":2,"i":1}},"1":{"$oid":"5fec046697d08cdb539562b8"}},"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":7},"oldVersion":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}
[js_test:resharding_replicate_updates_as_insert_delete] 2020-12-30T04:39:02.461+0000 d20022| 2020-12-30T04:39:02.461+00:00 I STORAGE 20320 [ReshardingRecipientService-0] "createCollection","attr":{"namespace":"test.system.resharding.0fe4b9ee-41d2-4855-8411-32539bc84657","uuidDisposition":"provided","uuid":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}},"options":{"uuid":{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"}}}
(These logs are from a patch build where the "Ignoring shard version change" and "Creating recipient state machine" messages have been added.)
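When triaging a similar hang, it can help to pull the coordinator state seen by each shard out of the RecoverRefreshThread messages. The following is a rough Python sketch, assuming the log layout shown in the excerpts above. The regular expression and the names STATE_LINE and extract_states are hypothetical, and the sample strings are trimmed copies of the SHARDING entries from this ticket.

```python
import re

# Matches the shard prefix (e.g. "d20020|"), the SHARDING log id, the quoted
# message, and the first "state" value that follows on the same line.
STATE_LINE = re.compile(
    r'\b(d\d+)\|.*?SHARDING\s+(\d+)\s+\[RecoverRefreshThread\]\s+"([^"]+)"'
    r'.*?"state":"([^"]+)"'
)


def extract_states(lines):
    """Return (shard, log_id, message, state) tuples for matching log lines."""
    rows = []
    for line in lines:
        for match in STATE_LINE.finditer(line):
            rows.append(match.groups())
    return rows


# Trimmed excerpts of the log lines quoted in this ticket.
LOG_EXCERPTS = [
    'd20020| | 2020-12-30T04:39:02.423+00:00 I SHARDING 5262000 [RecoverRefreshThread] '
    '"Ignoring shard version change","attr":{"reshardingFields":{"uuid":'
    '{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"preparing-to-donate"',
    'd20022| | 2020-12-30T04:39:02.426+00:00 I SHARDING 5262001 [RecoverRefreshThread] '
    '"Creating recipient state machine","attr":{"reshardingFields":{"uuid":'
    '{"$uuid":"6e22c09a-3051-43d4-861e-06f9629abb7a"},"state":"cloning"',
]

rows = extract_states(LOG_EXCERPTS)
for shard, log_id, message, state in rows:
    print(f"{shard}: {message} (id {log_id}, state {state})")
```

The log ids 5262000 and 5262001 exist only in the patch build mentioned above, so the pattern would need adjusting for other builds.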
- depends on
  - SERVER-54874 Ensure reading of consistent config.collections and config.chunks when refreshing the CatalogCache (Closed)
  - SERVER-55146 Bump collection version on any modification of config.collections reshardingFields or allowMigrations fields (Closed)
- is related to
  - SERVER-52620 Update resharding_replicate_updates_as_insert_delete.js to use reshardCollection command (Closed)
- related to
  - SERVER-55307 Complete TODO listed in SERVER-53539 (Closed)