-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
-
Repl 2018-04-09
-
4
Consider a node initial syncing from a primary. The initial syncing node, before the oplog application phase has the following collection -> UUID mappings:
test.foo -> Does not exist test.bar -> UUID("7b447a86-5a12-42bc-916f-edcbfcb71cc0") test.shardedColl -> UUID("16ce21b8-320c-4bfc-af8c-05cae93bdadb")
It also happens to be that this is the correct goal state.
Now consider the sequence of oplog entries that are played as part of the oplog application step of initial sync:
{ "ts" : Timestamp(1520987617, 29), "op" : "c", "ns" : "test.$cmd", "ui" : UUID("78f0578a-1ecf-4ba3-af46-bfc13ebb46fa"), "wall" : ISODate("2018-03-14T00:33:37.571Z"), "o" : { "renameCollection" : "test.bar", "to" : "test.shardedColl", "stayTemp" : false, "dropTarget" : true } } { "ts" : Timestamp(1520987618, 3), "op" : "c", "ns" : "test.$cmd", "ui" : UUID("16ce21b8-320c-4bfc-af8c-05cae93bdadb"), "wall" : ISODate("2018-03-14T00:33:38.073Z"), "o" : { "renameCollection" : "test.foo", "to" : "test.bar", "stayTemp" : false, "dropTarget" : true } } { "ts" : Timestamp(1520987621, 2), "op" : "c", "ns" : "test.$cmd", "ui" : UUID("7b447a86-5a12-42bc-916f-edcbfcb71cc0"), "wall" : ISODate("2018-03-14T00:33:41.532Z"), "o" : { "create" : "foo", "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.foo" } } } { "ts" : Timestamp(1520987622, 2), "op" : "c", "ns" : "test.$cmd", "ui" : UUID("16ce21b8-320c-4bfc-af8c-05cae93bdadb"), "wall" : ISODate("2018-03-14T00:33:42.103Z"), "o" : { "renameCollection" : "test.bar", "to" : "test.shardedColl", "stayTemp" : false, "dropTarget" : UUID("78f0578a-1ecf-4ba3-af46-bfc13ebb46fa") } } { "ts" : Timestamp(1520987622, 9), "op" : "c", "ns" : "test.$cmd", "ui" : UUID("7b447a86-5a12-42bc-916f-edcbfcb71cc0"), "wall" : ISODate("2018-03-14T00:33:42.437Z"), "o" : { "renameCollection" : "test.foo", "to" : "test.bar", "stayTemp" : false, "dropTarget" : true } }
Those operations result in the initial syncing node to only have test.shardedColl -> UUID("16ce21b8-320c-4bfc-af8c-05cae93bdadb") while missing the expected test.bar collection.
I can make statements about behavior change that fix this specific sequence, but I can't speak to their correctness globally.
I do have two observations that might illuminate what's going wrong.
- judah.schvimer notes that when collections have UUIDs, a replicated renameCollection's dropTarget should never be true. It should be changed to false if the target did not exist, or it should be the UUID of the collection that was dropped.
- Perhaps more subtly, in this sequence (which is true for using applyOps, as well as what was observed on the initial syncing node), the creation of test.foo fails because the UUID already exists from the collection that was dropped as the target of the previous rename.
2018-03-16T19:13:59.519-0400 I COMMAND [conn1] CMD: create test.foo - existing collection with conflicting UUID 7b447a86-5a12-42bc-916f-edcbfcb71cc0 is in a drop-pending state: test.system.drop.1521242039i3t1.bar
One clarification about the reproduction script. The first two "oplog entries" are simply to recreate the state of the initial syncing node before it began oplog application. Had a primary produced that sequence of oplog entries, including the initial creates would be a very different problem.
- is depended on by
-
SERVER-33946 Decrease number of initial sync attempts in tests to 1
- Blocked
- is duplicated by
-
SERVER-34727 Rename collection during initial sync drops extra collection
- Closed
- is related to
-
SERVER-33087 Fix the use of dropTarget in renameCollection
- Closed
- related to
-
SERVER-34196 OpObserver::onRenameCollection() dropTarget boolean argument is redundant
- Closed