ISSUE DESCRIPTION AND IMPACT
During initial sync, if a renameCollection operation is found during oplog application, the initial sync process is aborted and restarted to prevent data divergence in replica set nodes (see below for an example).
In addition to the renameCollection command, operations such as aggregations using $out and MapReduce with output to a collection may implicitly use renameCollection operations to create their output collections.
Users who attempt to resync a node and, before the process is complete, run any of the operations above may see their initial sync process abort and restart. In extreme cases (e.g., if users are constantly running aggregations that output to new collections), initial sync may never complete.
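The restart behavior can be modeled in a few lines, including why a steady stream of $out aggregations can keep initial sync from ever finishing. This is a hypothetical simulation, not MongoDB's implementation. The names `runInitialSync` and `isRenameEntry` and the `maxAttempts` cap are illustrative assumptions. Each attempt is also simplified to "apply the oplog entries written while cloning".

```javascript
// Hypothetical model of the abort-and-restart behavior; not MongoDB source code.
// An attempt succeeds only if none of the oplog entries it must apply is a
// renameCollection command ({ op: "c", o: { renameCollection: ... } }).
function isRenameEntry(entry) {
  return entry.op === "c" && entry.o !== undefined && "renameCollection" in entry.o;
}

// oplogDuringAttempt(n) returns the entries written while attempt n was cloning.
// maxAttempts is an artificial cap so that the model always terminates.
function runInitialSync(oplogDuringAttempt, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const entries = oplogDuringAttempt(attempt);
    if (entries.some(isRenameEntry)) {
      continue; // abort this attempt and restart from scratch
    }
    return { completed: true, attempts: attempt };
  }
  return { completed: false, attempts: maxAttempts };
}

const insertOp = { op: "i", ns: "test.foo", o: { _id: 1 } };
const renameOp = {
  op: "c",
  ns: "test.$cmd",
  o: { renameCollection: "test.tmp.agg_out.1", to: "test.aggResults" },
};

// $out aggregations run only while the first two attempts are cloning.
console.log(runInitialSync(n => (n <= 2 ? [insertOp, renameOp] : [insertOp]), 10));
// { completed: true, attempts: 3 }

// $out aggregations run continuously: every attempt sees a rename.
console.log(runInitialSync(() => [insertOp, renameOp], 10));
// { completed: false, attempts: 10 }
```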
DIAGNOSIS AND AFFECTED VERSIONS
This change was made in SERVER-26117 and affects MongoDB 3.2.12 and newer. On initial sync, users may encounter the following error message:
2017-09-05T17:56:04.348+0000 E REPL [repl writer worker 5] Error applying operation: OplogOperationUnsupported: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } } ({ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } })
2017-09-05T17:56:04.348+0000 E REPL [replication-168] Failed to apply batch due to 'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }'
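The failing entry in this message is a command entry (op: "c") whose o field carries renameCollection and to. Below is a sketch of a scan that picks out such entries from a list of oplog entries represented as plain JavaScript objects. The function name `findRenameEntries` and the ".tmp.agg_out." heuristic for spotting $out output are assumptions for illustration, and the second sample entry is hypothetical. On a live node, a comparable mongo shell query would run against local.oplog.rs with a filter such as { op: "c", "o.renameCollection": { $exists: true } }.

```javascript
// Return the renameCollection command entries, which abort initial sync on
// affected versions, together with their source and target namespaces.
function findRenameEntries(entries) {
  return entries
    .filter(e => e.op === "c" && e.o && typeof e.o.renameCollection === "string")
    .map(e => ({
      from: e.o.renameCollection,
      to: e.o.to,
      // Assumption: a source namespace containing ".tmp.agg_out." is the
      // temporary output collection of an aggregation using $out.
      fromAggregationOut: e.o.renameCollection.includes(".tmp.agg_out."),
    }));
}

const entries = [
  // Entry from the error message above (ts and h omitted).
  {
    v: 2, op: "c", ns: "graphs.$cmd",
    o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp",
         stayTemp: false, dropTarget: true },
  },
  // Hypothetical ordinary insert, which the scan ignores.
  { v: 2, op: "i", ns: "graphs.graphs", o: { _id: 1 } },
];

console.log(findRenameEntries(entries));
```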
RATIONALE
Prior to SERVER-26117, applying the renameCollection operation during initial sync could cause data divergence. Here's an example of a situation that may occur using aggregation:
1. User performs an aggregation against the test.foo collection, using $out to write its results to the test.aggResults collection.
2. The aggregation starts generating results and writes documents A and B to the temporary test.tmp.agg_out.1 collection.
3. User adds a node to the replica set, and the node begins the initial sync process.
4. The initial syncing node records minvalid, then lists all databases and collections that it needs to clone and discovers test.tmp.agg_out.1.
5. The user’s aggregation continues, writing documents C and D to test.tmp.agg_out.1.
6. The user’s aggregation completes, renaming test.tmp.agg_out.1 to test.aggResults.
7. The initial sync clones all the collections it knows about in the test database. This includes an attempt to clone test.tmp.agg_out.1; however, it finds no documents for that collection on the sync source, because the collection has already been renamed to test.aggResults there.
8. Note that the initial syncing node does not attempt to clone test.aggResults, because that collection didn’t exist when the node listed the collections it needed to clone in step 4.
9. The initial sync finishes data cloning and moves on to oplog application. It replicates the inserts of documents C and D to test.tmp.agg_out.1 (which implicitly creates that collection).
10. The initial sync encounters the renameCollection oplog entry and applies it, renaming its copy of test.tmp.agg_out.1 to test.aggResults.
11. Initial sync finishes.
At this point the test.aggResults collection on the primary/sync source contains documents A, B, C and D. On the newly added node, however, that collection contains only documents C and D. Although the node believes itself consistent with the primary and caught up, reads from it will return incomplete results. Additionally, if the user now writes to documents A or B, the newly added node may crash, because it has no record of A or B.
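The sequence of events above can be replayed as a toy model to show where the divergence comes from. In this model, collections are `Map` entries of namespace to an array of documents, and the oplog is a plain array. None of this is MongoDB code. `applyFrom` loosely stands in for the position recorded as minvalid, and every helper name is hypothetical.

```javascript
// Toy replay of the divergence scenario; mimics only the order of events.
function rename(db, from, to) {
  db.set(to, db.get(from) || []); // dropTarget: true, so any existing target is replaced
  db.delete(from);
}

const source = new Map([["test.foo", []]]);
const oplog = [];

function insertOnSource(ns, doc) {
  if (!source.has(ns)) source.set(ns, []);
  source.get(ns).push(doc);
  oplog.push({ op: "i", ns, o: doc });
}

// The aggregation writes A and B to its temporary collection.
insertOnSource("test.tmp.agg_out.1", "A");
insertOnSource("test.tmp.agg_out.1", "B");

// The new node records where oplog application will start and lists collections.
const applyFrom = oplog.length;
const collectionsToClone = [...source.keys()];

// The aggregation writes C and D, then renames its output on the source.
insertOnSource("test.tmp.agg_out.1", "C");
insertOnSource("test.tmp.agg_out.1", "D");
rename(source, "test.tmp.agg_out.1", "test.aggResults");
oplog.push({ op: "c", ns: "test.$cmd",
             o: { renameCollection: "test.tmp.agg_out.1", to: "test.aggResults" } });

// Cloning sees only the listed collections; the temporary one no longer exists.
const node = new Map();
for (const ns of collectionsToClone) {
  if (source.has(ns)) node.set(ns, [...source.get(ns)]);
}

// Oplog application replays C and D (implicitly creating the collection) and the rename.
for (const entry of oplog.slice(applyFrom)) {
  if (entry.op === "i") {
    if (!node.has(entry.ns)) node.set(entry.ns, []);
    node.get(entry.ns).push(entry.o);
  } else if (entry.o.renameCollection) {
    rename(node, entry.o.renameCollection, entry.o.to);
  }
}

console.log(source.get("test.aggResults")); // [ 'A', 'B', 'C', 'D' ]
console.log(node.get("test.aggResults"));   // [ 'C', 'D' ]
```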
REMEDIATION AND WORKAROUNDS
Users of renameCollection and aggregation with $out who are affected by this behavior need to pause the use of these features in order to complete an initial sync operation.
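One way to know when to pause is to check whether any member is currently in initial sync before launching renameCollection, $out aggregations, or mapReduce jobs. The sketch below assumes its input has the shape of the document returned by rs.status() in the mongo shell. It relies on the fact that a member running initial sync normally reports stateStr "STARTUP2". The function names and the sample status document are hypothetical.

```javascript
// Names of members that appear to be in initial sync, based on rs.status()-shaped input.
function membersInInitialSync(status) {
  return (status.members || [])
    .filter(m => m.stateStr === "STARTUP2")
    .map(m => m.name);
}

// True if renaming operations should be deferred until initial sync completes.
function shouldDeferRenamingOperations(status) {
  return membersInInitialSync(status).length > 0;
}

// Hypothetical rs.status() output, trimmed to the fields used above.
const status = {
  set: "rs0",
  members: [
    { name: "db1:27017", stateStr: "PRIMARY" },
    { name: "db2:27017", stateStr: "SECONDARY" },
    { name: "db3:27017", stateStr: "STARTUP2" },
  ],
};

console.log(membersInInitialSync(status));          // [ 'db3:27017' ]
console.log(shouldDeferRenamingOperations(status)); // true
```

In the mongo shell, the same check could be applied to the live result of rs.status() before starting such a job.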
Users of mapReduce() may also pause their mapReduce() operations. Alternatively, they may use the "Output to a collection with an action" form of the out option (for example, merge) as a workaround, because it avoids the renameCollection operation that mapReduce otherwise performs internally when writing to an output collection. For example:
db.outputcollection.drop()
// The output collection can't be empty, so insert a marker document
db.outputcollection.insert({marker:1})
// Merge results into the existing collection instead of replacing it
db.mycollection.mapReduce(myMapFunction, myReduceFunction, { out: { merge : "outputcollection" } })
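Why the merge action helps can be shown with a toy model that compares the oplog entries each output style would produce. Following the text above, the model assumes that replacing output is built in a temporary collection and then renamed over the target, while { merge: ... } writes into the existing target directly. The temporary collection name, the namespaces, and both function names are hypothetical. Merge upserts are simplified to plain insert entries.

```javascript
// Hypothetical model of the oplog entries produced by two mapReduce output styles.
function runMapReduceOutput(mode, target, results) {
  const oplog = [];
  if (mode === "replace") {
    const tmp = "test.tmp.mr_example"; // hypothetical temporary collection name
    for (const doc of results) oplog.push({ op: "i", ns: tmp, o: doc });
    oplog.push({
      op: "c",
      ns: "test.$cmd",
      o: { renameCollection: tmp, to: "test." + target, dropTarget: true },
    });
  } else if (mode === "merge") {
    // Each result is upserted into the target; modeled here as a plain insert entry.
    for (const doc of results) oplog.push({ op: "i", ns: "test." + target, o: doc });
  } else {
    throw new Error("unknown mode: " + mode);
  }
  return oplog;
}

// True if any entry is a renameCollection command, i.e. would abort initial sync.
function hasRename(oplog) {
  return oplog.some(e => e.op === "c" && e.o.renameCollection !== undefined);
}

const results = [{ _id: "x", value: 1 }, { _id: "y", value: 2 }];
console.log(hasRename(runMapReduceOutput("replace", "outputcollection", results))); // true
console.log(hasRename(runMapReduceOutput("merge", "outputcollection", results)));   // false
```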
FIX VERSION
A fix for this behavior is included in MongoDB 3.6.
ISSUE LINKS
Is depended on by:
- SERVER-31093 Remove initialsync attempts from passthrough suites (Closed)
Is duplicated by:
- SERVER-5760 Collections remaining with different hashes at end of small oplog tests (Closed)
- SERVER-18310 Can't rollback dropCollection if new primary renamed the collection (Closed)
- SERVER-30952 Initial (re)sync never completes, stuck in a loop (Closed)
- SERVER-35105 Applying renameCollection not supported in initial sync (Closed)
- SERVER-38524 Rename collection in initial sync (Closed)
Is related to:
- SERVER-31944 initial_sync_applier_error.js is now obsolete since initial sync supports renameCollection ops (Backlog)
- SERVER-15393 Renaming a collection with newly added background indexes may fail to replicate (Closed)
- SERVER-30478 Add replset test for rename during init sync (Closed)
- SERVER-30620 SyncTail::fetchAndInsertMissingDocument should use UUID (Closed)
- SERVER-29772 Provide option to 3.2 and 3.4 to allow initial sync to complete even when it encounters renameCollection entries (Closed)
- SERVER-15359 Provide "id" of collection (Closed)
Related to:
- SERVER-4332 renameCollection across dbs doesn't replicate correctly (Closed)
- SERVER-5694 renameCollection replication is not always idempotent (Closed)
- SERVER-40151 there is an error when adding a new secondary to the repliaset (Closed)
- SERVER-26117 renameCollection 'c' op should restart initial sync upon application (Closed)