- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 3.4.7
- Component/s: Replication
- ALL
Steps to Reproduce:
- Run a 3-node cluster
- Run map/reduce jobs that require a temp collection (see the shell sketch after this list)
- Re-sync one of the nodes, the re-sync will fail and start again
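For reference, a minimal shell sketch of the kind of job that goes through a temp collection; the database name is taken from the log below, but the source collection and pipeline are hypothetical. The tmp.agg_out namespace in the error suggests an aggregation $out stage (a mapReduce with replace output goes through a similar temp-collection-plus-rename step):

// Hypothetical job running on the primary every minute or so. In 3.4, $out
// first builds the results in a temp collection (graphs.tmp.agg_out.N) and
// then issues renameCollection with dropTarget: true to move it over
// graphs_temp; that rename is the oplog entry the initial sync refuses to apply.
use graphs
db.events.aggregate([
  { $group: { _id: "$series", total: { $sum: "$value" } } },
  { $out: "graphs_temp" }
])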
We have a 3-node replica set running v3.4.7. The primary runs 6 map/reduce jobs every minute or so, and due to circumstances we had to re-sync one of the secondary nodes. However, at one of the last steps of the re-sync we get the following error:
2017-09-05T17:56:04.348+0000 E REPL [repl writer worker 5] Error applying operation: OplogOperationUnsupported: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } } ({ ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } })
2017-09-05T17:56:04.348+0000 E REPL [replication-168] Failed to apply batch due to 'OplogOperationUnsupported: error applying batch: Applying renameCollection not supported in initial sync: { ts: Timestamp 1504588941000|592, h: 4948566672906734558, v: 2, op: "c", ns: "graphs.$cmd", o: { renameCollection: "graphs.tmp.agg_out.989", to: "graphs.graphs_temp", stayTemp: false, dropTarget: true } }'
2017-09-05T17:56:04.348+0000 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host graphs1-mongo3:27017 due to bad connection status; 2 connections to that host remain open
2017-09-05T17:56:04.348+0000 I REPL [replication-167] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1504634159000|4672, t: -1 }[-4041766555669726456]
2017-09-05T17:56:04.348+0000 I REPL [replication-168] Initial sync attempt finishing up.
After this, MongoDB cleans up the files and starts the re-sync again; it's now basically stuck in a very big loop. This used to work fine with 2.x, when we re-synced quite a few times.
I'm not sure what to do about this; we can't stop the map/reduce jobs for the duration of the re-sync, as it takes about 8 hours to get to this point.
- duplicates SERVER-4941: collection rename may not replicate / clone properly during initial sync (Closed)