- Type: Task
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 2.4.6, 2.4.8
- Component/s: Replication
ISSUE SUMMARY
This issue occurs only if a replica set member enters the ROLLBACK state and the operations being rolled back include a collMod command that modifies usePowerOf2Sizes. When both conditions are met, the member shuts down and enters the FATAL state in the replica set.
USER IMPACT
If a replica set member encounters this command in the section of the oplog it is rolling back, the server will shut down and the following messages will appear in the server log:
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication
The member cannot be restarted successfully until this situation is cleared; until then, it remains in the FATAL state.
This issue is present in all versions of MongoDB prior to and including v2.4.8.
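To check whether a member stopped for this reason, the log messages quoted above can be searched for mechanically. The following is a minimal sketch, not an official tool: the function name find_fatal_rollback is hypothetical, the two message strings are copied from the log excerpt in this ticket, and it assumes the log has one message per line.

```python
import re

# Message texts copied from the log excerpt in this ticket.
ROLLBACK_ERROR = re.compile(
    r"replSet error can't rollback this command yet: (?P<cmd>\{.*\})"
)
FATAL_MARKER = "replSet error fatal, stopping replication"


def find_fatal_rollback(log_lines):
    """Return the text of the command that blocked rollback, or None.

    Hypothetical helper: it reports a hit only when the "can't rollback"
    error is later followed by the "stopping replication" message.
    """
    blocked_cmd = None
    for line in log_lines:
        match = ROLLBACK_ERROR.search(line)
        if match:
            # Remember the most recent command that could not be rolled back.
            blocked_cmd = match.group("cmd")
        elif blocked_cmd is not None and FATAL_MARKER in line:
            return blocked_cmd
    return None
```

Passing the lines of a mongod log file (for example, the result of iterating over an open file object) returns the offending command document as text, which can then be compared against the collMod call described in this ticket.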
SOLUTION
Instead of shutting down, the member ignores the call during rollback and logs a warning: "replSet not rolling back change of usePowerOf2Sizes"
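The change in behavior can be illustrated with a small model. This is a sketch only and is not the server's actual code: the names handle_command_on_rollback and RollbackFatal are hypothetical, command documents are modeled as plain dicts, and the sketch assumes the skip applies only when usePowerOf2Sizes is the sole option on the collMod call. Every other command is treated as fatal here, which simplifies what the server really does.

```python
import logging

log = logging.getLogger("rollback_sketch")

# Warning text quoted from this ticket.
WARNING_TEXT = "replSet not rolling back change of usePowerOf2Sizes"


class RollbackFatal(Exception):
    """Hypothetical stand-in for the fatal rollback error."""


def handle_command_on_rollback(cmd):
    """Decide how to treat one command document met during rollback.

    Returns "skipped" for a collMod that only changes usePowerOf2Sizes
    (an assumption of this sketch); raises RollbackFatal for anything else.
    """
    if set(cmd) == {"collMod", "usePowerOf2Sizes"}:
        # New behavior: warn and carry on instead of stopping replication.
        log.warning(WARNING_TEXT)
        return "skipped"
    raise RollbackFatal("can't rollback this command yet: %r" % (cmd,))
```

With this model, handle_command_on_rollback({"collMod": "files", "usePowerOf2Sizes": True}) logs the warning and returns "skipped", whereas before the fix the same command led to the fatal path.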
WORKAROUNDS
The best workaround is to re-sync the replica set member. See documentation on re-syncing a member.
PATCHES
Production release v2.4.9 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.
Original Description
If a replica set member attempts a rollback of a period which contained
{ collMod: "files", usePowerOf2Sizes: true }
this causes a fatal error. The replica set member is thereafter left in the FATAL state.
While it seems reasonable that usePowerOf2Sizes cannot be rolled back, this is probably not the best user experience. I would prefer that my replica set member continue to function, even if the disk space allocation algorithm is different than what I asked for.
Instead, this op could be skipped (with a loud warning)?
Full log snippet demonstrating the problem:
Tue Sep 24 05:47:06.335 [rsBackgroundSync] replSet syncing to: brs7.ny1.10gen.cc:27010
Tue Sep 24 05:47:09.374 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 0
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet ROLLBACK
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 1
Tue Sep 24 05:47:09.375 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback our last optime: Sep 24 05:47:05:39
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback their last optime: Sep 24 05:46:48:ab
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet info rollback diff in end of log times: 17 seconds
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet WARNING ignoring op on rollback no _id TODO : backupstore.system.indexes { ts: Timestamp 1380001625000|3, h: 2707240384590046610, v: 2, op: "i", ns: "backupstore.system.indexes", o: { name: "_id_1_filename_1", ns: "backupstore.files", key: { _id: 1, filename: 1 } } }
Tue Sep 24 05:47:09.376 [rsBackgroundSync] replSet error can't rollback this command yet: { collMod: "files", usePowerOf2Sizes: true }
Tue Sep 24 05:47:09.383 [rsBackgroundSync] replSet cmdname=collMod
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet replica set fatal exception
Tue Sep 24 05:47:09.384 [rsBackgroundSync] replSet error fatal, stopping replication
Tue Sep 24 05:47:09.755 [conn476406] end connection 10.10.0.135:59317 (6 connections now open)
- related to SERVER-19719 Failure to rollback noPadding should not cause fatal error (Closed)