- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- None
- ALL
This ticket came out of the investigation of help ticket SERVER-50147, filed by a customer.
Currently, when replication is enabled, MongoDB 3.4 sets the 'oplogDeleteFromPoint' field in the minvalid document to a non-null timestamp during steady-state oplog application, before writing oplog entries, and clears the timestamp after writing them. So after an unclean shutdown, a 3.4 node can be left with a non-null timestamp in 'oplogDeleteFromPoint'.
In that unclean-shutdown case, if the user restarts the node as a standalone before upgrading to MongoDB binary version 3.6, we can hit the problem described in SERVER-50147. (see here for related nexus of prior work)
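The set-then-clear bracketing described above can be illustrated with a small Python model. This is only a sketch, not MongoDB source code: the minvalid document is a plain dict, the oplog is a list, a null timestamp is modeled as the tuple `(0, 0)`, and the names `apply_batch`, `SimulatedCrash`, and `crash_after` are hypothetical. Using the first entry's timestamp as the marker value is also an assumption of the sketch; the ticket only says the value is non-null.

```python
# Illustrative model only; not MongoDB source code.
NULL_TS = (0, 0)  # assumption: null timestamp modeled as (seconds, increment) == (0, 0)


class SimulatedCrash(Exception):
    """Stands in for an unclean shutdown in the middle of a batch."""


def apply_batch(minvalid, oplog, batch, crash_after=None):
    """Write one batch of (timestamp, op) entries, bracketed by the marker."""
    # Before writing anything, set the marker to a non-null timestamp
    # (here: the first entry's timestamp, an assumption of this sketch).
    minvalid["oplogDeleteFromPoint"] = batch[0][0]
    for written, entry in enumerate(batch):
        if crash_after is not None and written == crash_after:
            # Unclean shutdown: the marker is never cleared.
            raise SimulatedCrash()
        oplog.append(entry)
    # The whole batch is written, so clear the marker.
    minvalid["oplogDeleteFromPoint"] = NULL_TS


minvalid = {"oplogDeleteFromPoint": NULL_TS}
oplog = []

# A batch that completes leaves the marker null.
apply_batch(minvalid, oplog, [((100, 1), "a"), ((100, 2), "b")])
clean_marker = minvalid["oplogDeleteFromPoint"]

# A batch interrupted after one entry leaves the marker non-null.
try:
    apply_batch(minvalid, oplog, [((101, 1), "c"), ((101, 2), "d")], crash_after=1)
except SimulatedCrash:
    pass
crashed_marker = minvalid["oplogDeleteFromPoint"]
```

In this model the interrupted batch leaves a partially written oplog together with a non-null marker, which is the state a node restarted as a standalone would carry forward.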
Solution:
The current workaround is to manually unset the 'oplogDeleteFromPoint' field in the minvalid document, which is unsafe. A non-null timestamp in 'oplogDeleteFromPoint' indicates that a shutdown happened in the middle of writing an oplog batch, and this information is needed for startup recovery up through the MongoDB 3.6 binary with FCV 3.4.
We discussed some ways of unsetting the field in SERVER-50147, but it is really not safe to unset the field manually in any version. A safer option would be to fail the MongoDB FCV upgrade to 3.6 if the local.replset.minvalid document contains a non-null timestamp in the 'oplogDeleteFromPoint' field (which makes the user run startup recovery on the 3.4 binary, or on the 3.6 binary with FCV 3.4). We would also need to unset an 'oplogDeleteFromPoint' field holding a null timestamp on FCV 3.6 upgrade, irrespective of whether the node is a standalone or repl-enabled.
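The proposed guard could look roughly like the following Python sketch. It is not the server's implementation: the minvalid document is modeled as a dict, a null timestamp as `(0, 0)`, and `upgrade_fcv_to_36` and `UpgradeBlocked` are hypothetical names introduced for illustration.

```python
# Illustrative model only; not MongoDB source code.
NULL_TS = (0, 0)  # assumption: null timestamp modeled as (seconds, increment) == (0, 0)


class UpgradeBlocked(Exception):
    """Raised when the FCV upgrade must be refused (hypothetical name)."""


def upgrade_fcv_to_36(minvalid):
    """Refuse the upgrade on a non-null marker; otherwise drop the field."""
    marker = minvalid.get("oplogDeleteFromPoint")
    if marker is not None and marker != NULL_TS:
        # A batch write was interrupted; startup recovery still needs this value.
        raise UpgradeBlocked(
            "oplogDeleteFromPoint is non-null; run startup recovery first"
        )
    # Null or absent: unset the field, for standalone and repl-enabled nodes alike.
    minvalid.pop("oplogDeleteFromPoint", None)
    return minvalid


# A cleanly shut down node upgrades and loses the field.
clean = upgrade_fcv_to_36({"oplogDeleteFromPoint": NULL_TS})

# A node with an interrupted batch is refused and its document is left untouched.
dirty = {"oplogDeleteFromPoint": (101, 1)}
try:
    upgrade_fcv_to_36(dirty)
    blocked = False
except UpgradeBlocked:
    blocked = True
```

Refusing the upgrade rather than clearing the field keeps the recovery information intact until the user has run startup recovery.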
- duplicates
- SERVER-50147 Cannot start mongo after upgrading to 4.2
- Closed