Because bgsync aborts the index build before the node transitions to the rollback state, the node is still eligible to run for election and become primary, which has a really bad side effect. To see one notable consequence, consider a 3-node replica set (node A is the primary, node B is secondary1 and node C is secondary2) where the thread pool size is 1.
1) Node A (primary for term 10) starts the index build 'x_1' using the IndexBuildsCoordinator thread pool and generates a startIndexBuild oplog entry that replicates to both secondaries.
2) Node B and node C, on receiving the startIndexBuild oplog entry, start the index build (also using the IndexBuildsCoordinator thread pool).
3) Node A hits a network partition and gets disconnected from node B and node C.
4) Node A receives some writes W1 at term 10, then sees that it has lost its majority and steps down.
5) Node C gets elected and becomes primary for term 11. Node A then rejoins the network and sees that its sync source, say node C (the new primary), has diverged from its oplog. So it enters this code path and starts aborting the index build. Since node A hasn't yet transitioned to rollback, it is free to run for election; let's assume it wins the election after receiving a vote from node B.
As a result of step 5, node A will no longer run the real rollback step. This is because, when node A becomes primary, it stops the oplog fetcher service, so this check or [this|https://github.com/mongodb/mongo/blob/17984db6c531594c00bf226804d9ab7ed6225643/src/mongo/db/repl/rollback_impl.cpp#L190] check might fail, and the node then does not roll back any oplog entries.
Problems:
1) The index build 'x_1' on the secondaries becomes orphaned.
2) Since the index build on node A was aborted, node A is free to start a new index build, say 'y_1'. If the secondaries receive the startIndexBuild oplog entry for index 'y_1', they wait for the IndexBuildsCoordinator thread to become available, which blocks secondary replication.
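To make problem 2 concrete, here is a minimal Python sketch of a single-threaded pool whose only worker is held by an orphaned build. It is only an analogy for the IndexBuildsCoordinator thread pool with size 1, not the server code: `run_index_build`, `simulate_secondary` and the `threading.Event` standing in for the commit/abort signal are all hypothetical names introduced for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_index_build(name, finish_signal, log):
    """Hypothetical stand-in for an index build on a secondary."""
    log.append(f"{name} started")
    # Assumption: the build holds its thread until a commit/abort signal arrives.
    finish_signal.wait()
    log.append(f"{name} finished")


def simulate_secondary(wait_seconds=0.2):
    log = []
    x1_signal = threading.Event()  # never set by the primary: 'x_1' is orphaned
    y1_signal = threading.Event()
    y1_signal.set()                # 'y_1' could finish at once if it got a thread

    pool = ThreadPoolExecutor(max_workers=1)  # thread pool size is 1
    pool.submit(run_index_build, "x_1", x1_signal, log)
    y1 = pool.submit(run_index_build, "y_1", y1_signal, log)

    try:
        y1.result(timeout=wait_seconds)
        blocked = False
    except FutureTimeout:
        # 'y_1' is still queued behind the orphaned 'x_1'.
        blocked = True

    # Release the orphaned build so the sketch can shut down cleanly.
    x1_signal.set()
    pool.shutdown(wait=True)
    return blocked, log


if __name__ == "__main__":
    print(simulate_secondary())
```

In the sketch, 'y_1' only starts after something outside the pool releases 'x_1'. In the scenario above nothing does, so the thread applying the startIndexBuild entry for 'y_1' would wait indefinitely.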
Solution: We should abort the index build only after the node has transitioned to the rollback state and we are sure that the entries are going to be rolled back. This applies to both rollback via recoverToStableTimestamp and rollback via refetch.
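The ordering proposed above can be sketched as a tiny state machine. This is a simplified model under stated assumptions, not the replication coordinator's API: `MemberState`, `Node`, `abort_before_transition` and `abort_after_transition` are hypothetical names, and the model assumes that only a node in SECONDARY state may run for election or enter rollback.

```python
from enum import Enum


class MemberState(Enum):
    SECONDARY = "SECONDARY"
    ROLLBACK = "ROLLBACK"
    PRIMARY = "PRIMARY"


class Node:
    """Hypothetical model of node A after it detects oplog divergence."""

    def __init__(self, active_builds):
        self.state = MemberState.SECONDARY
        self.active_builds = set(active_builds)

    def try_transition_to_rollback(self):
        # Assumption: the transition fails if the node is no longer a secondary.
        if self.state != MemberState.SECONDARY:
            return False
        self.state = MemberState.ROLLBACK
        return True

    def try_win_election(self):
        # Assumption: a node in ROLLBACK state cannot run for election.
        if self.state != MemberState.SECONDARY:
            return False
        self.state = MemberState.PRIMARY
        return True


def abort_before_transition(node):
    """Current behavior: abort the builds while still an electable secondary."""
    node.active_builds.clear()


def abort_after_transition(node):
    """Proposed behavior: abort only once the node is in ROLLBACK state."""
    if not node.try_transition_to_rollback():
        return False  # rollback will not run, so leave the builds alone
    node.active_builds.clear()
    return True


if __name__ == "__main__":
    a = Node({"x_1"})
    abort_before_transition(a)
    print(a.try_win_election(), a.active_builds)  # primary with 'x_1' gone

    b = Node({"x_1"})
    print(abort_after_transition(b), b.try_win_election())
```

With the first ordering, node A can become primary after it has already dropped 'x_1', which is the orphaning scenario from step 5. With the second ordering, either the node is in rollback (and cannot be elected) when the abort happens, or the transition fails and the index build is left untouched.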
P.S.: I noticed this failure frequently in my patch builds. Currently, because index builds are generating a high volume of timeout errors, the BF for this issue gets lost among them.
- is depended on by
  - SERVER-46823 Enable default for index commit quorum as "votingMembers" (Closed)
- related to
  - SERVER-46976 Enable commit quorum in rollback_waits_for_bgindex_completion.js (Closed)
  - SERVER-48419 Extend rollback to recover resumable index builds efficiently (Closed)