Type: Spec Change
Resolution: Unresolved
Priority: Unknown
Component/s: Change Streams
Needed - No Spec Changes
Summary
What is the problem or use case, what are we trying to achieve?
The ChangeStream spec lists the errors that should be considered resumable, including:
- Any error encountered which is not a server error (e.g. a timeout error or network error)
Notably, this also includes server selection errors. Node.js and Python have identified that a server selection error reaching the change stream is not resumed and instead ends the change stream. We should add tests to cover this and potentially identify other drivers with the same limitation.
We should also clarify which other errors this bullet is meant to cover and which it is not, e.g. WaitQueueTimeoutMS errors, NullPointerExceptions, or BSON parse errors.
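To make the ambiguity concrete, here is a minimal sketch of one possible reading of the "not a server error" bullet. All class names are hypothetical and do not match any driver's real error hierarchy. The check for the `ResumableChangeStreamError` label on server errors reflects the spec's other resumability rules, which this ticket does not quote, so treat it as an assumption. Whether plain programming errors should fall through to "not resumable" is exactly the open question raised above.

```python
class DriverError(Exception):
    """Hypothetical base class for every error a driver raises itself."""


class ServerError(DriverError):
    """Hypothetical: an error response returned by the server."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self.labels = frozenset(labels)


class ServerSelectionError(DriverError):
    """Hypothetical: no suitable server was found (raised client-side)."""


class NetworkTimeoutError(DriverError):
    """Hypothetical: a socket read or write timed out (raised client-side)."""


def is_resumable(error: Exception) -> bool:
    """One possible classification; not the spec's normative wording."""
    if isinstance(error, ServerError):
        # Assumption: server errors are resumable only when labeled.
        return "ResumableChangeStreamError" in error.labels
    if isinstance(error, DriverError):
        # Client-side driver errors: timeouts, network, server selection.
        return True
    # Anything else (TypeError, parse bugs, ...) is treated as a
    # programming error here and is not resumed.
    return False
```

Under this reading, `is_resumable(ServerSelectionError("no primary"))` is true, while a `TypeError` raised inside the driver is not resumed.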
Motivation
Who is the affected end user?
ChangeStream users
How does this affect the end user?
They have more opportunity to encounter errors during change stream use. Ideally, a change stream resumes from any error that a user could recover from themselves using the current resume token.
How likely is it that this problem or use case will occur?
Any time there is a reconfiguration that leads to a server selection error.
If the problem does occur, what are the consequences and how severe are they?
The change stream becomes unusable, and a new one has to be created using the resume token.
Is this issue urgent?
No, NODE-6858, the issue that identified this, is scheduled for FY26Q2.
Is this ticket required by a downstream team?
No
Is this ticket only for tests?
Yes, the improvement to the wording in the spec isn't a fundamental change to the current intent
Acceptance Criteria
- Add a test that causes a server selection error to reach a change stream's resume logic
- Document a clearer set of errors drivers should consider resumable under the category of "not a server error"
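The first criterion could be exercised roughly as follows. This is a sketch against toy stand-ins, not a real driver or the unified test format: `FakeCursor`, `ChangeStream`, `ServerSelectionError`, and `run_scenario` are all hypothetical names. The point is only the shape of the assertion: a server selection error surfaces while iterating, the stream reopens its cursor with the last seen token, and iteration continues.

```python
class ServerSelectionError(Exception):
    """Hypothetical stand-in for a driver's server selection error."""


class FakeCursor:
    """Returns queued events in order; raises any queued exception."""

    def __init__(self, items):
        self._items = list(items)

    def next(self):
        item = self._items.pop(0)
        if isinstance(item, Exception):
            raise item
        return item


class ChangeStream:
    """Toy change stream that resumes once from the last seen token."""

    def __init__(self, open_cursor):
        self._open_cursor = open_cursor  # callable(resume_token) -> cursor
        self.resume_token = None
        self._cursor = open_cursor(None)

    def next(self):
        try:
            event = self._cursor.next()
        except ServerSelectionError:
            # Resume: reopen the cursor from the last token, retry once.
            self._cursor = self._open_cursor(self.resume_token)
            event = self._cursor.next()
        self.resume_token = event["_id"]
        return event


def run_scenario():
    opened_with = []  # resume token passed to each cursor creation
    cursors = [
        FakeCursor([{"_id": 1}, ServerSelectionError("no suitable server")]),
        FakeCursor([{"_id": 2}]),
    ]

    def open_cursor(resume_token):
        opened_with.append(resume_token)
        return cursors.pop(0)

    stream = ChangeStream(open_cursor)
    events = [stream.next(), stream.next()]
    return events, opened_with
```

A real test would instead need a way to force server selection to fail during iteration, for example during a reconfiguration as described above, and then assert that the next event is still delivered.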
- related to: NODE-6858 Change Stream stops working after failover
- Backlog