- Type: Improvement
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
If a change event in a change stream exceeds the 16MB BSON document limit, the MongoDB connector becomes stuck in a failure loop and cannot recover without its offsets being deleted. The new DLQ feature does not help, because the failure happens inside MongoDB itself (no "bad event" is ever received that could be sent to the DLQ).
Connector config:
"errors.tolerance": "all", "errors.log.enable": "true", "errors.log.include.messages": "true", "errors.deadletterqueue.topic.name": "dlq", "errors.deadletterqueue.topic.replication.factor": "1", "errors.deadletterqueue.context.headers.enable": "true"
Example output when a "too large" event is encountered:
[2021-04-27 03:35:42,633] INFO An exception occurred when trying to get the next item from the Change Stream: Query failed with error code 10334 and error message 'BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "826087866E000000682B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3466353736383061003C5F70726F664B65792E5F..." }' on server mongo:27097 (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,107] INFO Watching for collection changes on 'the_collection' (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,108] INFO Resuming the change stream after the previous offset: {"_data": "826087866E0000005D2B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3338323730313430003C5F70726F664B65792E5F73003C70726F66696F003C5F6964003C316634386C6472633875347265383030000004"} (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,696] WARN Failed to resume change stream: BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "826087866E000000682B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3466353736383061003C5F70726F664B65792E5F..." } 10334
=====================================================================================
If the resume token is no longer available then there is the potential for data loss.
Saved resume tokens are managed by Kafka and stored with the offset data.

To restart the change stream with no resume token either:
  * Create a new partition name using the `offset.partition.name` configuration.
  * Set `errors.tolerance=all` and ignore the erroring resume token.
  * Manually remove the old offset from its configured storage.

Resetting the offset will allow for the connector to be resume from the latest resume token.
Using `copy.existing=true` ensures that all data will be outputted by the connector but it will duplicate existing data.
=====================================================================================
(com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,093] INFO Watching for collection changes on 'the_collection' (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,094] INFO Resuming the change stream after the previous offset: {"_data": "826087866E0000005D2B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3338323730313430003C5F70726F664B65792E5F73003C70726F66696F003C5F6964003C316634386C6472633875347265383030000004"} (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,684] WARN Failed to resume change stream: BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be betwe...

(repeats until killed)
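For completeness, the recovery path suggested in the log message amounts to pointing the task at a fresh offset partition and, if a full re-sync is acceptable, re-copying existing data. The partition name below is an illustrative placeholder, not a value from this report:

"offset.partition.name": "the_collection-restart-1",
"copy.existing": "true"

As the log warns, `copy.existing=true` will re-emit documents that were already published, so downstream consumers must tolerate duplicates.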