- Type: Bug
- Resolution: Incomplete
- Priority: Major - P3
- None
- Affects Version/s: 3.6.6
- Component/s: Stability
- None
- ALL
- Sharding 2019-09-09, Sharding 2019-09-23, Sharding 2019-10-07, Sharding 2019-12-02, Sharding 2019-12-16, Sharding 2019-12-30, Sharding 2020-02-10, Sharding 2020-02-24
- (copied to CRM)
I'm not certain if this would happen every time, but it did happen to us in production.
We had an object that was very close to 16MB (15.99MB according to bsonsize()), and our application went to update the record with a little more data.
The mongos that was being used then crashed with the following message:
2019-08-11T08:10:25.814+0000 F ASIO [NetworkInterfaceASIO-TaskExecutorPool-2-0] Uncaught exception in NetworkInterfaceASIO IO worker thread of type: Location10334: BSONObj size: 16794106 (0x10041FA) is invalid. Size must be between 0 and 16793600(16MB) First element: update: "<COLLECTION_NAME>"
Note: in the log line above and in the full crash logs, the collection name is redacted to "<COLLECTION_NAME>".
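For reference, the numbers in the log line can be checked with a little arithmetic. The sketch below uses only the two sizes printed in the message; the 16 MiB user-facing document limit is an assumption based on MongoDB's documented limits and does not appear in the log itself.

```python
# Sizes copied from the log line above.
reported_size = 16794106    # "BSONObj size: 16794106 (0x10041FA)"
internal_limit = 16793600   # "Size must be between 0 and 16793600(16MB)"

# Assumption: the documented per-document limit is 16 MiB.
user_doc_limit = 16 * 1024 * 1024

# Extra room the internal limit allows on top of 16 MiB.
headroom = internal_limit - user_doc_limit

# How far the rejected object exceeded the internal limit.
overshoot = reported_size - internal_limit

print(f"headroom above 16 MiB: {headroom} bytes")
print(f"overshoot past internal limit: {overshoot} bytes")
```

Under that assumption, the internal limit is 16 MiB plus 16 KiB, and the object that triggered the crash was only a few hundred bytes past it.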
Our application periodically retries this write if the initial attempt fails. A little later it retried, the request went to a different mongos server, and that one crashed as well. With both mongos nodes down, our cluster was effectively unavailable.
I've attached both stack traces.
Obviously we don't want to be running with DB objects at or close to 16MB, so we shrank the object in question. This is not something that happens all the time, but it does happen occasionally, and we need our production servers to handle it: a write that pushes an object past 16MB should fail gracefully rather than crash mongos.
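As a stopgap on the application side, something like the following sketch could keep an oversized write from being retried against every mongos. This is not a fix for the crash itself. All names here (`DocumentTooLargeError`, `check_size`, `save_with_retry`, the `write` callable, the safety margin) are hypothetical, and it assumes the application can compute the encoded size of the document it expects to end up with.

```python
MAX_BSON_DOC_BYTES = 16 * 1024 * 1024  # assumed 16 MiB per-document limit


class DocumentTooLargeError(ValueError):
    """Hypothetical application-level error: retrying cannot succeed."""


def check_size(encoded_size, safety_margin=64 * 1024):
    """Raise if the encoded size exceeds the limit minus a safety margin."""
    limit = MAX_BSON_DOC_BYTES - safety_margin
    if encoded_size > limit:
        raise DocumentTooLargeError(
            f"{encoded_size} bytes exceeds the {limit} byte budget"
        )


def save_with_retry(write, encoded_size, attempts=3):
    """Call write() up to `attempts` times, but never for an oversized document.

    `write` is a hypothetical zero-argument callable that performs the update.
    """
    check_size(encoded_size)  # permanent failure: raised before any attempt
    last_error = None
    for _ in range(attempts):
        try:
            return write()
        except ConnectionError as exc:  # stand-in for a transient failure
            last_error = exc
    raise last_error
```

The point of the design is that an oversized document is treated as a permanent failure and rejected before the first attempt, so the retry loop only ever sees errors that a later attempt might actually recover from.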
Our version is technically 3.6.6-evg1, a custom build branched directly off 3.6.6, which you can find here: https://github.com/evergage/mongo/commits/v3.6.6-evg1. The only difference is the last 3 commits shown there, which just quiet some overly verbose metadata logging that was producing a practically endless stream of log entries; we had to silence it in order to run this in production. Since the changes are so minor, the stack trace line numbers and such should hopefully still be usable for you. That logging bug (https://jira.mongodb.org/browse/SERVER-30841?filter=21888) has since been fixed in 3.6.8. Assuming that fix silences everything we silenced in our custom build (3 different files), we might be able to stop running a custom build in the future.
- is duplicated by
  - SERVER-44345 MongoS crash with "BufBuilder attempted to grow()" above 64MB while restarting/upgrading a secondary from 3.4 to 3.6 (Closed)
- related to
  - SERVER-29109 Client metadata log message verbosity is not parallel with connection start/end messages (Closed)
  - SERVER-27663 Informational Network component log messages should be configurable (Open)
  - SERVER-30841 Lower the amount of metadata refresh logging (Closed)