- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 3.4.2
- Component/s: Replication
- Fully Compatible
- ALL
- v3.6, v3.4
Hi Team,
We have a 10-shard (Primary / Secondary / Arbiter) sharded cluster which hosts 70k databases.
Here's the distribution across the shards (NB: some of our databases are not sharded):
mongos> db.databases.aggregate({$group:{_id: '$primary', count: {$sum:1}}})
{ "_id" : "clust-users-2-shard10", "count" : 4594 }
{ "_id" : "clust-users-2-shard9", "count" : 8945 }
{ "_id" : "clust-users-2-shard8", "count" : 8624 }
{ "_id" : "clust-users-2-shard1", "count" : 8084 }
{ "_id" : "clust-users-2-shard7", "count" : 4505 }
{ "_id" : "clust-users-2-shard2", "count" : 4769 }
{ "_id" : "clust-users-2-shard6", "count" : 9370 }
{ "_id" : "clust-users-2-shard4", "count" : 4717 }
{ "_id" : "clust-users-2-shard3", "count" : 10217 }
{ "_id" : "clust-users-2-shard5", "count" : 5953 }
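A sorted variant of the same aggregation (a sketch against the same config.databases collection from mongos) makes the imbalance easier to read; the counts above add up to roughly 69,778 databases, with shard3 and shard6 each holding close to 10k primary databases:
mongos> // sort primary shards by how many databases they own
mongos> db.databases.aggregate([{$group: {_id: '$primary', count: {$sum: 1}}}, {$sort: {count: -1}}])
mongos> // total number of databases tracked in config.databases
mongos> db.databases.count()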
We're currently experiencing issues resyncing one of these shards from scratch; the initial sync fails with the following error:
2017-11-16T05:49:33.245+0100 I - [replication-115] Assertion: 10334:BSONObj size: 32985739 (0x1F7528B) is invalid. Size must be between 0 and 16793600(16MB) First element: databasesCloned: 10191 src/mongo/bson/bsonobj.cpp 58
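The assertion message itself points at the cause: the offending document is 32,985,739 bytes and its first element is databasesCloned: 10191, i.e. the initial sync progress tracking works out to roughly 3 KB per cloned database and reaches about double the 16 MB BSON limit with ~10k databases, which matches the shards above that hold 9-10k primary databases. A minimal way to watch that growth on the syncing member (a sketch, assuming the member is still in initial sync and that 3.4 embeds initialSyncStatus in the replSetGetStatus output, per SERVER-25125):
// Run on the resyncing shard member (not on mongos) while it is still in initial sync.
var status = db.adminCommand({replSetGetStatus: 1});
// Per-database clone progress accumulates under initialSyncStatus:
printjson(status.initialSyncStatus);
// Approximate BSON size of the whole status document; the server asserts
// once a single document like this would exceed the 16 MB limit:
print(Object.bsonsize(status));
If the progress document has already grown past the limit, this command can itself fail with the same "BSONObj size is invalid" assertion, which appears to be what the related SERVER-84324 describes.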
On another cluster with the same architecture but fewer databases per shard, we do not encounter this issue.
We plan to upgrade from version 3.4.4 to 3.4.10, but we haven't found anything related to this issue in the changelog.
Is this a known issue or do you have more information about this?
Thanks.
Regards,
Benoit
Is related to:
- SERVER-84324 replSetGetStatus could assert silently if initialSyncStatus is too large (Open)
- SERVER-25125 Add initial sync progress information to replSetGetStatus (Closed)
- SERVER-27052 Add asynchronous operation support to DataReplicator (Closed)