- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 3.4.2
- Component/s: Replication
- Fully Compatible
- ALL
- v3.6, v3.4
Hi Team,
We have a 10-shard (Primary / Secondary / Arbiter) sharded cluster which hosts 70k databases.
Here's the distribution across the shards (NB: some of our databases are not sharded):
mongos> db.databases.aggregate({$group:{_id: '$primary', count: {$sum:1}}})
{ "_id" : "clust-users-2-shard10", "count" : 4594 }
{ "_id" : "clust-users-2-shard9", "count" : 8945 }
{ "_id" : "clust-users-2-shard8", "count" : 8624 }
{ "_id" : "clust-users-2-shard1", "count" : 8084 }
{ "_id" : "clust-users-2-shard7", "count" : 4505 }
{ "_id" : "clust-users-2-shard2", "count" : 4769 }
{ "_id" : "clust-users-2-shard6", "count" : 9370 }
{ "_id" : "clust-users-2-shard4", "count" : 4717 }
{ "_id" : "clust-users-2-shard3", "count" : 10217 }
{ "_id" : "clust-users-2-shard5", "count" : 5953 }
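A sorted variant of the same aggregation (a sketch against the same config.databases collection from mongos) makes the imbalance easier to read; the counts above add up to roughly 69,778 databases, with shard3 and shard6 each holding close to 10k primary databases:
mongos> // sort primary shards by how many databases they own
mongos> db.databases.aggregate([{$group: {_id: '$primary', count: {$sum: 1}}}, {$sort: {count: -1}}])
mongos> // total number of databases tracked in config.databases
mongos> db.databases.count()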
We're currently experiencing issues resyncing one of these shards from scratch; the initial sync fails with the following error:
2017-11-16T05:49:33.245+0100 I - [replication-115] Assertion: 10334:BSONObj size: 32985739 (0x1F7528B) is invalid. Size must be between 0 and 16793600(16MB) First element: databasesCloned: 10191 src/mongo/bson/bsonobj.cpp 58
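The assertion message itself points at the cause: the offending document is 32,985,739 bytes and its first element is databasesCloned: 10191, i.e. the initial sync progress tracking works out to roughly 3 KB per cloned database and reaches about double the 16 MB BSON limit with ~10k databases, which matches the shards above that hold 9-10k primary databases. A minimal way to watch that growth on the syncing member (a sketch, assuming the member is still in initial sync and that 3.4 embeds initialSyncStatus in the replSetGetStatus output, per SERVER-25125):
// Run on the resyncing shard member (not on mongos) while it is still in initial sync.
var status = db.adminCommand({replSetGetStatus: 1});
// Per-database clone progress accumulates under initialSyncStatus:
printjson(status.initialSyncStatus);
// Approximate BSON size of the whole status document; the server asserts
// once a single document like this would exceed the 16 MB limit:
print(Object.bsonsize(status));
If the progress document has already grown past the limit, this command can itself fail with the same "BSONObj size is invalid" assertion, which appears to be what the related SERVER-84324 describes.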
On another cluster with the same architecture but fewer databases per shard, we do not encounter this issue.
We plan to upgrade from version 3.4.4 to 3.4.10, but we haven't found anything related to this issue in the changelog.
Is this a known issue or do you have more information about this?
Thanks.
Regards,
Benoit
Is related to:
- SERVER-84324 replSetGetStatus could assert silently if initialSyncStatus is too large (Open)
- SERVER-25125 Add initial sync progress information to replSetGetStatus (Closed)
- SERVER-27052 Add asynchronous operation support to DataReplicator (Closed)