Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Environment: Mongo 4.4.3 Community Edition, running on Redhat Linux
Operating System: ALL
Sprint: Repl 2021-07-12
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

I have a sharded cluster for a rather large application that inserts around 15'000 documents every second.

db.getCollection('sessions').insertOne({ t: ISODate(), /* some more fields ... */ }); // 10'000-20'000 inserts per second!

Once per hour I run a bucketing operation. In principle it looks like this:

// Move the current data aside and recreate the index on the fresh 'sessions' collection.
db.getCollection('sessions').renameCollection('sessions.temp');
db.getCollection('sessions').createIndexes([{ t: 1 }], {}, 1); // third argument is the commitQuorum
// Build the hourly statistics from the renamed collection.
db.getCollection('sessions.temp').aggregate([
   { $group: ... },
   { $out: "sessions.temp.stats" }
]);
db.getCollection('sessions.temp.stats').aggregate([
   // ...
   { $merge: { into: { db: "data", coll: "session.statistics" } } }
]);
// Merge the grouped sessions into the daily bucket collection.
db.getCollection('sessions.temp').aggregate([
   { $group: ... },
   { $merge: { into: { db: "data", coll: "sessions.20210613" } } }
]);
// Drop the temporary collections.
db.getCollection('sessions.temp.stats').drop({ writeConcern: { w: 0, wtimeout: 60000 } });
db.getCollection('sessions.temp').drop({ writeConcern: { w: 0, wtimeout: 60000 } });
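
The target name in the last $merge (sessions.20210613) is only the example for one day. A minimal sketch of how such a name could be derived, assuming the suffix is the UTC date in YYYYMMDD form. bucketCollectionName and mergeIntoBucketStage are hypothetical helpers for illustration, not part of the actual job:

```javascript
// Hypothetical helper: builds a daily bucket collection name such as
// "sessions.20210613" from a Date. Using UTC is an assumption.
function bucketCollectionName(prefix, date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0'); // months are 0-based
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `${prefix}.${yyyy}${mm}${dd}`;
}

// Hypothetical helper: the final $merge stage of the job for a given day.
// The database name "data" is taken from the job shown above.
function mergeIntoBucketStage(date) {
  return {
    $merge: { into: { db: 'data', coll: bucketCollectionName('sessions', date) } }
  };
}
```

In the shell, the stage returned by mergeIntoBucketStage(new Date()) would replace the hard-coded $merge stage at the end of the pipeline.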

When I drop a replica set member and restart it, an initial sync starts as expected. The initial sync takes around 8 hours, so the hourly bucketing job above runs while the initial sync is in progress (without any problems).

However, once all databases are cloned (rs.status() reports "databases: {databasesToClone: 0, databasesCloned: 9 ...") I get thousands of errors like this:
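
The cloning check can be sketched as a small function. This assumes the counters quoted above sit under initialSyncStatus.databases in the rs.status() output of the syncing member; cloningFinished is a hypothetical helper name:

```javascript
// Hypothetical helper: returns true once the database cloning phase of an
// initial sync is done, based on the document returned by rs.status().
// Assumption: the counters live under status.initialSyncStatus.databases.
function cloningFinished(status) {
  const dbs = status && status.initialSyncStatus && status.initialSyncStatus.databases;
  if (!dbs) {
    return false; // no initial sync information in this status document
  }
  return dbs.databasesToClone === 0 && dbs.databasesCloned > 0;
}
```

In the shell on the syncing member this would be called as cloningFinished(rs.status()).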

Error applying inserts in bulk. Trying first insert as a lone insert","attr":{"groupedInserts": ...

See attached log file for more details.

I get many thousands of these errors. The disk runs out of space and MongoDB stops working!
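
To put a number on "many thousands", the occurrences can be counted from the log. A rough sketch, assuming the 4.4 log is written as one JSON document per line with the message in the "msg" field; countBulkInsertErrors is a hypothetical helper:

```javascript
// Message text as it appears in the log excerpt above.
const BULK_INSERT_ERROR =
  'Error applying inserts in bulk. Trying first insert as a lone insert';

// Hypothetical helper: counts log entries with the bulk insert error.
// Assumption: one JSON document per line, message stored in "msg".
function countBulkInsertErrors(logText) {
  let count = 0;
  for (const line of logText.split('\n')) {
    if (!line.trim()) continue; // skip empty lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (e) {
      continue; // skip lines that are not valid JSON
    }
    if (entry && entry.msg === BULK_INSERT_ERROR) count += 1;
  }
  return count;
}
```

With Node.js this could be run on the content of the mongod log file, for example read with fs.readFileSync(path, 'utf8').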

If I disable the hourly bucketing job, the initial sync runs without any problem. So I assume the issue is caused by dropping/renaming collections, re-use of collection names, etc.

related to SERVER-58164: When group insert fails, the error type is not printed in logs. (Closed)

Assignee: Wenbin Zhu
Reporter: Wernfried Domscheit
Participants: Dmitry Agranat, Eric Sedor, Wenbin Zhu, Wernfried Domscheit
Votes: 0
Watchers: 4

Created: Jun 13 2021 05:46:07 PM UTC
Updated: Oct 27 2023 01:52:22 PM UTC
Resolved: Jun 30 2021 10:49:08 PM UTC
