- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
{'code': 225, 'codeName': 'TransactionTooOld', 'errmsg': 'Cannot start transaction 3 on session ab828da0-dead-4cbd-beef-ac612334a5c1 - ugh4jez5/+Zo0w7yt4WMrZ1cJoa3zmk86txJfzwiQ18= because a newer transaction 4 has already started.'}
I'm running a sharded cluster where each replica set is PSA (primary, secondary, arbiter).
I'm processing a very large CSV file where the data looks like this:
| _id | tag |
|-----|-----|
| 1   | 100 |
| 1   | 101 |
| 2   | 100 |
| 3   | 100 |
| 3   | 101 |
I need to group the tags by _id, so I use bulk operations in PyMongo like:
UpdateOne({"_id": row["_id"]}, {"$addToSet": {"tag": row["tag"]}}, upsert=True)
which I run in batches of 5000.
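For reference, a minimal sketch of that batching, assuming a CSV named tags.csv with _id and tag columns and a collection db.items (the file name, URI, and namespace are illustrative, not from the ticket):

```python
import csv

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
coll = client.db.items

BATCH_SIZE = 5000

with open("tags.csv", newline="") as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(
            UpdateOne(
                {"_id": int(row["_id"])},
                {"$addToSet": {"tag": int(row["tag"])}},
                upsert=True,
            )
        )
        if len(batch) >= BATCH_SIZE:
            coll.bulk_write(batch, ordered=False)
            batch = []
    if batch:  # flush the final partial batch
        coll.bulk_write(batch, ordered=False)
```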
If I run only one thread, I get no errors. If I split the CSV file into 8 parts and run 8 parallel processes, everything works for a few minutes and then the error above starts appearing. I suspect I'm hitting a region of the CSV file where the same _id repeats over and over. This looks similar to #14322, which I also had problems with a few years ago in an identical scenario.
What does that error even mean? What workaround can I try?
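For context on what the error means: with retryable writes, each write runs as a numbered transaction on a logical session, and the server rejects a transaction number lower than one it has already seen on that session (code 225, TransactionTooOld). Judging by the linked tickets below, the usual trigger in PyMongo is a MongoClient created before a fork and reused in the children, so the forked workers share cached sessions and their transaction numbers collide. A hedged workaround sketch, assuming plain multiprocessing workers and illustrative file/collection names, is to construct the client inside each worker (the 5000-op batching is omitted for brevity):

```python
import csv
from multiprocessing import Pool

from pymongo import MongoClient, UpdateOne


def process_chunk(path):
    # Each worker builds its own client, so session pools (and the
    # per-session transaction numbers retryable writes rely on) are
    # never shared across the fork boundary.
    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    coll = client.db.items
    with open(path, newline="") as f:
        ops = [
            UpdateOne(
                {"_id": int(row["_id"])},
                {"$addToSet": {"tag": int(row["tag"])}},
                upsert=True,
            )
            for row in csv.DictReader(f)
        ]
    if ops:
        coll.bulk_write(ops, ordered=False)
    client.close()


if __name__ == "__main__":
    chunk_paths = [f"chunk_{i}.csv" for i in range(8)]  # illustrative names
    with Pool(processes=8) as pool:
        pool.map(process_chunk, chunk_paths)
```

Another commonly cited workaround is disabling retryable writes (retryWrites=false in the connection URI), at the cost of losing automatic retries on transient failures.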
- is caused by: PYTHON-1660 Driver session pools must be cleared after forking (Closed)
- related to: PYTHON-1745 Raise an error if an opened MongoClient is used after a fork (Closed)