- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
{'code': 225, 'codeName': 'TransactionTooOld', 'errmsg': 'Cannot start transaction 3 on session ab828da0-dead-4cbd-beef-ac612334a5c1 - ugh4jez5/+Zo0w7yt4WMrZ1cJoa3zmk86txJfzwiQ18= because a newer transaction 4 has already started.'}
I'm running a sharded cluster where each replica set is PSA (primary, secondary, arbiter).
I'm processing a very large CSV file where the data looks like this:
| _id | tag |
|-----|-----|
| 1   | 100 |
| 1   | 101 |
| 2   | 100 |
| 3   | 100 |
| 3   | 101 |
I need to group the tags by _id, so I use bulk operations in PyMongo like:
UpdateOne({"_id": row["_id"]}, {"$addToSet": {"tag": row["tag"]}}, upsert=True)
which I run in batches of 5000.
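For reference, a minimal sketch of that batching, assuming a CSV named tags.csv with _id and tag columns and a collection db.items (the file name, URI, and namespace are illustrative, not from the ticket):

```python
import csv

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
coll = client.db.items

BATCH_SIZE = 5000

with open("tags.csv", newline="") as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(
            UpdateOne(
                {"_id": int(row["_id"])},
                {"$addToSet": {"tag": int(row["tag"])}},
                upsert=True,
            )
        )
        if len(batch) >= BATCH_SIZE:
            coll.bulk_write(batch, ordered=False)
            batch = []
    if batch:  # flush the final partial batch
        coll.bulk_write(batch, ordered=False)
```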
If I run only one thread, I get no errors. If I split the CSV file into 8 parts and run 8 parallel processes, everything works for a few minutes and then the error above starts appearing. I suspect I'm hitting a region of the CSV file where the same _id repeats over and over. This looks similar to #14322, which I also had problems with a few years ago in an identical scenario.
What does that error even mean? What workaround can I try?
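For context on what the error means: with retryable writes, each write runs as a numbered transaction on a logical session, and the server rejects a transaction number lower than one it has already seen on that session (code 225, TransactionTooOld). Judging by the linked tickets below, the usual trigger in PyMongo is a MongoClient created before a fork and reused in the children, so the forked workers share cached sessions and their transaction numbers collide. A hedged workaround sketch, assuming plain multiprocessing workers and illustrative file/collection names, is to construct the client inside each worker (the 5000-op batching is omitted for brevity):

```python
import csv
from multiprocessing import Pool

from pymongo import MongoClient, UpdateOne


def process_chunk(path):
    # Each worker builds its own client, so session pools (and the
    # per-session transaction numbers retryable writes rely on) are
    # never shared across the fork boundary.
    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    coll = client.db.items
    with open(path, newline="") as f:
        ops = [
            UpdateOne(
                {"_id": int(row["_id"])},
                {"$addToSet": {"tag": int(row["tag"])}},
                upsert=True,
            )
            for row in csv.DictReader(f)
        ]
    if ops:
        coll.bulk_write(ops, ordered=False)
    client.close()


if __name__ == "__main__":
    chunk_paths = [f"chunk_{i}.csv" for i in range(8)]  # illustrative names
    with Pool(processes=8) as pool:
        pool.map(process_chunk, chunk_paths)
```

Another commonly cited workaround is disabling retryable writes (retryWrites=false in the connection URI), at the cost of losing automatic retries on transient failures.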
- is caused by: PYTHON-1660 Driver session pools must be cleared after forking (Closed)
- related to: PYTHON-1745 Raise an error if an opened MongoClient is used after a fork (Closed)