Python Driver / PYTHON-1992

Running bulk insert raises TransactionTooOld, Cannot start transaction X on session Y because a newer transaction Z has already started.

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      'code': 225, 'codeName': 'TransactionTooOld', 'errmsg': 'Cannot start transaction 3 on session ab828da0-dead-4cbd-beef-ac612334a5c1 - ugh4jez5/+Zo0w7yt4WMrZ1cJoa3zmk86txJfzwiQ18= because a newer transaction 4 has already started.'

      I'm running a sharded cluster, where each replica set is a PSA (primary, secondary, arbiter).

      I'm processing a very large csv file where the data looks like this:

      _id  tag
      1    100
      1    101
      2    100
      3    100
      3    101

      I need to group the tags by _id, so I use bulk operations in pymongo like:

      UpdateOne({"_id": row["_id"]}, {"$addToSet": {"tag": row["tag"]}}, upsert=True)
      

      which I run in batches of 5000.
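      As a minimal sketch of the whole loop (assuming a header row of "_id,tag"; the file name, connection details, and database/collection names below are placeholders, not from this report), each batch of 5000 UpdateOne operations is sent with bulk_write, so a row set like the sample above ends up as {"_id": 1, "tag": [100, 101]} and so on:

      import csv

      from pymongo import MongoClient, UpdateOne

      client = MongoClient()  # connection details are placeholders
      coll = client["mydb"]["tags"]  # database/collection names are hypothetical

      BATCH_SIZE = 5000
      batch = []
      with open("data.csv") as f:  # file name is a placeholder
          for row in csv.DictReader(f):
              batch.append(
                  UpdateOne(
                      {"_id": row["_id"]},
                      {"$addToSet": {"tag": row["tag"]}},
                      upsert=True,
                  )
              )
              if len(batch) == BATCH_SIZE:
                  coll.bulk_write(batch)  # send one batch of 5000 operations
                  batch = []
      if batch:
          coll.bulk_write(batch)  # flush the final partial batch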

      If I run only one thread, I get no errors. If I split the csv file into 8 parts and run 8 parallel processes, it runs successfully for a few minutes and then starts raising the error above. I suspect I hit a region of the csv file where the same _id repeats over and over. This looks similar to #14322, which I also had problems with a few years ago in an identical scenario.

      What does that error even mean? What workaround can I try?
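      One possible client-side mitigation (an assumption, not a confirmed fix) would be to treat TransactionTooOld (code 225) as transient and retry the failed batch with backoff; creating the client with retryWrites=False might also be worth trying, since the error message refers to transaction numbers on a session. The helper names in this sketch are hypothetical:

      import time

      from pymongo.errors import OperationFailure

      def _is_transaction_too_old(exc):
          # TransactionTooOld can surface as the top-level error code or
          # inside a BulkWriteError's per-write "writeErrors" list.
          if exc.code == 225:
              return True
          details = getattr(exc, "details", None) or {}
          return any(err.get("code") == 225 for err in details.get("writeErrors", []))

      def bulk_write_with_retry(coll, requests, max_attempts=5):
          # Hypothetical helper: retry a batch a few times with exponential
          # backoff before giving up and re-raising the error.
          for attempt in range(max_attempts):
              try:
                  return coll.bulk_write(requests)
              except OperationFailure as exc:
                  if not _is_transaction_too_old(exc) or attempt == max_attempts - 1:
                      raise
                  time.sleep(2 ** attempt)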

            Assignee: Shane Harvey (shane.harvey@mongodb.com)
            Reporter: Tudor Aursulesei (thestick613)
            Votes: 0
            Watchers: 7
