Core Server / SERVER-50749

Re-loading is slow with py-tpcc

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.4.0
    • Component/s: None
    • Storage Engines
    • ALL
    • v5.1
    • Storage - Ra 2021-09-20, Storage - Ra 2021-10-04

      While loading data with py-tpcc, flow control is engaged, the insert rate drops, and a few inserts take 200 to 300 seconds.
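One way to quantify stalls like these is to time every insert call during the load and flag the slow ones. The following is a minimal sketch. It assumes the loader's insert step can be wrapped as a Python callable (py-tpcc is written in Python). `timed_inserts`, `stalls`, and `insert_batch` are hypothetical names and are not part of py-tpcc. The 5-second threshold matches the one used in this report.

```python
import time
from typing import Callable, List, Tuple

# Threshold used in this report: inserts slower than 5 seconds count as stalls.
STALL_SECONDS = 5.0


def timed_inserts(insert_batch: Callable[[int], None], n_batches: int,
                  clock: Callable[[], float] = time.monotonic) -> List[float]:
    """Call insert_batch(i) for each batch index and return each call's latency in seconds."""
    latencies = []
    for i in range(n_batches):
        start = clock()
        # Hypothetical: e.g. one pymongo insert_many() call for a py-tpcc batch.
        insert_batch(i)
        latencies.append(clock() - start)
    return latencies


def stalls(latencies: List[float], threshold: float = STALL_SECONDS) -> List[Tuple[int, float]]:
    """Return (batch index, latency) for every batch slower than the threshold."""
    return [(i, lat) for i, lat in enumerate(latencies) if lat > threshold]
```

Injecting the clock keeps the timing logic testable without a running mongod. In a real run, `time.monotonic` is used so wall-clock adjustments do not distort the latencies.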

      This is from Percona. They previously gave us the repro for WT-6444 via py-tpcc. In their report, the first load (into database tpcc1) takes ~20 minutes on a new mongod instance. After sleeping a few minutes, they repeat the load into database tpcc3, and this second load takes ~500 minutes. They used a single-node replica set, and my repro attempts do the same.

      Part of this is a duplicate of SERVER-46114, which was closed as "Works as Designed". As the updates below show, mongod can get stuck with flow control engaged: an insert statement never finishes and mongod cannot shut down. So I don't think "Works as Designed" is appropriate.
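An insert that never finishes can be located by filtering `currentOp` output for long-running inserts and then issuing `killOp` against their `opid`. The following is a minimal sketch. It assumes `admin_db` is a pymongo `Database` for `admin` (i.e. `client.admin`). The helper names and the 600-second default are hypothetical. `killOp` only requests termination, and the operation stops only when it next checks for interruption, so this does not guarantee the insert stops. In this report, it did not.

```python
from typing import Any, Dict, List


def long_running_inserts(current_op: Dict[str, Any], min_secs: int) -> List[Dict[str, Any]]:
    """Select insert operations from a currentOp reply that have run for at least min_secs."""
    return [op for op in current_op.get("inprog", [])
            if op.get("op") == "insert" and op.get("secs_running", 0) >= min_secs]


def kill_long_running_inserts(admin_db: Any, min_secs: int = 600) -> List[Any]:
    """Send killOp for each long-running insert and return the targeted opids.

    admin_db is assumed to be a pymongo Database for "admin" (client.admin).
    killOp only marks the operation as killed; it may keep running until it
    reaches an interrupt check.
    """
    reply = admin_db.command("currentOp")  # sends {"currentOp": 1}
    opids = [op["opid"] for op in long_running_inserts(reply, min_secs)]
    for opid in opids:
        admin_db.command("killOp", op=opid)  # sends {"killOp": 1, "op": opid}
    return opids
```

Splitting the filter from the command calls allows the selection logic to be checked against a saved `currentOp` reply, for example one captured while mongod was stuck.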

      Summarizing what I see below in my repro attempts:

      • this problem is new in 4.4.0. I tried but could not reproduce it with 4.2.9.
      • many inserts take more than 5 seconds with 4.4.0 (up to 390 seconds, ignoring the hang). No inserts take more than 5 seconds with 4.2.9.
      • in one test mongod got stuck. An insert statement saturated a CPU core but made no progress for more than an hour. It did not stop after killOp(). Running "killall mongod" did not stop mongod either, so eventually I used kill -9.
      • with flow control enabled and 4.4.0, there are stalls (inserts that take 10 to 390 seconds)
      • with flow control disabled and 4.4.0, there are still stalls, but they are less severe (10 to 60 seconds)
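The flow-control-enabled versus flow-control-disabled comparison can be repeated by toggling the `enableFlowControl` server parameter between runs and sampling the `flowControl` section of `serverStatus` during each load. The following is a minimal sketch. It assumes `admin_db` is a pymongo `Database` for `admin`. The helper names are hypothetical, and the report does not say whether flow control was disabled at runtime or at startup.

```python
from typing import Any, Dict

# A few fields reported in the flowControl section of serverStatus.
FLOW_CONTROL_FIELDS = ("enabled", "isLagged", "targetRateLimit", "timeAcquiringMicros")


def set_flow_control(admin_db: Any, enabled: bool) -> Dict[str, Any]:
    """Enable or disable flow control at runtime via setParameter."""
    # Sends {"setParameter": 1, "enableFlowControl": enabled}.
    return admin_db.command("setParameter", 1, enableFlowControl=enabled)


def flow_control_snapshot(admin_db: Any) -> Dict[str, Any]:
    """Return selected flowControl fields from serverStatus (None if a field is absent)."""
    fc = admin_db.command("serverStatus").get("flowControl", {})
    return {key: fc.get(key) for key in FLOW_CONTROL_FIELDS}
```

Sampling the snapshot periodically alongside per-insert latencies would show whether the long stalls coincide with `isLagged` being true and with growth in `timeAcquiringMicros`.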

      I have ftdc and mongod error logs for most of the results listed below. I can provide them if requested. There are many, so I prefer to do that on demand.

        1. example.png
          320 kB
        2. flow0.mo429.tar.gz
          4.32 MB
        3. flow0.mo440.tar.gz
          8.01 MB
        4. flow1.mo429.tar.gz
          6.16 MB
        5. flow1.mo440.tar.gz
          8.12 MB
        6. ftdc.tpcc.440.hang.tar
          1.32 MB
        7. metrics.2020-09-09T20-02-23Z-00000
          1.56 MB
        8. mongod.log.440.hang.gz
          19 kB
        9. Screen Shot 2020-09-09 at 6.16.19 PM.png
          157 kB
        10. Screen Shot 2020-09-10 at 9.35.27 AM.png
          83 kB

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            mark.callaghan@mongodb.com Mark Callaghan (Inactive)
            Votes:
            0
            Watchers:
            17

              Created:
              Updated:
              Resolved: